Scaling in practice
Apply MinMaxScaler and StandardScaler from sklearn — fit on training data, transform both splits, and verify the before/after statistics.
- Apply MinMaxScaler to produce [0, 1]-scaled features
- Apply StandardScaler to produce mean-0, std-1 features
- Fit a scaler on the training set and transform both train and test correctly
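Scikit-learn scalers follow a consistent interface: .fit() computes the
statistics from the training data, and .transform() applies the scaling using
those stored statistics. On the training set you can combine both steps with
.fit_transform(), but never call it on the test set. Fit on train, then call
only .transform() on test, so the test set is scaled with the training
statistics and no information from it leaks into preprocessing.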
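Here is a minimal sketch of the fit-on-train, transform-both pattern. It assumes a hypothetical toy array with two numeric columns standing in for features such as age and income. It checks that .fit_transform() on the training set matches .fit() followed by .transform(), and that the test set is scaled with the statistics learned from training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 100 rows, 2 numeric features (age-like, income-like)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 70, size=100),        # age-like column
    rng.normal(50_000, 15_000, size=100),  # income-like column
]).astype(float)

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
scaler.fit(X_train)                       # learn mean_ and scale_ from train only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics
# Wrong: scaler.fit_transform(X_test) would relearn statistics from the test set

# On the training set, fit_transform is a shortcut for fit followed by transform
X_train_shortcut = StandardScaler().fit_transform(X_train)
print(np.allclose(X_train_scaled, X_train_shortcut))  # True
```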
MinMaxScaler
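MinMaxScaler maps each value x to (x - min) / (max - min), where min and max
are the per-feature values learned from the training set. After scaling, both
age and income have minimum 0 and maximum 1 on the training set. The test set
is transformed using the same min and max, so its values need not span [0, 1].
A test value below the training minimum maps below 0, and a value above the
training maximum maps above 1.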
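The sketch below illustrates this behaviour. It uses a few hypothetical age and income rows, checks MinMaxScaler against the (x - min) / (max - min) formula, and includes test rows that fall outside the training range. Those test rows come out below 0 and above 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical columns: age, income
X_train = np.array([[25, 30_000], [40, 60_000], [60, 90_000], [35, 45_000]], dtype=float)
X_test = np.array([[20, 50_000], [70, 120_000]], dtype=float)  # age 20 and 70 lie outside the train range

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_train_mm = scaler.fit_transform(X_train)
X_test_mm = scaler.transform(X_test)

# Same result as the formula (x - min) / (max - min), using the training min and max
train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)
manual = (X_train - train_min) / (train_max - train_min)
print(np.allclose(X_train_mm, manual))  # True

print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))  # 0 and 1 per column (up to rounding)
print(X_test_mm)  # contains a value below 0 and values above 1
```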
StandardScaler
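StandardScaler maps each value x to (x - mean) / std, using the per-feature
mean and standard deviation learned from the training set. As a result, the
scaled training features have mean 0 and standard deviation 1.
Scikit-learn scalers accept DataFrames as well as NumPy arrays, but
fit_transform() and transform() return a NumPy array by default, not a
DataFrame. If you need column names for downstream steps, wrap the result:
pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index).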
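Here is a small sketch of StandardScaler on a hypothetical age/income DataFrame. It shows that the output is a plain NumPy array, rewraps it with the original column names and index, and verifies mean 0 and standard deviation 1. The check passes ddof=0 because StandardScaler divides by the population standard deviation, while pandas .std() defaults to ddof=1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical training frame; a non-default index, as a shuffled split would leave
X_train = pd.DataFrame(
    {"age": [25, 40, 60, 35], "income": [30_000, 60_000, 90_000, 45_000]},
    index=[10, 11, 12, 13],
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # column names and index are dropped
print(type(X_train_std))  # <class 'numpy.ndarray'>

# Restore column names and the original index for downstream steps
X_train_std_df = pd.DataFrame(X_train_std, columns=X_train.columns, index=X_train.index)

# StandardScaler uses the population std (ddof=0), so compare with ddof=0
print(np.allclose(X_train_std_df.mean(), 0))        # True
print(np.allclose(X_train_std_df.std(ddof=0), 1))   # True
```

In scikit-learn 1.2 and later, calling scaler.set_output(transform="pandas") makes the scaler return a DataFrame directly. On older versions, wrapping the array as above is the portable option.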
Comparing the two
After fitting on the same training data:
| Property | MinMaxScaler | StandardScaler |
|---|---|---|
| Range | [0, 1] on train | Unbounded |
| Mean | Not necessarily 0 | ~0 |
| Std | Not necessarily 1 | ~1 |
| Sensitive to outliers | Yes: a single outlier sets the min or max | Less so, though outliers still shift the mean and inflate the std |
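To check the range, mean, and std rows yourself, the sketch below fits both scalers on the same training feature and prints those statistics for each result. The feature is hypothetical and deliberately skewed. The exact numbers depend on the random data, so no output is shown.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical skewed training feature: 200 rows, 1 column
rng = np.random.default_rng(0)
X_train = rng.exponential(scale=10.0, size=(200, 1))

X_mm = MinMaxScaler().fit_transform(X_train)
X_std = StandardScaler().fit_transform(X_train)

# Statistics compared in the table, computed on the scaled training data
for name, Xs in [("MinMaxScaler", X_mm), ("StandardScaler", X_std)]:
    print(f"{name:15s} min={Xs.min():.3f} max={Xs.max():.3f} "
          f"mean={Xs.mean():.3f} std={Xs.std():.3f}")
```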
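Choose MinMaxScaler when you know the feature has a bounded range and no
extreme outliers. Choose StandardScaler when the distribution is roughly
Gaussian or when outliers are present. It does not pin an outlier to 0 or 1 and
squeeze the remaining values into a narrow band, but outliers still distort the
mean and standard deviation. StandardScaler therefore reduces their impact
rather than removing it.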
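The outlier guidance can be illustrated with a sketch built on a hypothetical feature of 999 ordinary values plus one extreme value. It measures how much narrower the ordinary values become after scaling once the outlier is included. With MinMaxScaler, the outlier alone sets the maximum. With StandardScaler, its effect on the mean and std is diluted by the other rows, so the compression is smaller but still present.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature: 999 ordinary values around 50, plus one extreme outlier
rng = np.random.default_rng(0)
ordinary = rng.normal(loc=50, scale=10, size=999)
without_outlier = ordinary.reshape(-1, 1)
with_outlier = np.append(ordinary, 1_000.0).reshape(-1, 1)

def ordinary_spread(scaler, X):
    """Std of the 999 ordinary values after fitting `scaler` on X."""
    scaled = scaler.fit_transform(X).ravel()
    return scaled[:999].std()

factors = {}
for scaler_cls in (MinMaxScaler, StandardScaler):
    clean = ordinary_spread(scaler_cls(), without_outlier)
    dirty = ordinary_spread(scaler_cls(), with_outlier)
    # How many times narrower the ordinary values become once the outlier is present
    factors[scaler_cls.__name__] = clean / dirty
    print(f"{scaler_cls.__name__:15s} compression factor: {clean / dirty:.1f}x")
```

Both factors are above 1, which shows that neither scaler is immune to the outlier. The MinMaxScaler factor is the larger of the two.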
Where to go next
Next: Lab: prepare a dataset, an end-to-end pipeline that takes raw mixed-type data through cleaning, encoding, splitting, and scaling to produce a train/test pair ready for a model.