Regression metrics
Compute MAE, MSE, RMSE, and R² with sklearn, and understand when each gives a more honest picture of prediction error.
- Compute MAE, MSE, RMSE, and R² with sklearn metrics functions
- Explain what each metric measures and its sensitivity to outliers
- Choose between RMSE and MAE for a given error cost structure
Regression evaluation cannot use accuracy: there are no categories, only magnitudes of error. The four standard metrics each summarise those magnitudes differently, and the differences matter most when a few errors are much larger than the rest.
What each metric measures
MAE (Mean Absolute Error) is the average of |y_true - y_pred|. It weights
every error in proportion to its size: a prediction that is 10 units off counts
exactly ten times as much as one that is 1 unit off. This makes MAE more robust
to outliers than MSE or RMSE, because a large error contributes only linearly
and cannot dominate the average as easily.
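As a minimal sketch of the definition above, the snippet below computes MAE by hand and with `sklearn.metrics.mean_absolute_error`. The four values in `y_true` and `y_pred` are invented for illustration and are not from the lesson's dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative data (assumed): absolute errors are 1, 2, 0 and 10.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 18.0, 30.0, 50.0])

# MAE straight from the definition: mean of |y_true - y_pred|.
mae_manual = np.mean(np.abs(y_true - y_pred))

# The same quantity via scikit-learn.
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 3.25
```

The error of 10 accounts for 10 of the 13 total units of absolute error, which is exactly its proportional share.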
MSE (Mean Squared Error) is the average of (y_true - y_pred)². Squaring
amplifies large errors: an error of 10 contributes 100 to MSE, but an error of
1 contributes only 1. MSE is differentiable and useful for gradient-based
optimisation, but hard to interpret because its units are squared.
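To make the amplification concrete, here is a small sketch that computes MSE with `sklearn.metrics.mean_squared_error` and compares how much of each metric's total comes from the single largest error. The data values are assumed for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative data (assumed): absolute errors are 1, 2, 0 and 10.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 18.0, 30.0, 50.0])

abs_err = np.abs(y_true - y_pred)
sq_err = abs_err ** 2  # 1, 4, 0, 100

mse_manual = sq_err.mean()
mse_sklearn = mean_squared_error(y_true, y_pred)

# Share of the total contributed by the largest single error.
largest_share_mae = abs_err.max() / abs_err.sum()  # 10 / 13
largest_share_mse = sq_err.max() / sq_err.sum()    # 100 / 105

print(mse_sklearn)  # 26.25
print(round(largest_share_mae, 3), round(largest_share_mse, 3))
```

On these numbers the largest error supplies about 77% of the absolute-error total but about 95% of the squared-error total.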
RMSE (Root Mean Squared Error) restores the original units by taking the square root of MSE. It shares MSE's sensitivity to outliers but can be read as a typical error size in the same units as y, weighted toward the largest errors. RMSE is one of the most commonly reported single-number metrics for regression.
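One way to compute RMSE is to take the square root of `mean_squared_error` with NumPy, as sketched below. This route is chosen here because it does not depend on the installed scikit-learn version; the data values are again assumed for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative data (assumed): absolute errors are 1, 2, 0 and 10.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 18.0, 30.0, 50.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # back in the units of y
mae = mean_absolute_error(y_true, y_pred)

print(f"MSE  = {mse:.2f} (squared units)")
print(f"RMSE = {rmse:.2f} (units of y)")
print(f"MAE  = {mae:.2f} (units of y)")
```

RMSE is never smaller than MAE on the same predictions, and the gap widens as the errors become more uneven in size.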
R² measures the fraction of variance in the target explained by the model. It is scale-invariant: a value of 0.85 means the model explains 85% of the variance, regardless of whether the target is in dollars or millimetres. A negative R² means the model fits the evaluated data worse than always predicting the mean of the target.
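The properties above can be checked with `sklearn.metrics.r2_score`. The sketch below uses small made-up arrays to show a good fit, the same fit after rescaling the target, a constant mean prediction, and a prediction that is worse than the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative target (assumed); its mean is 6.
y_true = np.array([3.0, 5.0, 7.0, 9.0])

y_good = np.array([2.5, 5.5, 6.5, 9.5])  # every residual is +/- 0.5
y_mean = np.full_like(y_true, y_true.mean())  # always predict the mean
y_bad = np.array([9.0, 7.0, 5.0, 3.0])   # trend reversed

r2_good = r2_score(y_true, y_good)
# Rescaling target and predictions together (e.g. a unit change) leaves R² unchanged.
r2_scaled = r2_score(y_true * 1000, y_good * 1000)
r2_mean = r2_score(y_true, y_mean)
r2_bad = r2_score(y_true, y_bad)

print(r2_good, r2_scaled)  # both approximately 0.95
print(r2_mean)             # 0.0: no better than the mean
print(r2_bad)              # -3.0: worse than the mean
```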
RMSE vs MAE
The two metrics respond differently to outliers. When a set of predictions contains two outlier errors, RMSE drops substantially once those two points are removed; MAE drops less. That asymmetry illustrates the key rule: if large errors are disproportionately costly (missing a product demand spike causes an expensive stock-out), use RMSE. If all errors are equally bad per unit (shipping delay predictions where every minute matters equally), use MAE.
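The contrast can be reproduced with a small synthetic example. This is a sketch on invented data, not the lesson's original code: eight predictions are off by 1 unit, and two demand spikes are under-predicted by 20 units. The helper `mae_and_rmse` and the outlier threshold of 10 are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic demand data (assumed): the last two points are spikes.
y_true = np.array([50.0, 52.0, 48.0, 51.0, 49.0, 50.0, 53.0, 47.0, 90.0, 95.0])
errors = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 20.0, 20.0])
y_pred = y_true - errors  # so that y_true - y_pred == errors


def mae_and_rmse(y_t, y_p):
    """Hypothetical helper: return (MAE, RMSE) for one set of predictions."""
    mae = mean_absolute_error(y_t, y_p)
    rmse = np.sqrt(mean_squared_error(y_t, y_p))
    return mae, rmse


mae_all, rmse_all = mae_and_rmse(y_true, y_pred)

# Drop the two outliers (absolute error of 10 or more) and recompute.
keep = np.abs(y_true - y_pred) < 10
mae_clean, rmse_clean = mae_and_rmse(y_true[keep], y_pred[keep])

print(f"with outliers:    MAE = {mae_all:.2f}, RMSE = {rmse_all:.2f}")
print(f"without outliers: MAE = {mae_clean:.2f}, RMSE = {rmse_clean:.2f}")
```

On this data MAE falls from 4.8 to 1.0, while RMSE falls from roughly 9.0 to 1.0, so the two outliers cost RMSE about twice as much as they cost MAE.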
Comparing RMSE across datasets is not meaningful unless both datasets have the same scale. Compare RMSE to the standard deviation of the target: if RMSE ≈ std(y), the model barely beats the mean predictor. R² provides a normalised version of this comparison automatically.
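The link between RMSE, the target's standard deviation, and R² can be sketched as follows. The arrays are invented for illustration. The identity R² = 1 - (RMSE / std(y))² holds here because `np.std` defaults to the population standard deviation and everything is computed on the same evaluation set.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative target and model predictions (assumed values).
y_true = np.array([100.0, 120.0, 140.0, 160.0, 180.0])
y_pred = np.array([105.0, 118.0, 143.0, 155.0, 184.0])

rmse_model = np.sqrt(mean_squared_error(y_true, y_pred))

# Baseline: always predict the mean of the target.
y_baseline = np.full_like(y_true, y_true.mean())
rmse_baseline = np.sqrt(mean_squared_error(y_true, y_baseline))

std_y = np.std(y_true)      # population std (ddof=0)
ratio = rmse_model / std_y  # near 1 means barely better than the mean
r2 = r2_score(y_true, y_pred)

print(f"std(y) = {std_y:.2f}, baseline RMSE = {rmse_baseline:.2f}")
print(f"model RMSE = {rmse_model:.2f}, RMSE / std(y) = {ratio:.3f}")
print(f"R² = {r2:.5f}, 1 - ratio² = {1 - ratio ** 2:.5f}")
```

A ratio close to 1 corresponds to an R² close to 0, which is the "barely beats the mean predictor" case described above.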
Where to go next
Next: cross-validation — why a single train/test split gives a noisy estimate and how k-fold CV gives a more stable one.