Regression metrics

Compute MAE, MSE, RMSE, and R² with sklearn, and understand when each gives a more honest picture of prediction error.

Accuracy does not apply to regression: there are no categories, only magnitudes of error. The four standard metrics each summarise those magnitudes differently, and the differences matter most when a few errors are much larger than the rest.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
  mean_absolute_error,
  mean_squared_error,
  r2_score,
)

rng = np.random.default_rng(5)
n = 200
X = rng.normal(0, 1, (n, 3))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 1.2, n)

# Inject two large outliers before splitting to make the MAE vs RMSE contrast
# visible; note that each one can land in either the train or the test split
y[10]  += 15
y[100] += 18

X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size=0.25, random_state=5
)

model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)

mae  = mean_absolute_error(y_test, preds)
mse  = mean_squared_error(y_test, preds)
rmse = np.sqrt(mse)
r2   = r2_score(y_test, preds)

print(f"MAE:  {mae:.4f}")
print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²:   {r2:.4f}")

# For contrast, recompute the metrics without the two largest test residuals.
# These are the injected outliers only if both happened to land in the test split.
mask = np.ones(len(y_test), dtype=bool)
residuals = y_test - preds
# Indices of the two largest absolute residuals in the test set
outlier_idx = np.argsort(np.abs(residuals))[-2:]
mask[outlier_idx] = False

mae_no_out  = mean_absolute_error(y_test[mask], preds[mask])
rmse_no_out = np.sqrt(mean_squared_error(y_test[mask], preds[mask]))
print("\nWith the two largest residuals removed:")
print(f"MAE:  {mae_no_out:.4f}  (change: {mae_no_out - mae:+.4f})")
print(f"RMSE: {rmse_no_out:.4f}  (change: {rmse_no_out - rmse:+.4f})")

What each metric measures

MAE (Mean Absolute Error) is the average of |y_true - y_pred|. Each error contributes in proportion to its size: a prediction that is 10 units off counts exactly ten times as much as one that is 1 unit off. That makes MAE more robust to outliers than MSE or RMSE, because large errors are not amplified before averaging, although they still pull the average up.
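To make that proportionality concrete, here is a minimal sketch that computes MAE by hand and checks it against sklearn. The five values are invented for illustration and are not taken from the dataset above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical toy values, chosen only to keep the arithmetic easy to follow
y_true = np.array([10.0, 12.0, 15.0, 20.0, 30.0])
y_pred = np.array([11.0, 12.0, 14.0, 18.0, 20.0])

abs_errors = np.abs(y_true - y_pred)    # [1, 0, 1, 2, 10]
mae_manual = abs_errors.mean()          # (1 + 0 + 1 + 2 + 10) / 5 = 2.8
mae_sklearn = mean_absolute_error(y_true, y_pred)

# Each error's share of the total is proportional to its size:
# the 10-unit miss accounts for 10 / 14 of the summed absolute error
shares = abs_errors / abs_errors.sum()

print(f"manual MAE:  {mae_manual:.4f}")
print(f"sklearn MAE: {mae_sklearn:.4f}")
print("share of total per point:", shares.round(3))
```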

MSE (Mean Squared Error) is the average of (y_true - y_pred)². Squaring amplifies large errors: an error of 10 contributes 100 to MSE, but an error of 1 contributes only 1. MSE is differentiable and useful for gradient-based optimisation, but hard to interpret because its units are squared.
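The amplification is easiest to see by asking how much of the total a single large error accounts for before and after squaring. The sketch below uses made-up absolute errors (an assumption for illustration, not output from the model above).

```python
import numpy as np

# Hypothetical absolute errors, invented for illustration
abs_errors = np.array([1.0, 0.0, 1.0, 2.0, 10.0])

squared = abs_errors ** 2     # [1, 0, 1, 4, 100]
mse = squared.mean()          # 106 / 5 = 21.2

# Share of the total contributed by the single largest error
share_abs = abs_errors.max() / abs_errors.sum()   # 10 / 14, about 71%
share_sq = squared.max() / squared.sum()          # 100 / 106, about 94%

print(f"MSE: {mse:.2f}")
print(f"largest error's share of absolute total: {share_abs:.1%}")
print(f"largest error's share of squared total:  {share_sq:.1%}")
```

In this toy case one point supplies roughly 71% of the absolute error but roughly 94% of the squared error, which is why a handful of large misses can dominate MSE.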

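RMSE (Root Mean Squared Error) restores the original units by taking the square root of MSE. It shares MSE's sensitivity to outliers but can be read as a typical error in the same units as y, with large errors weighted more heavily. RMSE is never smaller than MAE on the same predictions, and it is one of the most commonly reported single-number metrics for regression.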
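A useful sanity check is that RMSE is always at least as large as MAE on the same errors, with equality only when every error has the same magnitude. The sketch below demonstrates this on two invented error vectors; the helper names `mae` and `rmse` are hypothetical and defined here, not part of sklearn.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of a vector of residuals (illustrative helper)."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Root mean squared error of a vector of residuals (illustrative helper)."""
    return np.sqrt(np.mean(np.square(errors)))

# Hypothetical residuals: same total absolute error, different spread
uniform = np.array([2.0, -2.0, 2.0, -2.0])   # every error has magnitude 2
spiky = np.array([0.0, 0.0, 0.0, 8.0])       # one large error, rest perfect

print(f"uniform: MAE={mae(uniform):.2f}  RMSE={rmse(uniform):.2f}")  # 2.00, 2.00
print(f"spiky:   MAE={mae(spiky):.2f}  RMSE={rmse(spiky):.2f}")      # 2.00, 4.00
```

The gap between RMSE and MAE is therefore a rough indicator of how unevenly the error is spread across predictions.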

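R² (the coefficient of determination) measures the fraction of variance in the target explained by the model: 1 minus the ratio of the model's squared error to the total squared variation around the mean. It is scale-invariant: a value of 0.85 means the model explains 85% of the variance, regardless of whether the target is in dollars or millimetres. A negative R² means the model is worse than always predicting the mean of the targets it is evaluated on.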
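The sketch below computes R² by hand as 1 - SS_res / SS_tot and checks three of its properties against `r2_score`: rescaling the target leaves it unchanged, the mean predictor scores 0, and a sufficiently bad model scores below 0. The four data points are invented for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical toy values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # 4 * 0.25 = 1.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 9 + 1 + 1 + 9 = 20.0
r2_manual = 1 - ss_res / ss_tot                  # 0.95

# Scale invariance: the same data expressed in units 1000x smaller
r2_rescaled = r2_score(y_true * 1000, y_pred * 1000)

# Baseline: always predicting the mean of y_true gives exactly 0
r2_mean = r2_score(y_true, np.full_like(y_true, y_true.mean()))

# A model worse than the mean: the same predictions in reversed order
r2_bad = r2_score(y_true, y_pred[::-1])

print(f"manual R²:   {r2_manual:.4f}")
print(f"sklearn R²:  {r2_score(y_true, y_pred):.4f}")
print(f"rescaled R²: {r2_rescaled:.4f}")
print(f"mean-only:   {r2_mean:.4f}")
print(f"reversed:    {r2_bad:.4f}")
```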

RMSE vs MAE

Run the code and compare the two changes. Removing the two largest test residuals typically lowers RMSE by more than it lowers MAE, because RMSE gives those errors extra weight. That asymmetry illustrates the key rule: if large errors are disproportionately costly (missing a product demand spike causes an expensive stock-out), use RMSE. If every unit of error costs the same (shipping-delay predictions where every minute matters equally), use MAE.
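The choice of metric can change which model looks better. In the sketch below, two hypothetical sets of predictions (`pred_a` and `pred_b`, invented for illustration) are scored against the same targets: one is consistently a little off, the other is usually close but has a single large miss.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Targets are all zero so that each prediction equals its own error
y_true = np.zeros(10)
pred_a = np.full(10, 3.0)                 # always off by 3
pred_b = np.array([1.0] * 9 + [12.0])     # off by 1 nine times, by 12 once

mae_a = mean_absolute_error(y_true, pred_a)              # 3.0
mae_b = mean_absolute_error(y_true, pred_b)              # 21 / 10 = 2.1
rmse_a = np.sqrt(mean_squared_error(y_true, pred_a))     # 3.0
rmse_b = np.sqrt(mean_squared_error(y_true, pred_b))     # sqrt(15.3), about 3.91

print(f"Model A: MAE={mae_a:.2f}  RMSE={rmse_a:.2f}")
print(f"Model B: MAE={mae_b:.2f}  RMSE={rmse_b:.2f}")
print("Lower MAE: ", "A" if mae_a < mae_b else "B")
print("Lower RMSE:", "A" if rmse_a < rmse_b else "B")
```

Model B wins on MAE and Model A wins on RMSE, so the ranking depends on whether that one 12-unit miss is tolerable in your application.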

Comparing RMSE across datasets is not meaningful unless the targets are on the same scale. Instead, compare RMSE to the standard deviation of the target: if RMSE ≈ std(y), the model barely beats the mean predictor. R² performs this normalisation automatically, since R² = 1 - (RMSE / std(y))² when both are computed on the same data with the population standard deviation.
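The relationship between RMSE, the target's standard deviation, and R² can be checked numerically. The sketch below uses synthetic data (a target with standard deviation near 10 and prediction noise near 4, both chosen arbitrarily) and NumPy's default population standard deviation (`ddof=0`), which matches the denominator used in R².

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical target and predictions, generated only for this check
y_true = rng.normal(50.0, 10.0, 500)
y_pred = y_true + rng.normal(0.0, 4.0, 500)

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
std_y = np.std(y_true)        # population std (ddof=0)
ratio = rmse / std_y          # below 1 means better than the mean predictor

# R² recovered from the ratio, and computed directly from its definition
r2_from_ratio = 1 - ratio ** 2
r2_direct = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The mean predictor's RMSE is exactly std(y), which is why std(y) is the baseline
baseline_rmse = np.sqrt(np.mean((y_true - y_true.mean()) ** 2))

print(f"RMSE:            {rmse:.4f}")
print(f"std(y):          {std_y:.4f}")
print(f"RMSE / std(y):   {ratio:.4f}")
print(f"R² from ratio:   {r2_from_ratio:.4f}")
print(f"R² direct:       {r2_direct:.4f}")
print(f"baseline RMSE:   {baseline_rmse:.4f}")
```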

Where to go next

Next: cross-validation — why a single train/test split gives a noisy estimate and how k-fold CV gives a more stable one.
