Code of the Day

Linear regression in sklearn

Fit LinearRegression, inspect coefficients, compute R², and read residuals — the full diagnostic workflow.

Data Science · Advanced · 10 min read
By the end of this lesson you will be able to:
  • Fit LinearRegression on a dataset and inspect coef_ and intercept_
  • Compute R² and interpret what it measures
  • Print residuals and explain what a systematic pattern in them means

Linear regression is rarely the final model in a production system, but it is almost always the first one — fast to fit, easy to inspect, and a reliable baseline against which more complex models must justify their extra complexity.

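A minimal sketch of the workflow this lesson walks through. It assumes a synthetic dataset with two features, true slopes of 3.0 and 1.5, and a true intercept of zero, as described in the discussion that follows. The sample size, noise level, random seed, and variable names are illustrative assumptions, not values taken from the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed synthetic setup: 2 features, true slopes 3.0 and 1.5, true intercept 0.
rng = np.random.default_rng(0)          # seed chosen arbitrarily
n_samples = 200                         # sample size is an assumption
X = rng.normal(size=(n_samples, 2))
true_coef = np.array([3.0, 1.5])
noise = rng.normal(scale=0.5, size=n_samples)  # noise level is an assumption
y = X @ true_coef + noise

# Fit ordinary least squares (an intercept is fitted by default).
model = LinearRegression()
model.fit(X, y)

# Predictions and residuals on the training data.
y_pred = model.predict(X)
residuals = y - y_pred

print("coef_:            ", model.coef_)
print("intercept_:       ", model.intercept_)
print("R^2:              ", r2_score(y, y_pred))
print("residual mean:    ", residuals.mean())
print("first 5 residuals:", residuals[:5])
```

With this setup the fitted slopes should land close to 3.0 and 1.5 and the intercept close to zero; the exact digits depend on the seed and noise level.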

What the numbers tell you

Coefficients (coef_) are the partial slopes: how much the prediction changes when that feature increases by 1 unit, holding all others fixed. Here the true slopes are 3.0 and 1.5 — the fitted values should be close. The distance between true and fitted depends on noise and sample size.

Intercept (intercept_) is the prediction when all features are zero. In this synthetic example, the true intercept is zero, so the fitted value should be near zero.
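Both interpretations can be checked directly against a fitted model. The sketch below assumes the same kind of synthetic data (slopes 3.0 and 1.5, zero intercept); the probe point `base` is an arbitrary, hypothetical choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed synthetic data: slopes 3.0 and 1.5, intercept 0, Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
model = LinearRegression().fit(X, y)

# Coefficient: raise feature 0 by one unit while holding feature 1 fixed.
base = np.array([[0.2, -1.0]])            # arbitrary probe point
bumped = base + np.array([[1.0, 0.0]])
delta = model.predict(bumped)[0] - model.predict(base)[0]
print("change in prediction:", delta, " coef_[0]:", model.coef_[0])

# Intercept: the prediction when every feature is zero.
at_zero = model.predict(np.zeros((1, 2)))[0]
print("prediction at zeros: ", at_zero, " intercept_:", model.intercept_)
```

The change in prediction matches `coef_[0]` regardless of which probe point is used, which is exactly what "holding all others fixed" means for a linear model.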

R² (coefficient of determination) measures the fraction of variance in the target that the model explains. An R² of 1.0 is a perfect fit; 0.0 means the model does no better than predicting the mean of y for every sample; a negative R² means the model is worse than the mean predictor, which typically shows up on held-out data rather than on the data the model was fitted to.
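To make the definition concrete, here is a sketch that computes R² by hand as 1 - SS_res / SS_tot and compares it with `r2_score` and `model.score`. The data are assumed synthetic values as before, and the deliberately bad predictor (`-y_pred`) is a hypothetical construction used only to produce a negative score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed synthetic data: slopes 3.0 and 1.5, intercept 0, Gaussian noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Predicting the mean of y for every sample gives R^2 = 0.
r2_mean = r2_score(y, np.full_like(y, y.mean()))

# A predictor worse than the mean (here, sign-flipped predictions) goes negative.
r2_bad = r2_score(y, -y_pred)

print("manual R^2:        ", r2_manual)
print("r2_score:          ", r2_score(y, y_pred))
print("model.score:       ", model.score(X, y))
print("mean predictor R^2:", r2_mean)
print("bad predictor R^2: ", r2_bad)
```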

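Residuals are the errors on individual predictions (y_true - y_pred). When an intercept is fitted, their mean on the training data is zero by construction, so a clearly non-zero mean on held-out data is the sign of systematic bias. Their distribution tells you more: if residuals show a systematic pattern against a feature (for example, a curve when plotted against it), that feature likely has a non-linear relationship with the target that a linear model cannot capture. A plain linear correlation will not reveal this on training data, because least squares makes the residuals uncorrelated with every fitted feature.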
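A sketch of what a systematic residual pattern looks like in numbers. It assumes a hypothetical dataset in which the second feature enters the target quadratically, so a linear fit must miss it; the coefficients, ranges, and bin edges are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data: feature 0 is linear, feature 1 enters as a square.
rng = np.random.default_rng(3)
X = rng.uniform(-2.0, 2.0, size=(300, 2))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Linear correlation with the feature is ~0 on training data by construction...
corr_linear = np.corrcoef(residuals, X[:, 1])[0, 1]
# ...but the residuals track the squared feature closely.
corr_squared = np.corrcoef(residuals, X[:, 1] ** 2)[0, 1]

# Mean residual in four equal-width bins of feature 1: a text stand-in for a plot.
edges = np.array([-1.0, 0.0, 1.0])               # interior bin edges
bin_idx = np.digitize(X[:, 1], edges)            # values 0..3
bin_means = np.array([residuals[bin_idx == b].mean() for b in range(4)])

print("residual mean:        ", residuals.mean())
print("corr(resid, x1):      ", corr_linear)
print("corr(resid, x1 ** 2): ", corr_squared)
print("mean residual per bin:", bin_means)
```

The binned means are positive at both ends of the feature's range and negative in the middle: the U shape a residual plot would show, and the cue to add a squared term or switch to a non-linear model.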

R² looks impressive on training data but can be misleading if you never check residuals. A model can achieve high R² while still making large errors on specific subgroups. Always look at the residual distribution and plot predicted vs actual when the stakes matter.
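The subgroup point can be illustrated with a small sketch. It assumes a hypothetical dataset in which about 5% of samples carry an extra offset that no feature encodes; the offset size, group share, and the `subgroup` mask are all made-up values for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed data: one feature with slope 3.0; a hidden 5% subgroup is shifted by +4.
rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0.0, 10.0, size=n)
subgroup = rng.random(n) < 0.05                  # hidden group, not a model feature
y = 3.0 * x + rng.normal(scale=0.5, size=n) + 4.0 * subgroup

X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
abs_err = np.abs(y - y_pred)

r2 = r2_score(y, y_pred)
mae_subgroup = abs_err[subgroup].mean()
mae_rest = abs_err[~subgroup].mean()

print("overall R^2:       ", r2)
print("MAE inside subgroup:", mae_subgroup)
print("MAE outside:        ", mae_rest)
```

The overall R² stays high because the subgroup is small relative to the total variance, while the error inside the subgroup is several times larger than outside it. Breaking residuals down by group is what exposes this.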

Where to go next

Next: decision trees — a completely different model family that splits the feature space recursively, requires no scaling, and exposes its logic visually.
