Lab: compare classification models
Fit three classifiers on the same dataset, compare their accuracy and confusion matrices, and identify which generalises best.
- Fit LogisticRegression, DecisionTreeClassifier, and KNeighborsClassifier on the same dataset
- Score each model on a held-out test set
- Build and read confusion matrices
- Write a one-paragraph interpretation of which model generalises best and why
The model-selection heuristics from the previous lesson become concrete when you run all three on the same classification problem. The goal here is not just to get numbers — it is to develop the habit of asking why the numbers differ and what that tells you about the data and the models.
The dataset
We use a synthetic binary classification problem: 300 samples, 2 informative features, modest class overlap. It is small enough that all three models run instantly, but large enough that a held-out test set gives reasonably stable estimates.
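A minimal sketch of the setup and the first accuracy comparison follows. The lesson specifies only the sample count, the two informative features, and the modest overlap, so everything else here is an assumption for illustration: the `class_sep` value, the 25% test split, `random_state=42`, and the default hyperparameters for each model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed generator settings: only n_samples and the two informative
# features come from the lesson text; the rest are illustrative choices.
X, y = make_classification(
    n_samples=300,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.0,
    random_state=42,
)

# Hold out 25% of the data (an assumed ratio), keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name:20s} test accuracy = {accuracies[name]:.3f}")
```

With a test set of 75 samples, a single misclassified point moves accuracy by about 1.3 percentage points, which is worth keeping in mind when comparing the three numbers.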
Accuracy is a starting point, but it does not tell you where each model makes mistakes. A model might be correct 85% of the time but wrong on exactly the cases that matter most. The confusion matrix shows the breakdown.
Checkpoint 2 — confusion matrices
Reading the confusion matrix: each row is the true class; each column is the predicted class. The diagonal is correct predictions (true negatives and true positives). Off-diagonal entries are errors — false positives (predicted positive when actually negative) and false negatives (missed positives).
Two models with identical accuracy can have very different confusion matrices. One might favour false positives; another might favour false negatives. Which is worse depends entirely on the problem domain: a medical screening test, for example, usually tolerates false positives more readily than missed diagnoses.
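The row and column layout described above can be checked with a short sketch. It rebuilds the data so that it runs on its own; the generator settings, split ratio, and seeds are assumptions, not values given in the lesson. For binary labels 0 and 1, scikit-learn's `confusion_matrix` returns `[[TN, FP], [FN, TP]]`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split settings (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

matrices = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    matrices[name] = cm
    print(name)
    print(cm)
    print(f"  TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```

Comparing the FP and FN counts across models shows whether two similar accuracy figures hide different kinds of error.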
Checkpoint 3 — full metric sweep
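One possible version of the sweep computes accuracy, precision, recall, and F1 for each model on the same held-out split. This is a sketch under assumed settings (generator parameters, 25% test split, seeds, default hyperparameters), and the choice of these four metrics is also an assumption about what the sweep covers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split settings (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

metrics = {}
print(f"{'model':20s} {'acc':>6s} {'prec':>6s} {'rec':>6s} {'f1':>6s}")
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # Precision, recall, and F1 are computed for the positive class (label 1).
    row = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    metrics[name] = row
    print(
        f"{name:20s} {row['accuracy']:6.3f} {row['precision']:6.3f} "
        f"{row['recall']:6.3f} {row['f1']:6.3f}"
    )
```

Precision answers "of the predicted positives, how many were right?", while recall answers "of the actual positives, how many were found?"; F1 is their harmonic mean.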
Interpretation exercise
After running the three blocks above, write your interpretation (mentally, or in a notebook). A complete interpretation addresses:
- Which model has the highest accuracy, and is the gap meaningful?
- Do precision and recall differ substantially across models? What would that mean for a use case where false negatives are costly?
- Does the confusion matrix pattern match what the accuracy numbers suggest?
- Based on the bias-variance framework: is any model likely overfitting at these settings? How would you test that hypothesis?
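One way to probe the overfitting question in the last bullet is to compare training accuracy with test accuracy and to add a cross-validated estimate. The sketch below does that; the dataset settings, the 5-fold choice, and the extra depth-limited tree (`max_depth=3`) are assumptions added for comparison, not part of the lab as written.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed dataset and split settings (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    # Hypothetical constrained variant, added to see if limiting depth helps.
    "decision tree (depth 3)": DecisionTreeClassifier(max_depth=3, random_state=42),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # 5-fold cross-validation on the full data; the estimator is cloned per fold.
    cv_scores = cross_val_score(model, X, y, cv=5)
    results[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
        "cv_mean": cv_scores.mean(),
        "cv_std": cv_scores.std(),
    }
    r = results[name]
    print(
        f"{name:24s} train={r['train']:.3f} test={r['test']:.3f} "
        f"cv={r['cv_mean']:.3f} +/- {r['cv_std']:.3f}"
    )
```

A training score well above the test and cross-validated scores suggests high variance. If the depth-limited tree closes that gap without losing test accuracy, that supports the overfitting hypothesis for the unconstrained tree.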
For this synthetic dataset, logistic regression often performs comparably to the tree and k-NN because the decision boundary is close to linear. If you increase n_informative (raising n_features to match) or add polynomial structure, the gap may widen. Changing the data and re-running is a fast way to build intuition about when each model earns its keep.
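As a sketch of that experiment, the helper below (a hypothetical `compare_models` function, not part of the lab) scores all three models on any dataset. It is run once on the near-linear problem and once on `make_moons`, which is swapped in here as one convenient source of a curved boundary; the noise level, split ratio, and seeds are assumptions.

```python
from sklearn.datasets import make_classification, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=42):
    """Return test accuracy for each of the three classifiers on (X, y)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Fresh estimators on every call so runs do not share fitted state.
    models = {
        "logistic regression": LogisticRegression(),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Near-linear problem, similar in spirit to the lab dataset (assumed settings).
X_lin, y_lin = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
# Curved class boundary: two interleaving half-moons with assumed noise.
X_moon, y_moon = make_moons(n_samples=300, noise=0.25, random_state=42)

linear_scores = compare_models(X_lin, y_lin)
moon_scores = compare_models(X_moon, y_moon)

for label, scores in [("near-linear", linear_scores), ("moons", moon_scores)]:
    print(label)
    for name, acc in scores.items():
        print(f"  {name:20s} {acc:.3f}")
```

Varying `noise`, the sample count, or the seed and re-running shows how sensitive the ranking is to the data rather than to the models alone.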
Where to go next
The ML Concepts module is complete. Next: Sklearn in Practice — the uniform fit/predict/transform API that makes all of these models composable into pipelines.