fit(), predict(), transform()

The three-method contract that makes every sklearn estimator composable — and why that uniformity matters at scale.

Most ML libraries expose dozens of models, each with its own interface. Sklearn made a different bet: every object, regardless of what it does, speaks the same three-method language. That constraint is not a limitation — it is what makes composability possible.

The contract

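Every sklearn estimator has a fit() method. What fit() does depends on the object type, but the call shape is consistent: fit(X, y) for supervised models, and fit(X, y=None) for transformers and other unsupervised estimators, which do not need a target. Accepting y even when it is ignored is what lets every object be called the same way.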

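Predictors (classifiers and regressors) learn to map inputs to outputs. In sklearn's own vocabulary, "estimator" is the general term for anything with fit(); predictors are the estimators that also predict. After fit(X_train, y_train), they expose predict(X), which applies the learned mapping to new data. What the model learned lives in attributes ending with _ (a sklearn convention), such as coef_, feature_importances_, and n_iter_.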
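As a minimal sketch of the fit/predict half of the contract, the example below fits a LinearRegression on a toy dataset. The data is invented for illustration (it follows y = 2x + 1 exactly), and the variable names are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, invented for illustration: y = 2*x + 1 with no noise.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)      # learn the mapping from X to y

X_new = np.array([[4.0], [5.0]])
y_pred = model.predict(X_new)    # apply the learned mapping to unseen rows

# Fitted attributes carry a trailing underscore and exist only after fit().
print("slope:", model.coef_, "intercept:", model.intercept_)
print("predictions:", y_pred)
```

Swapping LinearRegression for another regressor should leave the fit() and predict() calls unchanged; only the fitted attributes available afterwards differ.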

Transformers (scalers, encoders, imputers) learn statistics about the data and apply a transformation. After fit(X_train), they expose transform(X), which applies the learned statistics. A StandardScaler learns the mean and standard deviation of each feature during fit, then uses those to standardise during transform. The key rule: always fit on training data, then transform both training and test data using those same statistics. Refitting on test data leaks information about the test set into preprocessing, so the evaluation no longer reflects performance on unseen data.
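The fit-on-train, transform-both rule can be sketched with a StandardScaler. The arrays here are made up for illustration; the point is that the test row is scaled with the training mean and standard deviation, not its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data, invented for illustration.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # learns mean_ and scale_ from the training data only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics

print("learned mean:", scaler.mean_)
print("learned scale:", scaler.scale_)
print("scaled test row:", X_test_scaled)
```

The scaled test value falls well outside the range of the scaled training values, which is the honest result: relative to what the model saw in training, that row is unusual.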

fit_transform(X) is a convenience method equivalent to calling fit(X) and then transform(X) on the same data; some transformers implement it more efficiently than the two separate calls. It is correct to use on training data; it must not be used on test data (it would recompute the statistics from the test set, breaking the train/test separation).
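To make the difference concrete, the sketch below compares the correct call on test data with the leaky one. The data is invented, and X_test_leaky is a deliberately wrong variable kept only to show what goes wrong.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration.
X_train = np.array([[0.0], [10.0], [20.0]])
X_test = np.array([[30.0], [40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fine: training data

# fit_transform gives the same result as fit followed by transform.
two_step = StandardScaler().fit(X_train).transform(X_train)
same = np.allclose(X_train_scaled, two_step)

# Correct: reuse the statistics learned from the training set.
X_test_correct = scaler.transform(X_test)

# Anti-pattern: a fresh fit on the test set recomputes mean and scale,
# so the test rows are no longer on the same scale as the training rows.
X_test_leaky = StandardScaler().fit_transform(X_test)

print("correct:", X_test_correct.ravel())
print("leaky:  ", X_test_leaky.ravel())
```

The leaky version centres the test rows around zero, hiding the fact that they sit far above anything in the training set.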

Why uniformity enables pipelines

Because every step speaks the same language, sklearn can chain them. During fit(), a Pipeline fits each intermediate step and transforms the data with it (in effect, fit_transform), passes the output to the next step, and finally calls fit() on the last estimator. During predict(), it calls transform() on each intermediate step and predict() on the final estimator. You do not have to manage that orchestration yourself; the contract handles it.
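A small sketch of that chaining, using a scaler followed by a classifier. The step names "scale" and "clf" and the toy data are arbitrary choices for illustration. The last lines redo the prediction by hand to show what the pipeline does internally.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy two-feature data with two well-separated classes (invented).
X_train = np.array([
    [1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
    [8.0, 900.0], [9.0, 950.0], [10.0, 880.0],
])
y_train = np.array([0, 0, 0, 1, 1, 1])

pipe = Pipeline([
    ("scale", StandardScaler()),     # intermediate step: must be a transformer
    ("clf", LogisticRegression()),   # final step: the predictor
])
pipe.fit(X_train, y_train)           # fits the scaler, then the classifier

X_new = np.array([[1.5, 190.0], [9.5, 920.0]])
y_pred = pipe.predict(X_new)         # scales X_new, then predicts

# The same prediction done by hand with the fitted steps.
scaler = pipe.named_steps["scale"]
clf = pipe.named_steps["clf"]
manual_pred = clf.predict(scaler.transform(X_new))

print(y_pred, manual_pred)
```

Because the pipeline itself exposes fit() and predict(), it can be passed anywhere a single estimator is expected.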

This becomes critical at deployment. A pipeline bundles preprocessing and model together. Serialise the pipeline once with joblib.dump, load it in any environment with a compatible scikit-learn version, and call predict(): the same scaling and encoding that ran during training runs automatically at inference.
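The round trip might look like the sketch below. The file name pipeline.joblib and the temporary directory are placeholders for illustration; a real deployment would write to a path of its own choosing.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration: y = 2*x + 1 with no noise.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])
pipe.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.joblib")  # hypothetical file name
    joblib.dump(pipe, path)         # one file holds scaler and model together
    restored = joblib.load(path)    # e.g. inside the serving process

X_new = np.array([[4.0]])           # raw, unscaled input
original_pred = pipe.predict(X_new)
restored_pred = restored.predict(X_new)

print(original_pred, restored_pred)
```

The caller passes raw input to the restored object; the scaling learned at training time is applied inside predict().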

The _ suffix on fitted attributes (like scaler.mean_, tree.feature_importances_) is a sklearn convention signalling "this was computed during fit". A public attribute without the suffix is normally a constructor argument (a hyperparameter). This distinction matters when debugging: constructor arguments are your choices; underscore attributes are what the data produced.
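One way to see the distinction is to inspect an estimator before and after fitting. This is a minimal sketch with a StandardScaler; the helper variable names are arbitrary.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler(with_mean=True)

# Constructor arguments are available immediately, before any data is seen.
params = scaler.get_params()
has_mean_before = hasattr(scaler, "mean_")   # fitted attribute not there yet

# Using an unfitted transformer raises NotFittedError.
try:
    scaler.transform(np.array([[1.0]]))
    raised = False
except NotFittedError:
    raised = True

scaler.fit(np.array([[1.0], [3.0]]))
has_mean_after = hasattr(scaler, "mean_")    # created by fit()

print("hyperparameters:", params)
print("mean_ before fit:", has_mean_before, "| after fit:", has_mean_after)
print("learned mean:", scaler.mean_)
```

When a result looks wrong, get_params() shows what was chosen and the underscore attributes show what was learned, which narrows down where to look.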

Where to go next

With the API contract understood, the next lesson applies it concretely: linear regression — fitting a model, inspecting its coefficients, and measuring how well it explains variance in the target.
