Apply and transform
Distinguish apply, transform, agg, and map — four pandas operations that look similar but serve different purposes.
- Explain what each of apply, transform, agg, and map does to a Series or DataFrame
- Choose the right operation given a desired input and output shape
- Recognise when apply is the appropriate escape hatch for custom logic
Pandas gives you four ways to apply a function to your data: apply,
transform, agg, and map. They look interchangeable at first glance — each
takes a function and returns something. The difference is in what shape comes
back, and that shape determines which operation you need.
The four operations
map — element-wise on a Series
Series.map(fn) applies fn to each individual value and returns a Series of
the same length. It is for replacing or converting individual values: turning a
string column into a numeric code, or looking up a value in a dictionary. Instead
of a function you can pass a dictionary (or a Series) as a lookup table; values
with no matching key become NaN.
df["size_code"] = df["size"].map({"small": 1, "medium": 2, "large": 3})
The output has the same shape as the input: map on a single column never
collapses rows or changes length.
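A minimal sketch of both forms of map follows. The data is a small hypothetical DataFrame with a "size" column. One label deliberately has no dictionary entry, so the sketch also shows that unmatched values come back as NaN and the length is unchanged.

```python
import pandas as pd

# Hypothetical sample data: a "size" column of text labels
df = pd.DataFrame({"size": ["small", "large", "medium", "small", "huge"]})

# Dictionary lookup: each value is replaced by its mapped code.
# "huge" has no entry in the dictionary, so it becomes NaN.
df["size_code"] = df["size"].map({"small": 1, "medium": 2, "large": 3})

# A function works too: len is called once per value.
df["size_len"] = df["size"].map(len)

print(df)
# Same number of rows in and out
print(len(df["size_code"]) == len(df["size"]))
```

Because of the NaN, the "size_code" column is stored as floats rather than integers.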
agg — reduces a group to a summary
GroupBy.agg(fn) (or Series.agg) reduces many values to fewer. Calling
.groupby("category").agg("mean") turns every group into a single row. The
result is shorter than the input. Use agg when you want summary statistics per
group.
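The sketch below assumes a hypothetical DataFrame with "category" and "value" columns. It shows agg shrinking six rows to one row per group, with either a single summary function or a list of functions.

```python
import pandas as pd

# Hypothetical sample data: two categories, six rows
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "a"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0, 5.0],
})

# One summary function: one value per group
means = df.groupby("category")["value"].agg("mean")

# Several functions: one row per group, one column per function
summary = df.groupby("category")["value"].agg(["mean", "min", "max", "count"])

print(means)    # a -> 3.0, b -> 20.0
print(summary)
```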
transform — group-aware, same shape back
GroupBy.transform(fn) applies a function per group but returns a result that is
aligned back to the original index. The output has the same length as the
original DataFrame. This is the operation for adding a derived column that needs
group context — for example, the group mean or the within-group rank — while
keeping every original row.
df["group_mean"] = df.groupby("category")["value"].transform("mean")
Each row gets the mean of its own group, not the overall mean.
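To see the shape difference directly, the sketch below runs agg and transform on the same hypothetical data. It also adds the within-group rank mentioned above and a deviation-from-group-mean column, which is the kind of derived column transform makes easy.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "a"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0, 5.0],
})

# agg: one value per group (2 here)
per_group = df.groupby("category")["value"].agg("mean")

# transform: one value per original row (6 here), aligned on df's index
df["group_mean"] = df.groupby("category")["value"].transform("mean")

# Within-group rank, also aligned back to every row
df["rank_in_group"] = df.groupby("category")["value"].transform("rank")

# Each value's distance from its own group's mean
df["deviation"] = df["value"] - df["group_mean"]

print(per_group)
print(df)
```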
apply — the flexible escape hatch
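DataFrame.apply(fn, axis=0|1) applies fn to each column (axis=0) or each
row (axis=1). What comes back depends on what fn returns. If fn returns a scalar
for each column or row, the result is reduced to a Series. If fn returns a
Series, the result is a new DataFrame. Because apply calls fn once per column or
row in Python, it is usually the slowest of the four, especially with axis=1 on
a large DataFrame. Use apply when none of the more specific operations can
express what you need, for example complex logic that combines several columns.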
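A minimal sketch of both axes follows. The order data and the discount rule are hypothetical, invented only to show multi-column logic that is awkward to express without apply.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "price": [10.0, 25.0, 8.0],
    "quantity": [3, 1, 10],
    "member": [True, False, True],
})

# axis=0 (the default): fn receives each column; returning a scalar reduces
col_ranges = df[["price", "quantity"]].apply(lambda col: col.max() - col.min())

# axis=1: fn receives each row as a Series and can combine several columns
def order_total(row):
    # Hypothetical rule: members get 5 off orders over 20
    total = row["price"] * row["quantity"]
    if row["member"] and total > 20:
        total -= 5
    return total

df["total"] = df.apply(order_total, axis=1)

print(col_ranges)
print(df)
```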
A decision guide
| Question | Operation |
|---|---|
| Replace each value individually? | map |
| Summarise each group to one number? | agg |
| Add a group-context column, keep all rows? | transform |
| Complex row- or column-wise custom logic? | apply |
When you find yourself writing apply frequently, it is worth pausing to check
whether a vectorised pandas operation exists. Vectorised operations (arithmetic,
str methods, clip, where) are often 10–100x faster than apply. They run in
compiled code over whole columns instead of making one Python function call per
row.
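The sketch below uses hypothetical order data to show row-wise apply calls rewritten as vectorised expressions. It covers plain arithmetic, conditional logic with Series.where instead of an if inside a function, and capping with clip. It checks that the results match and makes no timing claims.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "price": [10.0, 25.0, 8.0],
    "quantity": [3, 1, 10],
    "member": [True, False, True],
})

# Row-wise apply: one Python lambda call per row
via_apply = df.apply(lambda r: r["price"] * r["quantity"], axis=1)

# Vectorised: whole-column arithmetic, no per-row Python call
via_vector = df["price"] * df["quantity"]

# Conditional logic without apply: where keeps values where the condition
# is True and takes them from the second argument elsewhere.
# Hypothetical rule: members with totals over 20 get 5 off.
discounted = (df["member"] & (via_vector > 20))
total = via_vector.where(~discounted, via_vector - 5)

# Capping without apply: clip limits values to at most 50
capped = via_vector.clip(upper=50)

print(list(via_apply) == list(via_vector))
print(total)
print(capped)
```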
The shape rule
The cleanest way to remember the difference: think about the shape of what comes back relative to what went in.
- map → same length, element by element
- transform → same length, group-aware
- agg → shorter, one row per group
- apply → any shape, depending on what your function returns
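One way to check the shape rule is to run each operation on the same hypothetical six-row DataFrame and compare the shapes. apply appears twice because its output shape follows whatever the function returns.

```python
import pandas as pd

# Hypothetical sample data: 6 rows, 2 groups
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "a"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0, 5.0],
})
g = df.groupby("category")["value"]

shapes = {
    "map": df["value"].map(lambda v: v * 2).shape,          # same length
    "transform": g.transform("mean").shape,                  # same length
    "agg": g.agg("mean").shape,                              # one per group
    # apply: fn returns a scalar per column, so one value per column
    "apply (scalar)": df[["value"]].apply(lambda c: c.sum()).shape,
    # apply: fn returns a Series per column, so same shape as the input
    "apply (Series)": df[["value"]].apply(lambda c: c * 2).shape,
}
for name, shape in shapes.items():
    print(f"{name:15} {shape}")
```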
If you need the original rows to be intact with a new column attached, transform
is almost always the right choice. If you need a summary table, agg is. If you
need to convert individual values, map is. Everything else is apply.
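Transform is usually the right choice for a group-context column because it does in one step what would otherwise take two: summarise with agg, then map each row's group key back to its summary. The sketch below, on hypothetical data, shows that both routes give the same column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "a"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0, 5.0],
})

# Two-step route: one mean per group, then look each row's category up
group_means = df.groupby("category")["value"].agg("mean")
via_agg_map = df["category"].map(group_means)

# One-step route: transform broadcasts the group mean back to every row
via_transform = df.groupby("category")["value"].transform("mean")

print((via_agg_map == via_transform).all())
```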
Where to go next
Next: custom aggregations, where all four operations appear in runnable code,
including transform for group normalisation and multi-function agg.