Custom aggregations
Apply lambdas, group-normalise with transform, and chain multiple aggregation functions — all in runnable code.
- Apply a lambda to a column with .apply()
- Use .transform() inside a groupby to add a group-normalised column
- Chain .groupby().agg() with multiple aggregation functions per column
The previous lesson drew the conceptual map between apply, transform, agg,
and map. This lesson runs all of them. Each block below is self-contained —
run them in order to see how the output shape changes with each operation.
apply with a lambda
apply on a Series passes each value to the function. The lambda below bins a
numeric score into a letter grade:
The output column has the same length as the input — one grade per student.
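A minimal sketch of the grade-binning lambda described above. The student names, scores, and grade thresholds are hypothetical, chosen only to illustrate the shape of the result:

```python
import pandas as pd

# Hypothetical sample data; names and scores are illustrative only.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Chloe", "Dev"],
    "score": [92, 78, 85, 61],
})

# Assumed thresholds: 90+ is A, 80-89 is B, 70-79 is C, anything lower is F.
# apply calls the lambda once per value in the score column.
df["grade"] = df["score"].apply(
    lambda s: "A" if s >= 90 else "B" if s >= 80 else "C" if s >= 70 else "F"
)

print(df)
```

The new grade column has four entries, one for each row of the input.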
transform for group-normalisation
transform returns a Series aligned to the original index, making it ideal for
adding a column that needs group context. Here we compute each student's share of
their department's total score:
Notice dept_total repeats the group sum on every row within the group. That
alignment is what transform guarantees. agg alone would return one row per
department, so you would have to merge the result back onto the original rows.
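One way the share-of-department calculation might look. The data and the share column name are assumptions made for illustration; dept_total follows the name used in the text:

```python
import pandas as pd

# Hypothetical sample data; values are illustrative only.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Chloe", "Dev", "Eli"],
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [90, 60, 50, 30, 20],
})

# transform("sum") broadcasts each group's sum back to every row of that group,
# so the result has the same index as df.
df["dept_total"] = df.groupby("dept")["score"].transform("sum")

# Each student's share of their department's total score.
df["share"] = df["score"] / df["dept_total"]

# For contrast: agg collapses the data to one row per department.
dept_sums = df.groupby("dept")["score"].agg("sum")

print(df)
print(dept_sums)
```

Comparing the two printed results shows the difference in shape: five rows from transform, two from agg.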
groupby with multiple aggregations
agg accepts a dict mapping column names to lists of functions. This lets you
produce several summary statistics in a single call:
The named-aggregation syntax (col=("source_col", "func")) was introduced in
pandas 0.25. It produces clean, flat column names in the result, avoiding the
multi-level column index that the older dict-of-lists syntax creates.
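The two agg styles discussed above can be compared side by side. This is a sketch on hypothetical data; the hours column and the output names mean_score, max_score, and total_hours are assumptions chosen for the example:

```python
import pandas as pd

# Hypothetical sample data; values are illustrative only.
df = pd.DataFrame({
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [90, 60, 50, 30, 20],
    "hours": [10, 6, 8, 5, 2],
})

# Dict-of-lists: several functions per column.
# The result has a two-level column index such as ("score", "mean").
nested = df.groupby("dept").agg({"score": ["mean", "max"], "hours": ["sum"]})

# Named aggregation (pandas 0.25 or later): the caller picks flat column names.
flat = df.groupby("dept").agg(
    mean_score=("score", "mean"),
    max_score=("score", "max"),
    total_hours=("hours", "sum"),
)

print(nested.columns.tolist())
print(flat.columns.tolist())
```

Both calls compute the same statistics; only the column labelling differs.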
When you need a custom summary statistic that isn't built in, pass a lambda to
agg directly:
    df.groupby("dept")["score"].agg(lambda x: x.max() - x.min())
This computes the range (max minus min) per group in a single call. With
built-in functions alone you would aggregate max and min separately and then
subtract one from the other.
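To run the range calculation on its own, a self-contained version might look like this. The data is hypothetical, and the variable names score_range and score_range_builtin are introduced only for the example:

```python
import pandas as pd

# Hypothetical sample data; values are illustrative only.
df = pd.DataFrame({
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [90, 60, 55, 30, 20],
})

# The lambda receives each group's scores as a Series and returns one scalar.
score_range = df.groupby("dept")["score"].agg(lambda x: x.max() - x.min())

# Same result from built-ins: aggregate max and min, then subtract.
stats = df.groupby("dept")["score"].agg(["max", "min"])
score_range_builtin = stats["max"] - stats["min"]

print(score_range)
```

The lambda keeps the logic in one expression; the built-in route takes an extra step but avoids calling a Python function once per group.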
Where to go next
Next: lab — merge and reshape — a guided end-to-end exercise that merges two datasets, reshapes, computes a rolling summary, and adds a derived column using everything from this module.