Code of the Day
Intermediate · Reshaping and Merging

Custom aggregations

Apply lambdas, group-normalise with transform, and chain multiple aggregation functions — all in runnable code.

Data Science · Intermediate · 10 min read

By the end of this lesson you will be able to:
  • Apply a lambda to a column with .apply()
  • Use .transform() inside a groupby to add a group-normalised column
  • Chain .groupby().agg() with multiple aggregation functions per column

The previous lesson drew the conceptual map between apply, transform, agg, and map. This lesson runs all of them. Each block below is self-contained — run them in order to see how the output shape changes with each operation.

apply with a lambda

apply on a Series passes each value to the function. The lambda below bins a numeric score into a letter grade:
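
A minimal sketch of that binning, assuming a made-up student table and illustrative grade cut-offs (90, 80, 70); the column names student, score, and grade are assumptions for this example:

```python
import pandas as pd

# Hypothetical sample data; names and scores are invented for illustration.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Cho", "Dev"],
    "score": [92, 78, 85, 61],
})

# Assumed cut-offs: 90+ is A, 80+ is B, 70+ is C, anything lower is F.
# Series.apply calls the lambda once per value and returns a new Series.
df["grade"] = df["score"].apply(
    lambda s: "A" if s >= 90 else "B" if s >= 80 else "C" if s >= 70 else "F"
)

print(df)
```

For a long chain of thresholds, a named function (or pd.cut) is usually easier to read than nested conditional expressions; the lambda is kept here to match the technique being shown.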

The output column has the same length as the input — one grade per student.

transform for group-normalisation

transform returns a Series aligned to the original index, making it ideal for adding a column that needs group context. Here we compute each student's share of their department's total score:
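
A minimal sketch of the share calculation, assuming a small invented table with dept and score columns; dept_total and share are names chosen for this example:

```python
import pandas as pd

# Hypothetical sample data; departments and scores are invented.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Cho", "Dev", "Eli"],
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [80, 120, 60, 90, 150],
})

# transform("sum") computes one sum per dept, then broadcasts it back
# so the result has one value per original row, on the original index.
df["dept_total"] = df.groupby("dept")["score"].transform("sum")

# Each student's share of their department's total.
df["share"] = df["score"] / df["dept_total"]

print(df)
```

Because the result shares the index of df, it can be assigned as a new column directly, with no merge step.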

Notice dept_total repeats the group sum on every row within the group. That alignment is what transform guarantees. agg would return only one row per group, so you would have to merge the totals back onto the original rows.

groupby with multiple aggregations

agg accepts a dict mapping column names to lists of functions. This lets you produce several summary statistics in a single call:
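
A minimal sketch of the dict-of-lists form, assuming an invented table in which hours is a hypothetical second numeric column added only for illustration:

```python
import pandas as pd

# Hypothetical sample data; "hours" is an invented column.
df = pd.DataFrame({
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [80, 120, 60, 90, 150],
    "hours": [10, 14, 8, 12, 16],
})

# Each key is a source column; each list holds the functions to apply to it.
summary = df.groupby("dept").agg({"score": ["mean", "max"], "hours": ["sum"]})

print(summary)
# The result has two-level columns: (source column, function name).
print(summary.columns.tolist())
```

The result has one row per department, and its columns form a two-level index of (column, function) pairs.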

Named aggregation, written as keyword arguments of the form new_col=("source_col", "func"), was introduced in pandas 0.25. It produces flat, explicitly named columns in the result, avoiding the multi-level column index that the dict-of-lists syntax creates.
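
A minimal sketch of named aggregation on an invented table; the output names mean_score, top_score, and n_students are chosen for this example:

```python
import pandas as pd

# Hypothetical sample data; departments and scores are invented.
df = pd.DataFrame({
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [80, 120, 60, 90, 150],
})

# Each keyword becomes an output column; the tuple is (source column, function).
summary = df.groupby("dept").agg(
    mean_score=("score", "mean"),
    top_score=("score", "max"),
    n_students=("score", "count"),
)

print(summary)
# Column labels are plain strings, not (column, function) tuples.
print(summary.columns.tolist())
```

Flat column names make the result easier to select from, rename, or merge in later steps.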

When you need a custom summary statistic that isn't built in, pass a lambda to agg directly:

df.groupby("dept")["score"].agg(lambda x: x.max() - x.min())

This computes the range (max minus min) per group in a single call, without first aggregating max and min separately and then subtracting one result from the other.
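
As a runnable sketch of the same idea, assuming an invented table, the lambda can be passed to agg on a single column or combined with a built-in function through named aggregation; score_range, by_dept, and mean_score are names chosen for this example:

```python
import pandas as pd

# Hypothetical sample data; departments and scores are invented.
df = pd.DataFrame({
    "dept": ["math", "math", "physics", "physics", "physics"],
    "score": [80, 120, 60, 90, 150],
})

# The lambda receives each group's scores as a Series and returns one number.
score_range = df.groupby("dept")["score"].agg(lambda x: x.max() - x.min())
print(score_range)

# The same lambda inside named aggregation, next to a built-in function.
by_dept = df.groupby("dept").agg(
    score_range=("score", lambda x: x.max() - x.min()),
    mean_score=("score", "mean"),
)
print(by_dept)
```

Naming the output column avoids an auto-generated label for the lambda's result.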

Where to go next

Next: lab — merge and reshape — a guided end-to-end exercise that merges two datasets, reshapes, computes a rolling summary, and adds a derived column using everything from this module.
