Code of the Day
IntermediateReshaping and Merging

Reshaping data

Melt converts wide data to long, pivot converts long to wide — understanding which shape your analysis needs saves a lot of friction.

Data ScienceIntermediate6 min read
Recommended first
By the end of this lesson you will be able to:
  • Explain the difference between wide and long data formats with a concrete example
  • Describe when melt (wide to long) and pivot (long to wide) are the right choice
  • Identify which shape a given analysis requires before writing any code

The same data can be stored in radically different layouts. Choosing the wrong layout before analysis does not make results wrong — it makes them hard. The two layouts you will encounter constantly are wide and long (sometimes called tidy).

A concrete example

Imagine temperature readings for three cities across four months. In wide format, each city is its own column:

monthLondonParisBerlin
Jan462
Feb583
Mar9138
Apr121712

In long format, there is one row per observation:

monthcitytemp
JanLondon4
JanParis6
JanBerlin2
FebLondon5

The wide table has 4 rows and 4 columns (12 data values). The long table has 12 rows and 3 columns — the same 12 values. Neither representation is more "correct"; each is better suited to different tasks.

When long form is better

Long form is the natural input for most pandas operations. groupby("city") is trivial in long form — just group on the city column. In wide form, you would have to loop over column names, which is fragile and verbose.

Long form is also what most plotting libraries expect. To draw a line per city, you need city as a value in a column — not spread across column headers.

Any tidy-data operation that works on a single column of values (aggregations, filters, plots) is easier when your variable of interest is a column, not a row.

When wide form is better

Wide form has its moments too. Computing a correlation matrix across cities requires them to be columns — df[["London","Paris","Berlin"]].corr() works directly in wide form, while long form requires a pivot first.

Wide form also reads more naturally as a display table: a human scanning the temperature grid above reads it faster in wide form. If the final output is a table for a report, wide form is usually cleaner.

The general rule: use long form for computation, wide form for display. Start in long, aggregate and analyse, then pivot to wide only at the point where you need a human-readable summary.

melt and pivot as inverses

Melt is the operation that goes wide to long: it takes a set of columns and "melts" them into two columns — one holding the old column names, one holding the values. This is the direction you use most when incoming data is wide (a common format for spreadsheet exports).

Pivot goes the other direction, long to wide: it takes values from one column and spreads them out into new columns, one per unique value. pivot_table is the more flexible variant that also applies an aggregation, so you don't need to pre-deduplicate.

In the next lesson you will run both operations on the temperature dataset above and see exactly what each parameter controls.

Where to go next

Next: window operations — running .melt(), .pivot_table(), .rolling(), and .shift() in code, and building lag features for time-series analysis.

Finished reading? Mark it complete to track your progress.

On this page