Lab: merge and reshape
Guided end-to-end exercise — merge two datasets, reshape to long form, compute a rolling average, and flag above-average periods.
- Merge two DataFrames on a shared key and verify the result shape
- Melt the merged table to long form
- Compute a 3-period rolling average per product
- Use apply to add a boolean flag column based on a computed threshold
This is an optional lab. No new concepts — just practice applying everything from the Reshaping and Merging module to a dataset you have not seen before. Work through each step, run the code, and check that the output shapes match your expectations before moving on.
You have two tables: a products catalogue and a sales log. Your goal is to merge them, reshape the result for analysis, compute a rolling trend, and annotate each row with whether it was an above-average period.
Step 1 — inspect the data
Run this block first to see what you are working with.
Before merging, think: how many rows do you expect in the merged table? The sales
table has 12 rows and every product_id in sales exists in products, so a merge
on product_id should produce exactly 12 rows.
Step 2 — merge on product_id
Use pd.merge() with how="inner" and on="product_id". Verify the shape.
You now have product name and category alongside each sales figure. If your
shape is larger than (12, 5), check for duplicate product_id values in the
products table.
Step 3 — melt to long form for grouped analysis
The merged table is already reasonably long, but imagine you wanted to compare
units_sold across different metrics side by side. Practice the melt pattern by
treating period and product_name as identifiers and confirming the structure:
Step 4 — rolling 3-period average per product
Sort by product and period, then compute a rolling mean within each product
group. Because the data has only three periods, min_periods=1 lets the window
start as soon as there is one value:
The rolling average smooths out period-to-period noise. For products with only three data points the final period's rolling average equals the simple mean of all three periods — which is expected.
Step 5 — flag above-average periods
Use apply to add a boolean column. A period is "above average" if its
units_sold exceeds that product's mean across all periods:
The above_avg flag could also be written as a vectorised comparison:
merged["above_avg"] = merged["units_sold"] > merged["product_mean"].
That is faster and more idiomatic. The apply version here is intentional
practice — in real work, prefer the vectorised form when you can express it.
Done?
You have merged two datasets, reshaped between wide and long form, computed a
group-aware rolling average using transform, and added a derived column with
apply. These four operations together cover the core of the Reshaping and
Merging module. The next module — Visualisation — picks up here: with data
shaped correctly, you are ready to plot it.
Custom aggregations
Apply lambdas, group-normalise with transform, and chain multiple aggregation functions — all in runnable code.
Choosing a chart
Match your chart type to the question you are asking — bar for comparison, line for trend, scatter for relationship, histogram for distribution, box for spread.