Makefiles for pipelines

Make is one of the oldest widely used DAG executors. Its target/dependency syntax maps directly onto data pipeline DAGs, with incremental rebuilds and parallel execution built in.

By the end of this lesson you will be able to:
  • Write a Makefile with targets and file-based dependencies for a data pipeline
  • Use .PHONY targets for tasks that do not produce files
  • Run make with the -j flag for parallel execution of independent targets

Make was created in 1976 to automate C compilation, but its model (targets with declared dependencies, rebuilt only when a prerequisite is newer than the target) also fits file-based data pipelines well. Unlike a cron job or a sequential shell script, Make knows the dependency graph and runs only the steps that are out of date.

A Makefile for a data pipeline

# Makefile for a three-stage pipeline
# Targets are output files; prerequisites are input files.

DATA_DIR  := data
REPORT    := reports/summary.json

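# Stage 1: fetch raw data from API (produces data/raw.json)
# $(@D) is the directory part of the target; create it so a run after `make clean` works
$(DATA_DIR)/raw.json:
	mkdir -p $(@D)
	python scripts/fetch.py --output $@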

# Stage 2: clean and validate (produces data/clean.csv)
$(DATA_DIR)/clean.csv: $(DATA_DIR)/raw.json
	python scripts/clean.py --input $< --output $@

# Stage 3: aggregate and report (produces reports/summary.json)
# $(@D) expands to the target's directory, so the rule follows REPORT if it changes
$(REPORT): $(DATA_DIR)/clean.csv
	mkdir -p $(@D)
	python scripts/report.py --input $< --output $@

# Phony targets do not produce files — they trigger other targets
.PHONY: all clean help

all: $(REPORT)

clean:
	rm -rf $(DATA_DIR) reports

help:
	@echo "Targets: all  clean  help"

The automatic variables $@ (the target) and $< (the first prerequisite) keep the rules DRY — change the filename variables at the top and every rule adapts.

How Make resolves the graph

When you run make all, Make:

  1. Reads the Makefile and builds the full dependency graph.
  2. Checks each target's modification time against its prerequisites' modification times.
  3. If a prerequisite is newer than the target (or the target does not exist), it rebuilds that target and all targets that depend on it.
  4. Targets that do not depend on one another, directly or transitively, are candidates for parallel execution. Sharing a prerequisite does not prevent this.

This means: because data/raw.json has no prerequisites, Make treats it as up to date as soon as it exists, so make all skips the fetch step and runs only the stages whose inputs are newer than their outputs. To force a fresh fetch, delete the file or run make -B. This is incremental building: a pipeline re-run after a small upstream change re-computes only what changed.
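The freshness check in steps 2 and 3 can be modelled in a few lines of Python. This is a simplified sketch, not Make's actual implementation: the function name needs_rebuild is made up for illustration, and it ignores phony targets, does not recurse into prerequisites, and assumes every prerequisite already exists.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Simplified model of Make's timestamp rule (hypothetical helper):
    no recursion, no phony targets, prerequisites must already exist.
    """
    if not os.path.exists(target):
        return True  # a missing target is always out of date
    target_mtime = os.path.getmtime(target)
    # Out of date as soon as one prerequisite was modified after the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, needs_rebuild("data/clean.csv", ["data/raw.json"]) returns True whenever data/clean.csv is absent, which is why a first run executes every stage.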

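The phony targets all, clean, and help do not produce files with those names. Declare them in .PHONY so Make does not mistake a file with that name for an up-to-date target: without the declaration, a stray file called clean in the directory would make make clean report that it is up to date and do nothing. Any target that runs commands rather than producing a file should be phony.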

Parallel execution with -j

make -j4 all

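The -j4 flag allows Make to run up to four recipes simultaneously. The example pipeline above is strictly linear, so -j gives it no speed-up. In a wider graph, for instance one where the report depends on both data/clean.csv and a hypothetical data/enriched.csv, each built independently from data/raw.json, Make starts the two independent targets in parallel without any extra configuration.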
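To see which targets could overlap under -j, the dependency graph can be grouped into levels with Python's standard graphlib module (Python 3.9+). This is only an illustrative sketch: the helper name parallel_batches and the data/enriched.csv target are hypothetical, and real Make schedules recipes greedily as job slots free up rather than in strict batches.

```python
from graphlib import TopologicalSorter


def parallel_batches(graph):
    """Group targets into batches whose members do not depend on each other.

    `graph` maps each target to the set of targets it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


# Hypothetical diamond-shaped pipeline; file names are illustrative only.
graph = {
    "data/clean.csv": {"data/raw.json"},
    "data/enriched.csv": {"data/raw.json"},
    "reports/summary.json": {"data/clean.csv", "data/enriched.csv"},
}
```

Calling parallel_batches(graph) puts data/clean.csv and data/enriched.csv in the same batch, between the fetch and the report; those are the two recipes that make -j2 could run at the same time.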

Visualising the graph

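# Requires Graphviz (provides dot) and makefile2graph (provides make2graph).
# makefile2graph is a C tool: install it with your system package manager or build it from source.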
make -Bnd | make2graph | dot -Tpng -o pipeline.png

This renders the dependency graph as an image — useful for documenting complex pipelines or spotting unintended serial dependencies.

Limitations

Make is file-centric: it tracks freshness by comparing timestamps, which breaks down when outputs are not files (database writes, API calls). It also has no concept of retries, logging, or a UI for monitoring run history.
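A common workaround for non-file outputs is a stamp file: the step performs its side effect and then touches an empty marker file, giving Make a timestamp to compare. The sketch below assumes a SQLite database and a table named summary purely for illustration; load_rows is a hypothetical helper, not part of the lesson's scripts.

```python
import sqlite3
from pathlib import Path


def load_rows(db_path, rows, stamp_path):
    """Insert rows into SQLite, then touch a stamp file Make can track."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS summary (name TEXT, value REAL)"
            )
            conn.executemany("INSERT INTO summary VALUES (?, ?)", rows)
    finally:
        conn.close()
    # Reached only if the write succeeded: refresh the stamp's mtime.
    stamp = Path(stamp_path)
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
```

A rule whose target is that stamp file (say data/load.stamp, a hypothetical name) then lets Make apply its usual timestamp comparison. The stamp records only that the load ran, not what the database currently contains.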

For pipelines that outgrow Makefiles, Prefect (the next lesson) picks up where Make leaves off.

Makefile recipes must be indented with a hard tab, not spaces. Space-indented recipes are the most common cause of "missing separator" errors. Configure your editor to preserve tabs in Makefile and .mk files.

Where to go next

Next: Prefect concepts. Prefect adds what Make lacks here (retries, a web UI, parametrised runs, and scheduling) while keeping the pipeline defined in ordinary Python.
