Checkpoints and atomic writes
Implement the two core idempotency patterns in Python — checkpoint marker files and atomic write-then-rename — so your pipelines survive crashes and restarts cleanly.
- Implement a checkpoint file pattern using step_done() and mark_done() helpers
- Write output atomically using tempfile.NamedTemporaryFile and os.replace()
- Detect and skip already-completed steps when a pipeline restarts
The previous lesson defined idempotency and the two patterns that enforce it. Here you will run both patterns against a simulated multi-step pipeline and see how the checkpoint system skips completed steps when the pipeline is re-invoked.
The patterns in code
The step_done / mark_done helpers are intentionally simple: a .done marker file per step is sufficient for most pipelines. The atomic write function wraps tempfile.NamedTemporaryFile and os.replace() into a single reusable utility.
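Below is a minimal sketch of the two helpers and the atomic write utility, run against a three-step simulated pipeline. It is an illustration, not the lesson's original listing. Several details are assumptions: the step names, the hypothetical checkpoint_demo directory under the system temp dir, the single-argument helper signatures, the STEPS table, and the choice to have run_pipeline() return the steps it ran.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout for this sketch: everything lives under the system temp dir.
BASE = Path(tempfile.gettempdir()) / "checkpoint_demo"
CHECKPOINT_DIR = BASE / ".checkpoints"  # kept apart from the outputs
OUTPUT_DIR = BASE / "output"


def step_done(name: str) -> bool:
    """Return True if this step left a .done marker on an earlier run."""
    return (CHECKPOINT_DIR / f"{name}.done").exists()


def mark_done(name: str) -> None:
    """Record a finished step by creating an empty .done marker file."""
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    (CHECKPOINT_DIR / f"{name}.done").touch()


def atomic_write(destination: Path, text: str) -> None:
    """Write text so readers see either the old file or the complete new one."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # The temp file sits beside the destination so os.replace() stays on one filesystem.
    tmp = tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=destination.parent, suffix=".tmp", delete=False
    )
    try:
        with tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the swap
        os.replace(tmp.name, destination)  # the atomic step
    except BaseException:
        Path(tmp.name).unlink(missing_ok=True)  # drop the partial temp file
        raise


# Hypothetical steps: each maps a step name to the text it produces.
STEPS = {
    "extract": "raw rows\n",
    "transform": "clean rows\n",
    "load": "loaded rows\n",
}


def run_pipeline() -> list[str]:
    """Run every step that has no checkpoint yet; return the names that actually ran."""
    ran = []
    for name, content in STEPS.items():
        if step_done(name):
            print(f"skipping {name}: already done")
            continue
        print(f"running {name}")
        atomic_write(OUTPUT_DIR / f"{name}.txt", content)
        mark_done(name)  # only after the output is safely in place
        ran.append(name)
    return ran


if __name__ == "__main__":
    shutil.rmtree(BASE, ignore_errors=True)  # start the demo from a clean slate
    run_pipeline()  # first pass: every step runs
    run_pipeline()  # second pass: every step is skipped
```

The ordering inside the loop matters. mark_done() is called only after atomic_write() returns, so a crash mid-step leaves no marker and the next run redoes that step. Because the real output is only ever swapped in whole, a rerun never finds a half-written file.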
Run the pipeline twice and observe: on the first pass every step prints its work message, and on the second pass every step is skipped. The output files are identical after both passes.
What makes the write atomic
tempfile.NamedTemporaryFile(dir=destination.parent) creates the temp file in the same directory as the destination. This is the critical detail: os.replace() is only an atomic rename when the source and destination are on the same filesystem. Across filesystems the call fails with an OSError, and falling back to a copy (for example shutil.move()) gives up atomicity. Staging the temp file in /tmp/ while the destination lives on a mounted network share would therefore break the pattern.
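To see what atomicity buys you, here is a small sketch that simulates a crash halfway through a write. It assumes a variant of the write utility that streams an iterable of text chunks. The chunk-based atomic_write signature, the flaky_rows() source, and the report.csv filename are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def atomic_write(destination: Path, chunks: Iterable[str]) -> None:
    """Stream chunks into a temp file beside destination, then swap it in."""
    # dir=destination.parent keeps the temp file on the destination's filesystem,
    # so os.replace() can do a single same-filesystem rename.
    tmp = tempfile.NamedTemporaryFile(
        "w",
        encoding="utf-8",
        dir=destination.parent,
        prefix=f".{destination.name}.",
        suffix=".tmp",
        delete=False,
    )
    try:
        with tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp.name, destination)
    except BaseException:
        # Any failure before the swap leaves the destination untouched.
        Path(tmp.name).unlink(missing_ok=True)
        raise


def flaky_rows() -> Iterator[str]:
    """Hypothetical data source that crashes partway through."""
    yield "row 1\n"
    yield "row 2\n"
    raise RuntimeError("simulated crash mid-write")


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    target = workdir / "report.csv"
    atomic_write(target, ["old row\n"])
    try:
        atomic_write(target, flaky_rows())
    except RuntimeError as exc:
        print(f"write failed: {exc}")
    print(repr(target.read_text(encoding="utf-8")))  # still the old content
    print(sorted(p.name for p in workdir.iterdir()))  # no stray temp files
```

One side effect is worth knowing. NamedTemporaryFile creates files readable only by their owner (mode 0600 on POSIX), and the replaced destination keeps those permissions. If other users or services need to read the output, chmod the temp file before calling os.replace().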
Always pass dir=destination.parent rather than a fixed /tmp path, unless you know both paths live on the same filesystem. In the demo, the output is also in /tmp, so a fixed path happens to work. In production, put the output wherever the pipeline expects it, and dir=destination.parent makes the temp file follow.
Adapting to your pipeline
Two things to parameterise when you use these patterns for real:
- Run ID in checkpoint names. If the pipeline runs more than once (daily, or several times a day), include a date or run ID in the name you pass to both helpers: step_done(f"{name}_{run_id}") and mark_done(f"{name}_{run_id}"). Otherwise every run after the first will be a no-op.
- Checkpoint location. Store checkpoints outside the output directory so that clearing outputs does not also clear checkpoints. A .checkpoints/ directory at the project root works well.
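Both adjustments fit in a few lines. The sketch below keys each marker by step name and run ID and keeps the markers in a .checkpoints/ directory at the project root. The three-argument helper signatures, the marker() function, the step names, and the date-style run IDs are assumptions for illustration. The step's real work is left as a comment.

```python
import tempfile
from pathlib import Path


def marker(checkpoint_dir: Path, name: str, run_id: str) -> Path:
    """Checkpoint file for one step of one run, e.g. extract_2024-06-01.done."""
    return checkpoint_dir / f"{name}_{run_id}.done"


def step_done(checkpoint_dir: Path, name: str, run_id: str) -> bool:
    return marker(checkpoint_dir, name, run_id).exists()


def mark_done(checkpoint_dir: Path, name: str, run_id: str) -> None:
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    marker(checkpoint_dir, name, run_id).touch()


def run_pipeline(project_root: Path, run_id: str) -> list[str]:
    """Run the steps for run_id, skipping those this run already completed."""
    # Checkpoints live at the project root, outside the output directory,
    # so wiping output/ does not erase the record of finished steps.
    checkpoint_dir = project_root / ".checkpoints"
    ran = []
    for name in ("extract", "transform", "load"):  # hypothetical step names
        if step_done(checkpoint_dir, name, run_id):
            continue
        # The step's real work, including an atomic write of its output, goes here.
        mark_done(checkpoint_dir, name, run_id)
        ran.append(name)
    return ran


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(run_pipeline(root, "2024-06-01"))  # all three steps run
    print(run_pipeline(root, "2024-06-01"))  # restart of the same run: nothing to do
    print(run_pipeline(root, "2024-06-02"))  # new run ID: all three run again
```

The run ID must stay the same when a crashed run is restarted, or the restart will redo every step. Derive it from something the restart will see again, such as a scheduled date passed on the command line, rather than generating a fresh random ID inside the script.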
Where to go next
Next: retry logic — checkpoints and atomic writes handle the "already done" case. Retrying transient failures handles the "not done yet, but worth trying again" case. Together they make a pipeline that is both safe to restart and resilient to intermittent errors.