Lab: containerise a pipeline
Write a Dockerfile and compose.yaml for the hardened pipeline from Module 1, run it with Docker Compose, and verify that output files appear on the host filesystem.
- Write a Dockerfile for a Python pipeline script with correct layer ordering
- Write a compose.yaml that mounts an output volume and injects an API key secret
- Run the containerised pipeline with docker compose run and verify output
The pipeline from Module 1 is tested, hardened with checkpoints and retry logic,
and ready to ship. This lab packages it into a Docker image so it runs identically
on any machine with Docker installed — no Python version mismatch, no missing
tenacity install, no "works on my machine" surprises.
The project layout
Start with this directory structure:
my-pipeline/
    pipeline.py        # the hardened script from Module 1
    requirements.txt
    Dockerfile
    compose.yaml
    output/            # created by docker compose run; add to .gitignore
    .env               # local secrets; add to .gitignore
Step 1 — requirements.txt
tenacity==8.3.0
requests==2.32.3
Pin exact versions. Floating requirements (tenacity>=8) produce non-reproducible
images: two builds a month apart can produce different package sets.
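The pinning rule is easy to check mechanically before a build. The sketch below is one possible check, not part of the Module 1 pipeline: the function name unpinned_requirements and the regular expression are assumptions made for illustration, and the pattern covers only simple name==version lines (no URLs, environment markers, or hashes).

```python
import re

# Accepts "name==version", optionally with extras such as "requests[socks]==2.32.3".
# Wildcards like "==8.*" are deliberately rejected because they still float.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines in `text` that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


pinned_file = "tenacity==8.3.0\nrequests==2.32.3\n"
floating_file = "tenacity>=8\nrequests==2.32.3\n"

print(unpinned_requirements(pinned_file))    # no offending lines
print(unpinned_requirements(floating_file))  # reports the floating tenacity line
```

Running a check like this in CI, against the contents of requirements.txt, catches a floating requirement before it reaches an image build.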
Step 2 — Dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
# Copy and install dependencies first (cache the expensive layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy source
COPY pipeline.py .
# Security: drop root
RUN useradd --system --no-create-home pipeline
USER pipeline
# Output directory (created at runtime via volume mount)
ENV OUTPUT_DIR=/app/output
ENTRYPOINT ["python", "pipeline.py"]
Checkpoint 1 — build the image
docker build -t my-pipeline:dev .
Confirm the build succeeds. Then change a comment in pipeline.py and rebuild.
Observe that the first four steps (through pip install) are reported as CACHED
and the build resumes at COPY pipeline.py . Every step after a changed layer
reruns too, so useradd executes again, but it is cheap. The rebuild should take
a few seconds at most.
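The caching behaviour in this checkpoint follows from one rule: the first step whose inputs changed is rebuilt, and so is every step after it. The sketch below is a toy model of that rule, not Docker's actual cache implementation. The step list is a hand-written summary of the Dockerfile above, and first_rebuilt_step is a hypothetical helper.

```python
def first_rebuilt_step(steps, changed_files):
    """Return the index of the first invalidated step, or None if all are cached.

    Toy model: a COPY step is invalidated when one of the files it copies
    changed; every step after an invalidated step is rebuilt as well.
    """
    changed = set(changed_files)
    for index, (instruction, copied_files) in enumerate(steps):
        if instruction == "COPY" and changed & set(copied_files):
            return index
    return None


# Hand-written summary of the lab's Dockerfile: (instruction, files it copies).
DOCKERFILE_STEPS = [
    ("FROM", []),
    ("WORKDIR", []),
    ("COPY", ["requirements.txt"]),
    ("RUN", []),  # pip install
    ("COPY", ["pipeline.py"]),
    ("RUN", []),  # useradd
]

start = first_rebuilt_step(DOCKERFILE_STEPS, ["pipeline.py"])
cached, rebuilt = DOCKERFILE_STEPS[:start], DOCKERFILE_STEPS[start:]
print(f"{len(cached)} steps cached, {len(rebuilt)} rebuilt")
```

Passing ["requirements.txt"] instead moves the first rebuilt step up to the dependency COPY, which is why the expensive pip install sits above the source copy.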
Step 3 — compose.yaml
# compose.yaml
services:
  pipeline:
    build: .
    environment:
      API_KEY: "${API_KEY}"
      OUTPUT_DIR: /app/output
    volumes:
      - ./output:/app/output
Checkpoint 2 — run with Compose
Create a .env file:
API_KEY=dev-placeholder
Then run:
docker compose run --rm pipeline
The pipeline should execute and write its output to ./output/ on the host.
Verify:
ls -la output/
cat output/transformed.json
If the files are not there, check that the OUTPUT_DIR environment variable
matches the path used in pipeline.py and that the volume mount is correct.
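If ./output does not exist, Docker creates it automatically as root on Linux,
which can cause permission errors when the container (running as the pipeline
user) tries to write to it. Pre-create the directory with mkdir -p output on
the host to avoid this. If writes still fail on Linux, the directory's owner
probably does not match the container user's UID. Running with
docker compose run --rm --user "$(id -u):$(id -g)" pipeline is one way to
align the two.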
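A permission problem on the mounted directory is easier to diagnose when the script fails early with a clear message. The sketch below shows one way pipeline.py might resolve OUTPUT_DIR. The Module 1 script is not shown here, so resolve_output_dir and the placeholder payload are assumptions, not the actual implementation.

```python
import json
import os
import tempfile
from pathlib import Path


def resolve_output_dir(env=None) -> Path:
    """Return the directory named by OUTPUT_DIR, failing early if it is not writable."""
    env = os.environ if env is None else env
    # /app/output mirrors the ENV default set in the Dockerfile.
    out = Path(env.get("OUTPUT_DIR", "/app/output"))
    out.mkdir(parents=True, exist_ok=True)
    if not os.access(out, os.W_OK):
        raise PermissionError(
            f"{out} is not writable; check the owner of the mounted host directory"
        )
    return out


# Demonstration against a temporary directory, so it also runs outside a container.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = resolve_output_dir({"OUTPUT_DIR": os.path.join(tmp, "output")})
    target = out_dir / "transformed.json"
    target.write_text(json.dumps({"rows": 0}))  # placeholder payload, not real pipeline output
    written = json.loads(target.read_text())
```

Inside the container the function would be called with no arguments, so it reads the OUTPUT_DIR value that Compose injects.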
Step 4 — pass the API key as a secret
In production you would not store real secrets in .env. Three common patterns:
CI/CD injection — the pipeline runs in GitHub Actions or another CI system that injects secrets as environment variables:
# .github/workflows/pipeline.yml (covered in Module 5)
env:
  API_KEY: ${{ secrets.API_KEY }}
Docker secret — for Docker Swarm or Compose v2 with secrets:
services:
  pipeline:
    secrets:
      - api_key
    environment:
      API_KEY_FILE: /run/secrets/api_key
secrets:
  api_key:
    environment: "API_KEY" # reads from host env at deploy time
Then in pipeline.py, read os.environ.get("API_KEY") or open(os.environ["API_KEY_FILE"]).read().strip().
AWS/GCP/Azure secrets manager — the container IAM role grants permission to fetch the secret at startup.
For this lab, the .env approach is fine. Document the switch to a secrets manager
as a follow-up task.
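The API_KEY / API_KEY_FILE lookup described for the Docker secret pattern can be sketched as a small helper. The function name load_api_key and the error message are assumptions made for illustration; only the two environment variable names come from the lab.

```python
import os
import tempfile
from pathlib import Path


def load_api_key(env=None) -> str:
    """Return the API key from API_KEY, falling back to the file named by API_KEY_FILE."""
    env = os.environ if env is None else env
    key = env.get("API_KEY")
    if key:
        return key
    key_file = env.get("API_KEY_FILE")
    if key_file:
        # Secret files usually end with a newline; strip it before use.
        return Path(key_file).read_text().strip()
    raise RuntimeError("Set API_KEY or API_KEY_FILE before running the pipeline")


# Demonstration with a temporary file standing in for /run/secrets/api_key.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "api_key"
    secret_file.write_text("dev-placeholder\n")
    from_file = load_api_key({"API_KEY_FILE": str(secret_file)})

from_env = load_api_key({"API_KEY": "dev-placeholder"})
```

Because the helper accepts both sources, the same image works with the .env approach used in this lab and with a mounted secret later, without a code change.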
Checkpoint 3 — multi-stage build (extension)
For production images, a multi-stage build separates the build environment from the runtime environment:
# Stage 1: install packages
FROM python:3.12-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: runtime image
FROM python:3.12-slim
COPY --from=builder /install /usr/local
WORKDIR /app
COPY pipeline.py .
RUN useradd --system --no-create-home pipeline
USER pipeline
ENTRYPOINT ["python", "pipeline.py"]
This keeps build-time tooling out of the runtime image. For most pipeline scripts the size reduction is modest, but it is the standard pattern for production-grade images.
Where to go next
Module complete. Next up: CI/CD for Automation — automate the Docker build and pipeline run inside GitHub Actions so every push to main triggers a tested, containerised deployment.