Docker Compose workflows
Docker Compose orchestrates multi-service environments in a single YAML file — define your pipeline, its database, and any supporting services together so the whole stack runs with one command.
- Explain when Docker Compose adds value over a single docker run command
- Define a service, volume mount, and environment variable in compose.yaml
- Use docker compose run to execute just the workflow service
A single docker run command is sufficient when your pipeline has one container.
Real pipelines often need more: a database to query, an SFTP server to pull files
from, or a local mock of an upstream API for integration testing. Running and
wiring these together with individual docker run commands becomes unwieldy fast.
Docker Compose describes the entire multi-container environment in a single
compose.yaml file and starts, stops, or runs any subset of it with one command.
A compose.yaml for a data pipeline
# compose.yaml
services:
  # The pipeline itself
  pipeline:
    build: .                              # build from the Dockerfile in this directory
    environment:
      API_URL: http://mock-api:8080/data  # the mock service, not the real API; WireMock listens on 8080
      OUTPUT_DIR: /app/output
      API_KEY: "${API_KEY}"               # read from the host's environment
    volumes:
      - ./output:/app/output              # persist pipeline output to the host
    depends_on:
      - mock-api
    command: ["--env", "staging"]         # overrides Dockerfile CMD; ENTRYPOINT unchanged
  # Local mock API for integration testing
  mock-api:
    image: wiremock/wiremock:latest
    ports:
      - "8080:8080"
    volumes:
      - ./wiremock:/home/wiremock         # WireMock stubs directory
  # Postgres database (if the pipeline writes results to a DB)
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: pipeline
    ports:
      - "5432:5432"
Key concepts
Services are the individual containers. Each service maps to a docker run
invocation but with the networking and volumes already wired up.
Volumes have two forms:
- ./output:/app/output is a bind mount: a path on the host filesystem is mounted inside the container. Output files written to /app/output inside the container appear immediately in ./output on the host.
- A named volume (output_data:/app/output) is managed by Docker and not tied to a specific host path. Better for data you do not need to access directly.
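For comparison, the named-volume form could look like the sketch below. The name output_data is taken from the bullet above. Unlike a bind mount, a named volume also has to be declared under a top-level volumes key, otherwise Compose rejects the file.

```yaml
# compose.yaml (fragment): named volume instead of a bind mount
services:
  pipeline:
    build: .
    volumes:
      - output_data:/app/output   # Docker-managed storage, no host path involved

# Named volumes must be declared at the top level
volumes:
  output_data:
```

With this form the output survives container removal, but it no longer appears in ./output on the host. docker compose down --volumes deletes it.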
Environment variables accept "${VAR}" references, which are read from the
shell environment when docker compose runs. Never commit real secrets into
compose.yaml; pass them through environment variables or a .env file (added to
.gitignore).
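Compose interpolation also has a required form and a default form. With a plain "${API_KEY}", an unset variable produces a warning and an empty string inside the container; the required form makes Compose stop with an error instead. A minimal sketch, in which LOG_LEVEL is a hypothetical variable that is not part of the pipeline above:

```yaml
# compose.yaml (fragment): stricter variable interpolation
services:
  pipeline:
    build: .
    environment:
      # Fail fast with this message if API_KEY is unset or empty
      API_KEY: "${API_KEY:?API_KEY must be set in the shell or in .env}"
      # Hypothetical variable: fall back to "info" if LOG_LEVEL is unset or empty
      LOG_LEVEL: "${LOG_LEVEL:-info}"
```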
depends_on ensures the mock-api container starts before pipeline.
Note that the short form used here waits only for the container to start, not for
the service inside to become ready. If readiness matters for correctness, add a
healthcheck to the mock-api service and use the long form of depends_on with
condition: service_healthy.
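One way to wait for readiness is sketched below. It rests on two assumptions that should be checked against the image version in use: that curl is available inside the wiremock/wiremock image, and that WireMock serves a health endpoint at /__admin/health on port 8080. The interval, timeout, and retry values are illustrative.

```yaml
# compose.yaml (fragment): wait for mock-api to be healthy, not just started
services:
  pipeline:
    build: .
    depends_on:
      mock-api:
        condition: service_healthy   # long form: wait until the healthcheck passes

  mock-api:
    image: wiremock/wiremock:latest
    healthcheck:
      # Assumption: curl exists in the image and /__admin/health is served
      test: ["CMD", "curl", "-f", "http://localhost:8080/__admin/health"]
      interval: 5s    # time between probes
      timeout: 3s     # a probe taking longer than this counts as failed
      retries: 10     # consecutive failures before the container is unhealthy
```

If the image does not support this probe, any command that exits 0 once the service is ready can be substituted in test.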
Running just the pipeline
# Start the pipeline's dependencies, then run the pipeline to completion
docker compose run --rm pipeline
# Start only the database (e.g. for local development)
docker compose up db
# Tear everything down and remove named and anonymous volumes
docker compose down --volumes
docker compose run --rm pipeline starts mock-api first (because of
depends_on), runs the pipeline service to completion, and removes the pipeline
container when it exits, but leaves mock-api running. The db service is not
listed under depends_on, so it is not started automatically: add it to
depends_on, or start it with docker compose up -d db, if the pipeline writes to
the database.
docker compose run is for one-off command execution — run a script, run tests,
run a migration. docker compose up is for long-running services. Pipelines
belong in run, not up.
Environment file
Store local defaults in a .env file at the project root:
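# .env (add to .gitignore)
API_KEY=dev-key-not-real
OUTPUT_DIR=./output
Compose reads .env automatically and uses its values to resolve ${VAR}
references in compose.yaml. A value reaches a container only when a service's
environment section references it, as API_KEY does above. Production deployments
inject real values via the CI/CD pipeline or a secrets manager.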
Where to go next
Next: Lab: containerise a pipeline. Write a Dockerfile and compose.yaml for the
hardened pipeline from Module 1, run it with Docker Compose, and verify that the
output files appear on the host filesystem.