Docker concepts

Images are frozen environments, containers are running instances, and layers make rebuilds fast — understand the model before writing a single Dockerfile.

By the end of this lesson you will be able to:
  • Explain the difference between an image and a container
  • Describe the layer model and why it makes rebuilds fast
  • Articulate why containers solve the reproducibility problem for automation scripts

"It works on my machine" is a statement that automation pipelines cannot afford. A script that ran fine on your laptop may fail in production because of a different Python version, a missing system library, or a subtly different locale setting. Docker solves this by packaging the environment alongside the code.

Images vs containers

A Docker image is a read-only, self-contained snapshot of everything a process needs to run: the operating system base, system libraries, the Python interpreter, your installed packages, and your source code. An image does not run; it is an artifact — like a shipping container before it is loaded onto a vessel.

A container is a running instance of an image. Starting a container is fast (typically well under a second) because the image's files are not copied: the container adds a thin, writable layer on top of the read-only image layers. Stopping a container keeps that writable layer; removing the container (docker rm, or starting it with docker run --rm) discards it. Either way, the image is untouched.

You can run ten containers from the same image simultaneously, each isolated from the others.
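
The relationship can be mimicked in a few lines of Python. This is only a toy model, not how Docker implements its filesystem: a read-only mapping stands in for the image, and each "container" stacks its own empty dict on top using collections.ChainMap. The file paths, contents, and the start_container helper are made up for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Toy "image": a read-only mapping of file paths to contents.
# Paths and contents are invented for this illustration.
image = MappingProxyType({
    "/app/pipeline.py": "print('run')",
    "/etc/locale": "C.UTF-8",
})


def start_container(image_files):
    """Return a view with a fresh, empty writable layer stacked on the image."""
    writable_layer = {}
    # ChainMap sends all writes to the first mapping and reads through the rest.
    return ChainMap(writable_layer, image_files)


container_a = start_container(image)
container_b = start_container(image)

# Writes land in container_a's own writable layer only.
container_a["/tmp/output.csv"] = "id,value"
container_a["/etc/locale"] = "en_GB.UTF-8"

print(container_a["/etc/locale"])        # en_GB.UTF-8 (its own change)
print(container_b["/etc/locale"])        # C.UTF-8 (still the image's file)
print("/tmp/output.csv" in container_b)  # False
print(image["/etc/locale"])              # C.UTF-8 (image unchanged)
```

Discarding container_a drops its dict and nothing else, which mirrors what happens to the writable layer when a container is removed.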

The layer model

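Every instruction in a Dockerfile is a build step that Docker can cache. Instructions that change the filesystem (RUN, COPY, ADD) add a layer: a diff against the previous state that contains only the files changed by that instruction. Instructions that only set metadata, such as CMD or ENV, change the image configuration without adding files.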

# Layer 1: base OS + Python interpreter
FROM python:3.12-slim
# Layer 2: set working directory (tiny)
WORKDIR /app
# Layer 3: add requirements file
COPY requirements.txt .
# Layer 4: install packages
RUN pip install -r requirements.txt
# Layer 5: add source code
COPY . .

Docker caches layers. When you rebuild, Docker reuses every cached layer up to the first instruction whose inputs changed. If you only changed pipeline.py, Docker reuses layers 1–4 (the expensive pip install) and rebuilds only layer 5.

This is why the standard pattern copies requirements.txt before copying the rest of the source: the package install layer changes rarely, so it stays cached across most rebuilds.

The cache key for a COPY instruction is derived from a checksum of the files being copied. Changing a single byte in requirements.txt invalidates layer 3 and everything after it, triggering a full pip install. That is the correct behaviour, since you changed the dependencies. If you then revert the change, the next build is fast again, provided the earlier layers are still in the local build cache.
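
The invalidation rule can be sketched as a chain of hashes. The model below is a deliberate simplification and not Docker's actual algorithm: each layer's key is assumed to be a hash of the parent key, the instruction text, and the bytes of any copied files. The layer_keys and first_rebuilt_layer helpers, the file contents, and the package name examplepkg are all hypothetical.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per instruction, each chained to the previous key.

    `instructions` is a list of (text, copied_paths) pairs and `files` maps a
    path to its bytes. Both shapes are assumptions made for this sketch.
    """
    keys = []
    parent = ""
    for text, copied_paths in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())   # a changed parent changes every child
        digest.update(text.encode())
        for path in sorted(copied_paths):
            digest.update(path.encode())
            digest.update(files[path])
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def first_rebuilt_layer(old_keys, new_keys):
    """Return the 1-based number of the first layer whose key changed, or None."""
    for number, (old, new) in enumerate(zip(old_keys, new_keys), start=1):
        if old != new:
            return number
    return None


DOCKERFILE = [
    ("FROM python:3.12-slim", []),
    ("WORKDIR /app", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "pipeline.py"]),
]

files = {"requirements.txt": b"examplepkg==1.0\n", "pipeline.py": b"print('v1')\n"}
baseline = layer_keys(DOCKERFILE, files)

edited_source = {**files, "pipeline.py": b"print('v2')\n"}
edited_deps = {**files, "requirements.txt": b"examplepkg==1.1\n"}

print(first_rebuilt_layer(baseline, layer_keys(DOCKERFILE, edited_source)))  # 5
print(first_rebuilt_layer(baseline, layer_keys(DOCKERFILE, edited_deps)))    # 3
```

Because every key includes its parent's key, a change at layer 3 necessarily changes the keys of layers 4 and 5 as well. Reverting the file reproduces the original keys, which is why a reverted build can hit the cache again.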

Why this solves the reproducibility problem

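A built Docker image freezes every dependency as it was at build time: the Python version, the versions of the packages installed from requirements.txt, the C libraries linked by those packages, and the OS userland. Containers share the host's kernel, so the kernel and the CPU architecture are the main things an image does not fix. A tag can also be moved to point at a different image, so reference the image by digest when you need a guarantee that nothing has changed. With those caveats, the same image produces the same behaviour on: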

  • Your laptop (macOS or Linux).
  • A CI runner (Ubuntu 22.04).
  • A production server (Debian 12).
  • A colleague's machine with a different Python installation.
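
One way to see what varies between these machines is to print a small environment fingerprint. The sketch below uses only the standard library; the environment_fingerprint name and the choice of fields are illustrative assumptions, not a complete inventory of what an image fixes.

```python
import locale
import platform
import sys


def environment_fingerprint():
    """Collect a few details that commonly differ between machines."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "default_encoding": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Run directly on each host, the output will likely differ in several fields. Run inside containers started from the same image, the Python version and encoding fields should match everywhere, while values tied to the host, such as the CPU architecture, can still differ.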

For automation pipelines, this matters especially because pipelines run unattended. A Python 3.9 behaviour difference in datetime.fromisoformat caused a production pipeline to fail silently for months before anyone noticed. A containerised pipeline pinned to 3.12 would have run the same interpreter in development, CI, and production, so the difference would have shown up in the first test run instead of only in production.
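
The lesson does not say which difference was involved. One well-known example, offered here only as an illustration, is that datetime.fromisoformat accepts a trailing "Z" from Python 3.11 onwards and raises ValueError on earlier versions. The sketch below detects the behaviour of the running interpreter and shows a version-independent workaround; the sample timestamp and both helper names are hypothetical.

```python
import sys
from datetime import datetime, timezone

SAMPLE = "2024-01-01T00:00:00Z"  # hypothetical timestamp with a trailing "Z"


def native_accepts_z():
    """Report whether this interpreter's fromisoformat parses a trailing 'Z'."""
    try:
        datetime.fromisoformat(SAMPLE)
    except ValueError:
        return False
    return True


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z' on older Pythons."""
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


# The first line depends on the interpreter; the second should not.
print(sys.version_info[:2], native_accepts_z())
print(parse_utc(SAMPLE) == datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Pinning the interpreter in the image removes the first line's variability altogether; the workaround is only needed when the code must run on interpreters you do not control.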

Key terms

Term         Meaning
Image        Read-only, built artifact: the frozen environment
Container    Running instance of an image
Layer        One instruction's diff in the image filesystem
Registry     Storage for images (Docker Hub, GHCR, ECR)
Tag          Human-readable label for an image version (myapp:1.4.2)
Dockerfile   Recipe that builds an image from instructions

Where to go next

Next: writing a Dockerfile — the minimal Dockerfile for a Python automation script, with best practices for layer ordering and the difference between CMD and ENTRYPOINT.
