Memory profiling

Use tracemalloc to measure allocations, distinguish peak from current memory, and find common CLI memory hotspots.

Knowing that a streaming refactor reduces memory is useful. Knowing precisely which line allocates 90% of the memory is actionable. tracemalloc is the standard library's answer: it instruments the allocator and lets you ask, at any point, "what are the biggest allocations and where did they come from?"

How tracemalloc works

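tracemalloc hooks into CPython's memory allocator. Every time Python allocates a block of memory, tracemalloc records its size and the traceback of the code that allocated it. By default it stores only the most recent frame; pass a larger frame count to tracemalloc.start() to keep more of the call stack. You can take a snapshot at any point and query it: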

import tracemalloc

tracemalloc.start()

# ... your code ...

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

for stat in stats[:5]:
    print(stat)

Each stat line looks like:

my_tool/processor.py:42: size=15.2 MiB, count=100000, average=159 B

That tells you that line 42 of processor.py is responsible for about 15 MiB of live allocations, spread across 100,000 objects averaging 159 bytes each. A profile like that usually points to a large collection of small objects, such as a list of strings.
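To make the reading of a stat line concrete, here is a minimal sketch that builds a list of strings and then inspects the largest entry in the snapshot. The build_lines function and its record format are hypothetical stand-ins for real work; the size, count, and traceback attributes are the fields tracemalloc exposes on each statistic.

```python
import tracemalloc


def build_lines(n):
    # Hypothetical workload: keeps n distinct strings alive in one list.
    return [f"record-{i:06d}" for i in range(n)]


tracemalloc.start()
lines = build_lines(100_000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Statistics are sorted largest first; each one groups allocations by source line.
top = snapshot.statistics("lineno")[0]
frame = top.traceback[0]
print(f"{frame.filename}:{frame.lineno}: size={top.size} B, count={top.count}")
```

The top entry should point at the list comprehension inside build_lines, with a count close to the number of strings it created.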

Peak vs current memory

Two measurements matter:

Current — how much is allocated right now, at the moment of the snapshot. Useful for understanding steady-state usage.
Peak — the maximum current allocation since tracemalloc.start(). Useful for understanding worst-case usage.

A batch-processing tool might finish with near-zero current memory (everything has been processed and freed) but have a 2 GB peak if it buffered the entire input halfway through. The peak is what OOM-kills your process.

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024**2:.1f} MiB")
print(f"Peak:    {peak / 1024**2:.1f} MiB")
tracemalloc.stop()

Always check peak, not just current.
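The gap between the two numbers is easiest to see with a buffer that is built, used, and released before measuring. This is a small sketch under that assumption; the buffer of 1 KiB chunks is an invented stand-in for a batch step that loads its whole input.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical batch step: buffer everything, use it, then release it.
buffer = [bytes(1024) for _ in range(10_000)]  # roughly 10 MiB of payload
total = sum(len(chunk) for chunk in buffer)
del buffer

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Current: {current / 1024**2:.1f} MiB")
print(f"Peak:    {peak / 1024**2:.1f} MiB")
```

Current should come out close to zero because the buffer has been freed, while peak still reflects the roughly 10 MiB that was live at the high-water mark.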

Common CLI memory hotspots

Roughly in order of how often they turn up:

f.readlines() or f.read() — loading an entire file. Fix: iterate over the file object or use a generator.
List comprehensions over large iterables — [transform(x) for x in large_list] builds the full output list before you use any of it. Fix: use a generator expression (transform(x) for x in large_list).
Loading a full JSON or CSV file — json.load(f) parses the entire document into a nested dict/list structure. For JSON, use ijson for streaming; for CSV, iterate over csv.reader one row at a time.
Caching without a size limit — a dict used as a cache that grows unboundedly. Fix: use functools.lru_cache(maxsize=1024) or cachetools.LRUCache.
Intermediate result lists — accumulating results in a list, then iterating over the list, then discarding it. Fix: yield results as they are produced.
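As an illustration of the first hotspot and its fix, the sketch below counts lines in a throwaway file twice, once with readlines() and once by iterating over the file object, and reports the peak traced memory of each. The helper names (count_buffered, count_streaming, peak_bytes) and the generated input file are assumptions made for the example.

```python
import os
import tempfile
import tracemalloc


def count_buffered(path):
    # Hotspot: readlines() materializes every line before counting.
    with open(path, encoding="utf-8") as f:
        return len(f.readlines())


def count_streaming(path):
    # Fix: iterate over the file object, holding one line at a time.
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def peak_bytes(func, path):
    # Trace a single call and return its result with the peak traced memory.
    tracemalloc.start()
    result = func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


# Throwaway input file: 50,000 short lines.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(50_000))
    path = tmp.name

buffered_count, buffered_peak = peak_bytes(count_buffered, path)
streaming_count, streaming_peak = peak_bytes(count_streaming, path)
os.remove(path)

print(f"readlines(): {buffered_count} lines, peak {buffered_peak / 1024:.0f} KiB")
print(f"iteration:   {streaming_count} lines, peak {streaming_peak / 1024:.0f} KiB")
```

Both versions return the same count; the streaming version should show a much smaller peak because it never holds more than a line and a read buffer at once.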

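tracemalloc is not free: it can slow allocation-heavy code noticeably and it uses extra memory to store its traces, and both costs grow with the number of frames recorded. Enable it only when profiling: wrap the start/stop calls in a --profile-memory flag so users can opt in without paying the cost on every run.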
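One way to wire up such a flag with argparse is sketched below. The program name my_tool, the --count option, and the run function are hypothetical placeholders; only the --profile-memory flag comes from the text above.

```python
import argparse
import sys
import tracemalloc


def run(args):
    # Placeholder for the tool's real work (hypothetical).
    return sum(len(str(i)) for i in range(args.count))


def main(argv=None):
    parser = argparse.ArgumentParser(prog="my_tool")
    parser.add_argument("--count", type=int, default=1000)
    parser.add_argument(
        "--profile-memory",
        action="store_true",
        help="trace allocations with tracemalloc and report peak usage",
    )
    args = parser.parse_args(argv)

    # Tracing starts only when the user opts in.
    if args.profile_memory:
        tracemalloc.start()
    try:
        result = run(args)
    finally:
        if args.profile_memory:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            # Report on stderr so normal output stays pipeable.
            print(f"Peak traced memory: {peak / 1024**2:.2f} MiB", file=sys.stderr)
    return result


# Example invocation with an explicit argument list.
result = main(["--count", "1000", "--profile-memory"])
print(result)
```

Writing the report to stderr and stopping the tracer in a finally block keeps the tool's normal output unchanged and makes sure tracing ends even if the run fails.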

Where to go next

Next: profiling in practice — a Runnable running tracemalloc on a list-buffering function and its generator replacement, side by side.
