Profiling

Find performance bottlenecks with go tool pprof — CPU and memory profiles, the net/http/pprof endpoint, and reading flame graphs.

By the end of this lesson you will be able to:
  • Generate a CPU profile with go test -cpuprofile
  • Generate a heap profile with go test -memprofile
  • Start the pprof HTTP endpoint in a running server
  • Use go tool pprof to explore profiles interactively and as flame graphs
  • Name three common CPU and memory hotspots to watch for

Go ships profiling support in the standard library and toolchain. You don't need third-party tools to find bottlenecks: go test, go tool pprof, and the net/http/pprof package are everything you need to go from "the service is slow" to "line 47 of store.go is the problem". This lesson is a practical guide to that workflow.

CPU profiling with go test

The fastest way to profile is through your test suite:

go test -bench=BenchmarkHot -cpuprofile=cpu.prof .   # profile flags accept one package at a time, not ./... across several
go tool pprof cpu.prof
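
For reference, here is a minimal sketch of the kind of benchmark the command above selects. The processRecord function and its input are hypothetical stand-ins for your own hot path. In a real project BenchmarkHot would live in a file ending in _test.go and main would be dropped; it is included here only so the sketch also runs with plain go run.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the compiler from discarding the benchmarked result.
var sink string

// processRecord is a hypothetical hot function: it trims and upper-cases
// each comma-separated field, then joins the fields with "|".
func processRecord(rec string) string {
	fields := strings.Split(rec, ",")
	for i, f := range fields {
		fields[i] = strings.ToUpper(strings.TrimSpace(f))
	}
	return strings.Join(fields, "|")
}

// BenchmarkHot is what -bench=BenchmarkHot matches.
func BenchmarkHot(b *testing.B) {
	b.ReportAllocs() // also report allocations per operation
	for i := 0; i < b.N; i++ {
		sink = processRecord("id, name, email, city")
	}
}

func main() {
	// testing.Benchmark runs the benchmark outside of `go test`.
	r := testing.Benchmark(BenchmarkHot)
	fmt.Println(r.String(), r.MemString())
}
```

The benchmark name passed to -bench is a regular expression, so BenchmarkHot also matches any benchmark whose name contains that text.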

Inside the pprof REPL, the most useful commands are:

(pprof) top10          # top 10 functions by CPU time
(pprof) list MyFunc    # annotated source for MyFunc
(pprof) web            # open call graph in browser (requires graphviz)

top10 output looks like:

Showing nodes accounting for 3.2s, 89.44% of 3.58s total
  flat  flat%   sum%        cum   cum%
 1.45s 40.50% 40.50%      1.45s 40.50%  runtime.mallocgc
 0.82s 22.91% 63.41%      0.82s 22.91%  store.processRecord
  • flat: time spent in this function (excluding callees).
  • cum: cumulative time including callees.

When mallocgc dominates, you have excessive heap allocation. More generally, high flat time points at a function that does expensive work itself, while high cum time with low flat time points at a function whose callees are the bottleneck.
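
To see the flat/cum distinction in a profile of your own, a small self-profiling program can help. The sketch below is an assumption-laden illustration, not part of the go test workflow: it uses the standard runtime/pprof package directly, and the function names checksum and processAll are made up. After running it, go tool pprof cpu.prof followed by top should show most flat time in checksum, while processAll has little flat time but a high cum value.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

// checksum does the real work, so it should carry the flat time.
// noinline keeps it as a separate frame to make the profile easy to read.
//
//go:noinline
func checksum(rec []byte) uint64 {
	var sum uint64
	for _, b := range rec {
		sum += uint64(b)
	}
	return sum
}

// processAll only loops and delegates: low flat time, high cum time.
func processAll(records [][]byte) uint64 {
	var total uint64
	for _, rec := range records {
		total += checksum(rec)
	}
	return total
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile() // runs before f.Close, flushing the profile

	// Build some input and process it enough times to collect samples.
	records := make([][]byte, 64)
	for i := range records {
		records[i] = make([]byte, 1<<16)
	}
	var total uint64
	for i := 0; i < 200; i++ {
		total += processAll(records)
	}
	fmt.Println("total:", total)
}
```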

Memory profiling

go test -bench=BenchmarkHot -memprofile=mem.prof .   # profile flags accept one package at a time, not ./... across several
go tool pprof -alloc_space mem.prof

Inside pprof:

(pprof) top10          # top allocating functions
(pprof) list MyFunc    # annotated source with bytes allocated per line

Two profile types matter:

  • -alloc_space — total bytes allocated over the profile period (cumulative).
  • -inuse_space — bytes currently live (snapshot).

Start with -alloc_space when looking for GC pressure. Use -inuse_space when diagnosing a memory leak.
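
The difference between cumulative and live bytes can be seen without a profile at all. The sketch below uses runtime.MemStats, whose TotalAlloc and HeapAlloc counters roughly mirror what -alloc_space and -inuse_space measure (the heap profile itself is sampled, so the numbers will not match exactly). The churn helper and its arguments are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces every buffer to be heap-allocated.
var sink []byte

// churn allocates n short-lived buffers of size bytes each and reports
// how much the cumulative counter (TotalAlloc) and the live-heap
// counter (HeapAlloc) grew across the run.
func churn(n, size int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		sink = make([]byte, size) // the previous buffer becomes garbage
	}
	sink = nil

	runtime.GC() // collect the garbage so HeapAlloc reflects live data
	runtime.ReadMemStats(&after)

	allocated = after.TotalAlloc - before.TotalAlloc
	if after.HeapAlloc > before.HeapAlloc {
		live = after.HeapAlloc - before.HeapAlloc
	}
	return allocated, live
}

func main() {
	allocated, live := churn(1000, 64*1024)
	fmt.Printf("allocated over the run: %d bytes\n", allocated)
	fmt.Printf("still live afterwards:  %d bytes\n", live)
}
```

A workload like this one produces plenty of GC pressure but no leak: the cumulative figure is large while the live figure stays small. A leak shows the opposite pattern, with live bytes growing over time.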

The net/http/pprof endpoint

For running services, import the side-effect-only package:

import _ "net/http/pprof"

This registers several endpoints on http.DefaultServeMux:

/debug/pprof/          — index page
/debug/pprof/profile   — 30-second CPU profile
/debug/pprof/heap      — heap snapshot
/debug/pprof/goroutine — all goroutine stacks

Collect a live CPU profile:

go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30"   # quoted so the shell does not treat ? as a glob

Never expose the pprof endpoint on a public-facing port in production. It reveals detailed information about your program's memory and goroutines, which is a security risk. Bind it to a separate internal port or protect it with authentication.
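
One way to keep pprof off the public port is to give it its own mux, as sketched below. The port numbers in the comments, the /hello route, and the helper names are assumptions made for illustration. Note that importing net/http/pprof always registers its handlers on http.DefaultServeMux, so the public server must use a fresh mux rather than the default one. To stay runnable without opening ports, main only checks which mux answers /debug/pprof/.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPublicMux serves application traffic only. Because it is a fresh
// mux, the handlers that net/http/pprof registers on
// http.DefaultServeMux are not reachable through it.
func newPublicMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return mux
}

// newDebugMux exposes the pprof handlers and nothing else.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusFor reports the status code h returns for a GET of path.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// In a real service you would serve the two muxes separately, e.g.:
	//   go http.ListenAndServe("127.0.0.1:6060", newDebugMux()) // internal only
	//   http.ListenAndServe(":8080", newPublicMux())            // public
	fmt.Println("public mux, /debug/pprof/:", statusFor(newPublicMux(), "/debug/pprof/"))
	fmt.Println("debug mux,  /debug/pprof/:", statusFor(newDebugMux(), "/debug/pprof/"))
}
```

Binding the debug listener to 127.0.0.1 means profiles can only be collected from the host itself or through a tunnel, which is often enough for ad hoc investigation.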

Reading flame graphs

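The web command in pprof renders a call graph, not a flame graph, and requires Graphviz. For a flame graph, open pprof's web UI with go tool pprof -http=:8081 cpu.prof (any free port works) and choose Flame Graph from the View menu. A flame graph shows: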

  • Width — proportion of time spent in a function (wider = more time).
  • Height — call stack depth.
  • Colour — no semantic meaning (just visual contrast).

The widest boxes at the leaf end of a hot stack (the end farthest from the root) are your targets, because that is where the time is actually spent. A wide leaf box that is not your code usually indicates you're calling the standard library or runtime more than necessary.

Common hotspots to watch for

CPU:

  • runtime.mallocgc — you are allocating too much. Look for per-request slice/map creation that could be pooled.
  • runtime.memeqbody / string comparisons — tight loops comparing strings.
  • fmt.Sprintf in hot paths — string formatting allocates. Cache or pre-format where possible.

Memory:

  • Functions that allocate in a loop without reuse — use sync.Pool for expensive objects.
  • Large intermediate slices — stream or process in chunks.
  • bytes.Buffer regrowing repeatedly as it fills: pre-allocate with bytes.NewBuffer(make([]byte, 0, expectedSize)) when you can estimate the final size.
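
Several of these fixes can be combined. The sketch below replaces fmt.Sprintf with a pooled bytes.Buffer and strconv, then compares allocations per call using testing.AllocsPerRun. The function names and the "key=value" format are hypothetical, and whether the pooled version wins depends on your workload, so treat the printed numbers as something to measure rather than assume.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
	"testing"
)

// bufPool hands out reusable buffers so each call does not allocate one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// formatSprintf is the baseline: convenient, but it allocates on every call.
func formatSprintf(key string, n int) string {
	return fmt.Sprintf("%s=%d", key, n)
}

// formatPooled builds the same string using a pooled buffer and strconv.
func formatPooled(key string, n int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)

	var scratch [20]byte // large enough for any int64 in base 10
	buf.WriteString(key)
	buf.WriteByte('=')
	buf.Write(strconv.AppendInt(scratch[:0], int64(n), 10))
	return buf.String() // copies the bytes into the returned string
}

func main() {
	a := testing.AllocsPerRun(1000, func() { _ = formatSprintf("user", 12345) })
	b := testing.AllocsPerRun(1000, func() { _ = formatPooled("user", 12345) })
	fmt.Println("Sprintf allocs/op:", a)
	fmt.Println("pooled  allocs/op:", b)
}
```

Always call Reset on a buffer taken from the pool before writing to it, and never keep a reference to the buffer after Put.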

Profile before you optimise. Guessing at bottlenecks wastes time and often misses the real issue. Ten minutes with pprof is more valuable than an hour of speculative refactoring. And always re-benchmark after the change to verify the improvement.

Check your understanding

Knowledge check

  1. A function shows low flat time but high cumulative (cum) time in pprof output. What does this indicate?
  2. True or false: it is safe to expose the net/http/pprof endpoint on a public internet-facing port.
  3. Your CPU profile shows runtime.mallocgc accounting for 40% of total time. What should you investigate?

Do it yourself

Add the pprof endpoint to a small HTTP server and collect a 5-second CPU profile:

# Terminal 1
go run main.go   # server with _ "net/http/pprof" imported

# Terminal 2
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=5"

Try the top5, list, and web commands.

Where to go next

You can now find bottlenecks. The final lesson covers building and deploying — compile flags, cross-compilation, static binaries, Docker multi-stage builds, and go generate.
