Scaling strategies
Vertical vs horizontal, and why statelessness is what makes scaling out possible.
- Compare vertical and horizontal scaling
- Explain why stateless services scale out easily
- Recognise the database as the usual scaling bottleneck
Once measurement shows you genuinely need more capacity, there are two directions to grow — and the choice shapes your whole architecture.
Up vs out
- Vertical scaling (scale up): give one machine more power: more CPU, more RAM, faster disk. It is simple and usually requires no code changes, but it has a ceiling (the biggest machine you can get, and big machines get expensive fast) and the machine remains a single point of failure.
- Horizontal scaling (scale out): add more machines and share the load across them with a load balancer. Capacity grows by adding instances, and the system is more resilient because losing one of many instances does not take the service down. It only works if your application is built for it.
Scaling up is the easy first move; scaling out is how large systems actually handle large load.
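As a rough illustration of "share the load across them", here is a minimal sketch of a round-robin load balancer. The class name `RoundRobinBalancer`, the helper `distribute`, and the instance names are hypothetical, and real load balancers also handle health checks and uneven request costs, which this sketch ignores.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands out backend instances in rotation (illustrative sketch only)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("need at least one instance")
        self._instances = list(instances)
        self._cycle = itertools.cycle(self._instances)

    def pick(self):
        # Assumes any instance can serve any request, so plain rotation works.
        return next(self._cycle)


def distribute(balancer, request_count):
    """Count how many requests each instance would receive."""
    return Counter(balancer.pick() for _ in range(request_count))


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    # Each of the three instances receives 3 of the 9 requests.
    print(distribute(balancer, 9))
```

Adding capacity in this model means adding one more name to the list, which is only safe when every instance is interchangeable.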
Statelessness is the key
Horizontal scaling depends on one property: a request can be handled by any instance. That requires services to be stateless, meaning they keep no per-user data in their own memory between requests (see the consistency lesson). Push that state out to a shared store, and you can add or remove identical instances freely behind a load balancer.
If instead an instance remembers things (such as a user's session in local memory), the load balancer must pin that user to that instance ("sticky sessions"), and if the instance is lost, the user's state is lost with it. Statelessness is what makes scaling out clean.
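The difference can be sketched in a few lines. Everything here is hypothetical: `SharedSessionStore` stands in for an external store (it is an in-memory dict only so the example runs on its own), and the shopping-cart scenario is an assumed example, not something from this lesson.

```python
class SharedSessionStore:
    """Stand-in for an external shared store; in-memory only for the demo."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class StatelessInstance:
    """Keeps no per-user data itself; all state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def add_to_cart(self, session_id, item):
        session = self._store.get(session_id)
        cart = session.get("cart", []) + [item]
        self._store.put(session_id, {**session, "cart": cart})
        return cart


class StatefulInstance:
    """Remembers sessions in local memory, so users must be pinned to it."""

    def __init__(self, name):
        self.name = name
        self._sessions = {}

    def add_to_cart(self, session_id, item):
        cart = self._sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


if __name__ == "__main__":
    store = SharedSessionStore()
    app_1 = StatelessInstance("app-1", store)
    app_2 = StatelessInstance("app-2", store)
    app_1.add_to_cart("user-1", "book")
    print(app_2.add_to_cart("user-1", "pen"))  # ['book', 'pen']

    old_1 = StatefulInstance("old-1")
    old_2 = StatefulInstance("old-2")
    old_1.add_to_cart("user-1", "book")
    print(old_2.add_to_cart("user-1", "pen"))  # ['pen']
```

With the stateless pair, the second request lands on a different instance and still sees the full cart. With the stateful pair, the second instance has never heard of the first item, which is the problem sticky sessions exist to paper over.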
The database is usually the wall
You can add app servers easily; the data layer is typically what's hard to scale, because state is exactly what's hard to distribute. Common moves, in rough order of complexity:
- Read replicas: copies of the primary that serve reads, taking load off the primary. Writes still go to the primary.
- Caching — serve hot data without hitting the database at all (the caching lesson).
- Sharding — partition the data across multiple databases by some key. Powerful but a major step up in complexity.
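The three moves above can be sketched together as a toy model. All names (`ReplicatedDatabase`, `CachedReader`, `shard_for`) are hypothetical, plain dicts stand in for real databases, and replication is copied synchronously here purely to keep the demo short; real replicas are often updated asynchronously and can lag. The cache also never invalidates, which a real system would need to handle.

```python
import hashlib
import itertools


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ReplicatedDatabase:
    """Writes go to the primary; reads rotate across the replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)
        self.replica_reads = 0

    def write(self, key, value):
        self.primary[key] = value
        # Simplification: copy to every replica immediately.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        self.replica_reads += 1
        return next(self._next_replica).get(key)


class CachedReader:
    """Cache-aside: try the cache first, fall back to the database on a miss."""

    def __init__(self, database):
        self._database = database
        self._cache = {}

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._database.read(key)
        if value is not None:
            self._cache[key] = value
        return value


if __name__ == "__main__":
    db = ReplicatedDatabase(replica_count=2)
    db.write("user:1", "Ada")
    reader = CachedReader(db)
    reader.get("user:1")
    reader.get("user:1")
    print(db.replica_reads)  # 1: the second read was served from the cache
    print(shard_for("user:1", 4))  # same index on every run
```

The modulo in `shard_for` hints at why sharding is the complex option: changing `shard_count` sends most keys to a different shard, so resizing means moving data.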
Don't scale out prematurely — a single beefy server plus caching carries most applications a remarkably long way, with a fraction of the complexity. Scale out when measurement, not anxiety, says you must.
Where to go next
Last lesson: how to know your scaling actually works, by testing under realistic load and finding the bottlenecks.