Data structures: choosing the right shape
The shape you store data in decides what your code can do cheaply — and what it can't.
- Describe the everyday structures and their access patterns
- Choose a structure from the operations a problem needs
- Explain why the wrong structure makes simple code slow
A data structure is a way of organising data so that certain operations are cheap. Pick the right one and your code is simple and fast; pick the wrong one and you'll write convoluted code that's slow on top. The choice is driven by one question: what operations does this problem need most?
The everyday structures
You'll reach for these constantly:
- Array / list — an ordered sequence. Great for "keep things in order" and "visit them all." Cheap to read by position; getting longer is cheap at the end, costlier in the middle.
- Map / dictionary — key→value pairs. Built for "look something up by its key" — near-instant, regardless of size. The workhorse of fast lookups.
- Set — a collection of unique items with fast "is this in here?" checks. Perfect for de-duplication and membership tests.
- Stack / queue — ordered with a discipline: a stack is last-in-first-out (undo history), a queue is first-in-first-out (a job line).
- Tree / graph — values connected by relationships: hierarchies (a file system) and networks (who-follows-whom).
Structure determines cost
The same task can be trivial or painful depending on structure. "Does this collection contain X?"
- In a list, you may have to check every element — slow as it grows.
- In a set or map, it's effectively instant, at any size.
That difference isn't a micro-optimisation; on large data it's the gap between "returns immediately" and "times out." The structure, not the cleverness of the code, decides.
Let the operations choose
Don't start from the structure; start from what you'll do:
- Mostly looking things up by a key? → map.
- Need order and to iterate through everything? → list.
- Need uniqueness and fast membership? → set.
- Modelling relationships/hierarchy? → tree/graph.
Write down the two or three operations the problem does most often, then pick the structure that makes those cheap. The rest follows.
A surprising amount of "make it faster" work is really "use the right structure." Before optimising loops, ask whether a list should have been a map.
Where to go next
To reason precisely about "cheap" and "slow," we need a way to measure how cost grows. That's algorithms and complexity, next.