Merge sort
Implement merge sort recursively, trace the divide-and-merge steps, and time it against insertion sort to make O(n log n) vs O(n²) tangible.
- Implement merge sort recursively using divide and merge steps
- Trace through how merge sort splits and recombines sub-arrays
- Verify O(n log n) empirically by timing against insertion sort on n=10000
Merge sort is the textbook divide-and-conquer sort. It splits the array in half, sorts each half recursively, then merges the two sorted halves into one. The merge step is the key insight — merging two sorted sequences is O(n).
Merge sort and insertion sort, timed
Run this to see O(n log n) versus O(n²) on the same data.
How the recursion works
For [5, 3, 8, 1]:
merge_sort([5, 3, 8, 1])
merge_sort([5, 3])
merge_sort([5]) → [5] (base case)
merge_sort([3]) → [3] (base case)
merge([5], [3]) → [3, 5]
merge_sort([8, 1])
merge_sort([8]) → [8]
merge_sort([1]) → [1]
merge([8], [1]) → [1, 8]
merge([3, 5], [1, 8]) → [1, 3, 5, 8]The depth of recursion is log₂(n) — you halve the array each time. At each level you do O(n) work across all the merge calls. Total: O(n log n).
The merge step in detail
merge(left, right) runs two pointers, one through each half, always
appending the smaller element to result. When one half is exhausted, the
remainder of the other is appended as-is (it is already sorted). This linear
pass — two pointers, one output — is what makes merge sort tractable.
This implementation creates new lists at every merge step, which costs O(n log n) extra memory. An in-place merge sort exists but is significantly more complex — it is rarely worth implementing unless memory is genuinely constrained. In Python, memory allocations are fast and the clarity of the version above is valuable.
Reading the timing output
At n=1 000 the difference may be small. At n=10 000 insertion sort should be roughly 100× slower than merge sort. The ratio grows with n — O(n²) grows much faster than O(n log n). Doubling n roughly doubles merge sort's time and quadruples insertion sort's.
Where to go next
Next: Python's sort — timsort, the key= parameter, sort stability, and
when to reach for operator.itemgetter instead of a lambda.