Autocorrelation
Autocorrelation measures how strongly a series predicts its own future — and the ACF plot is the first tool to reach for when choosing a forecasting model.
- Define autocorrelation as the correlation between a series and its lagged self
- Interpret an ACF plot — distinguish trend decay from seasonal spikes
- Explain how ACF patterns inform the choice of ARIMA parameters
Ordinary correlation measures the relationship between two different variables. Autocorrelation measures the relationship between a variable and itself at a different point in time. It asks: if the value today is higher than usual, how much does that tell me about the value tomorrow? Next week? Next year?
The autocorrelation function (ACF)
For a time series x, the autocorrelation at lag k is the Pearson correlation between x(t) and x(t−k):
ACF(k) = Corr(x(t), x(t−k))
At lag 0, the correlation is always 1.0, because a series is perfectly correlated with itself. At lag 1, it measures whether knowing today's value helps predict tomorrow's. At lag 12 (for monthly data), it measures whether the value from the same month last year is informative.
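The definition can be checked by hand on a tiny series. The sketch below is a minimal NumPy version of the usual sample estimator, and that choice is an assumption: it subtracts the overall mean and divides by the lag-0 sum of squares, so its values can differ slightly from a Pearson correlation computed on the two overlapping slices. The helper name sample_acf and the toy values are illustrative only.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()                 # deviations from the overall mean
    c0 = np.dot(dev, dev)              # lag-0 sum of squares
    # For lag k, pair each x(t) with x(t-k) over the overlapping part of the series.
    return np.array([np.dot(dev[k:], dev[:len(x) - k]) / c0
                     for k in range(max_lag + 1)])


toy = [1, 2, 3, 4, 5]                  # made-up values, not real data
print(sample_acf(toy, 2))              # lag 0 is 1.0 by construction
```

For this toy series the deviations are −2, −1, 0, 1, 2, so the lag-1 value works out to 4/10 = 0.4 and the lag-2 value to −1/10 = −0.1.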
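The ACF plot shows ACF(k) for k = 0, 1, 2, …, K, where K is the largest lag plotted. Bars extending outside the confidence bands (typically about ±2/√n for a series of n observations, the approximate 95% range expected if the series were white noise) are statistically significant.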
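To see what the bands mean in practice, the sketch below computes the approximate band for a synthetic white-noise series and lists the lags that fall outside it. It assumes the simple ±1.96/√n rule (the ±2/√n above, unrounded); plotting libraries may draw slightly different bands. The seed, series length, and the sample_acf helper are illustrative choices.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    c0 = np.dot(dev, dev)
    return np.array([np.dot(dev[k:], dev[:len(x) - k]) / c0
                     for k in range(max_lag + 1)])


rng = np.random.default_rng(42)        # fixed seed so the run is repeatable
noise = rng.normal(size=400)           # synthetic white noise, n = 400
n = len(noise)
band = 1.96 / np.sqrt(n)               # approximate 95% band under white noise

r = sample_acf(noise, 40)
# Lag 0 is always 1.0, so only lags >= 1 are compared against the band.
significant = [k for k in range(1, 41) if abs(r[k]) > band]

print(f"band = +/-{band:.3f}")
print("lags outside the band:", significant)
```

With a 95% band, roughly one lag in twenty is expected to fall outside by chance even for pure noise, so an isolated bar that barely crosses the band is weak evidence on its own.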
Reading ACF patterns
Slow decay across many lags indicates a trend. A trending series has high correlation at lag 1 (this month resembles last month) and still-significant correlation at lag 2 and well beyond. The correlation decays slowly because a trend carries information across long windows.
Spikes at regular intervals (lag 12, 24, 36 for monthly data; lag 7, 14 for daily data) indicate seasonality. The series is more similar to itself one period ago than to observations at non-seasonal lags.
Rapid decay to zero (significant only at lag 1, or not at all) indicates a stationary series with little memory — each observation is mostly noise relative to the previous one.
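The three patterns above can be reproduced on synthetic data. The sketch below builds a trending series, a seasonal series with a 12-step period, and white noise, then prints the ACF at a few lags. All three series, the seed, and the sample_acf helper are made up for illustration.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    c0 = np.dot(dev, dev)
    return np.array([np.dot(dev[k:], dev[:len(x) - k]) / c0
                     for k in range(max_lag + 1)])


rng = np.random.default_rng(0)
t = np.arange(240)                     # 20 years of synthetic monthly points

series = {
    "trend": 0.5 * t + rng.normal(size=t.size),
    "seasonal": 10 * np.sin(2 * np.pi * t / 12) + rng.normal(size=t.size),
    "noise": rng.normal(size=t.size),
}

acfs = {name: sample_acf(values, 24) for name, values in series.items()}
for name, r in acfs.items():
    summary = "  ".join(f"lag {k}: {r[k]:+.2f}" for k in (1, 6, 12, 24))
    print(f"{name:9s} {summary}")
```

The trend series should stay high at every printed lag, the seasonal series should peak at lags 12 and 24 (and swing negative at lag 6, half a period away), and the noise series should hover near zero throughout.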
ACF and ARIMA parameter selection
The ARIMA family of models has three main parameters: p (autoregressive order), d (differencing order), and q (moving average order). The ACF, along with the partial ACF (PACF), provides the traditional way to choose these:
- If the ACF decays slowly: difference the series (increment d) until the decay becomes rapid.
- If the ACF cuts off sharply after lag q, with significant spikes only up to that lag, this suggests a moving-average term of order q.
- The PACF (not covered here) is the complementary diagnostic for the autoregressive order p.
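The first two rules can be sketched on synthetic data. The example below assumes a random walk with drift as the slowly decaying case and an MA(1) series with coefficient 0.8 as the cut-off case. Both are illustrative choices, as are the seed and the sample_acf helper. For that MA(1) series the theoretical lag-1 autocorrelation is 0.8 / (1 + 0.8²) ≈ 0.49, and it is zero at all higher lags.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    c0 = np.dot(dev, dev)
    return np.array([np.dot(dev[k:], dev[:len(x) - k]) / c0
                     for k in range(max_lag + 1)])


rng = np.random.default_rng(1)

# 1) Random walk with drift: non-stationary, so the ACF decays slowly.
steps = 0.5 + rng.normal(size=500)
level = np.cumsum(steps)
diffed = np.diff(level)                # first difference, i.e. d = 1

acf_level = sample_acf(level, 10)
acf_diffed = sample_acf(diffed, 10)
print("level,  lags 1/5/10:", np.round(acf_level[[1, 5, 10]], 2))
print("diffed, lags 1/5/10:", np.round(acf_diffed[[1, 5, 10]], 2))

# 2) MA(1) series y(t) = e(t) + 0.8 * e(t-1): the ACF should cut off after lag 1.
e = rng.normal(size=2001)
ma1 = e[1:] + 0.8 * e[:-1]
acf_ma1 = sample_acf(ma1, 5)
print("MA(1),  lags 1..5:  ", np.round(acf_ma1[1:], 2))
```

If the differenced series still showed slow decay, the same check could be repeated after a second difference, which corresponds to d = 2.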
At this level, automatic selection tools (pmdarima.auto_arima) handle the
parameter search, but understanding what the ACF tells you is necessary to
diagnose a poorly-fitting model.
statsmodels.graphics.tsaplots.plot_acf produces the ACF plot. In a
non-graphical environment, statsmodels.tsa.stattools.acf returns the
values as a numpy array — useful for printing or programmatic inspection.
Where to go next
Next: simple forecasting — implementing three forecasters (naïve, moving average, exponential smoothing) and measuring each with MAE to see how much the added complexity is worth.