Choosing a chart
Match your chart type to the question you are asking — bar for comparison, line for trend, scatter for relationship, histogram for distribution, box for spread.
- Match each of five core chart types to the question it answers
- Identify common chart mistakes that distort interpretation
- State the one question that determines which chart to use
Every chart answers a question. Pick the chart before you pick the library. A common mistake in data visualisation is choosing a chart type first and then finding a dataset to fit it: the result looks busy but says nothing.
The question that determines the chart: "What am I showing, and how are these values related to each other?" Five relationships cover the vast majority of analysis work.
Bar chart — comparison across categories
Use a bar chart when you are comparing a numeric value across discrete groups: revenue by product, headcount by department, average rating by country. The height of each bar encodes the value; the categories are on the axis.
A classic mistake: truncating the y-axis so it does not start at zero. Bar height encodes the value, so a difference that looks dramatic on a truncated axis may be trivial in proportion to the actual values. If your axis starts at 90 and the bars sit at 91 and 99, the second bar is drawn nine times as tall as the first, although the values differ by less than 9%.
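The arithmetic behind the truncation problem can be sketched in a few lines of Python. The helper name `bar_height_ratio` is an illustrative assumption, not part of any library; it simply compares how tall two bars are drawn for a given axis baseline.

```python
def bar_height_ratio(small, large, baseline=0.0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Values from the example above: bars at 91 and 99.
honest = bar_height_ratio(91, 99, baseline=0)      # axis starts at zero
truncated = bar_height_ratio(91, 99, baseline=90)  # axis starts at 90

print(f"Axis from 0:  taller bar is {honest:.2f}x the shorter one")
print(f"Axis from 90: taller bar is {truncated:.2f}x the shorter one")
```

With a zero baseline the drawn ratio matches the ratio of the values themselves; with the baseline at 90 it does not.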
Line chart — change over time
A line chart implies that the x-axis is a continuous or ordered sequence — typically time. The line connecting points signals that there is a meaningful relationship between adjacent values (each follows from the previous). Do not use a line chart for unordered categories; use a bar chart instead.
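As a minimal sketch of a line over an ordered x-axis, the following uses matplotlib with made-up monthly figures. The data, labels, and output filename are assumptions chosen for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative data: one value per month, in calendar order.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 150, 160, 158, 175]

fig, ax = plt.subplots()
ax.plot(months, signups, marker="o")  # the line joins adjacent months
ax.set_xlabel("Month")
ax.set_ylabel("Sign-ups")
ax.set_title("Sign-ups over time")
fig.savefig("signups_line.png")
```

The line is meaningful here because February follows January. If the x-axis held product names instead, the slope between two neighbours would depend only on the order you happened to list them in.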
Scatter plot — relationship between two numeric variables
A scatter plot puts one numeric variable on each axis and draws a point per observation. It is the correct tool for asking "is there a relationship between X and Y?" because correlation, clusters, and outliers all become visible. Encoding a third variable as point colour or size is useful, but keep the encoding simple.
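A possible sketch with matplotlib, using synthetic data generated on the spot. The variable names, the two-valued `group` column, and the output filename are assumptions for illustration.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

random.seed(0)  # reproducible synthetic data
# Synthetic observations: y depends loosely on x, plus noise.
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 * xi + random.gauss(0, 3) for xi in x]
group = [random.choice([0, 1]) for _ in x]  # hypothetical third variable

fig, ax = plt.subplots()
# One point per observation; colour carries the third variable.
# alpha < 1 lets dense regions appear darker, which eases overplotting.
points = ax.scatter(x, y, c=group, cmap="viridis", alpha=0.6, s=20)
ax.set_xlabel("X")
ax.set_ylabel("Y")
fig.savefig("scatter.png")
```

Only colour is used for the extra variable here. Adding size and marker shape on top would make the plot harder to read, not richer.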
Histogram — distribution of a single variable
A histogram bins a continuous variable and shows how many observations fall in each bin. It answers "what does the distribution of this variable look like?" Is it symmetric, skewed, or bimodal, and does it have long tails? The bin count matters: too few bins hide structure, and too many turn the shape into noise. Use a histogram before modelling to understand what you are working with.
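To make the binning step concrete, here is a small standard-library sketch. The function `bin_counts` is a hypothetical helper written for this example (plotting libraries do this step for you), and the sample is synthetic.

```python
import random


def bin_counts(values, n_bins):
    """Count observations in `n_bins` equal-width bins spanning the data."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: everything lands in the first bin
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin of its own.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


random.seed(1)
# Synthetic right-skewed sample (exponential), for illustration only.
sample = [random.expovariate(1.0) for _ in range(500)]

for n_bins in (3, 10, 40):
    counts = bin_counts(sample, n_bins)
    print(f"{n_bins:>2} bins: tallest bin holds {max(counts)} of {len(sample)} points")
```

Running the same data through several bin counts is a quick way to check that the shape you see is a property of the data and not of the binning.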
Box plot — spread and outliers across groups
A box plot summarises a distribution compactly: the box spans the interquartile range (25th to 75th percentile), the line inside is the median, and the whiskers typically extend to the most extreme points within 1.5 times the interquartile range of the box edges, with anything beyond plotted individually as an outlier. Box plots are most useful when comparing the spread of the same variable across several groups.
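The numbers a box plot draws can be computed by hand. The sketch below assumes one common convention (Tukey's 1.5 x IQR rule, which is also matplotlib's default for whiskers) and linear-interpolation quartiles; other tools may use different quartile or whisker definitions. `box_stats` is a hypothetical helper, and the data is made up.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


data = [2, 3, 4, 5, 5, 6, 7, 8, 30]  # made-up sample with one extreme value
print(box_stats(data))
```

For this sample the box runs from 4 to 7 with the median at 5, the whiskers reach 2 and 8, and the value 30 is drawn as a separate outlier point.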
| Chart type | Best question | Common mistake |
|---|---|---|
| Bar | How do categories compare? | Truncated y-axis |
| Line | How does a value change over time? | Using for unordered categories |
| Scatter | Is there a relationship between X and Y? | Overplotting with too many points |
| Histogram | What is the distribution of this variable? | Too few or too many bins |
| Box | How does spread compare across groups? | Omitting the sample size |
Pie charts are conspicuously absent from this list. They work when you have two or three slices and the proportional story matters — "half of sales came from one customer." With more than four slices, the human eye cannot reliably compare arc lengths. Use a bar chart instead.
Where to go next
Next: matplotlib essentials. You will create figures with plt.subplots(), plot lines and bars, and add labels and legends, all in runnable code.