Calculating stats
Use the pandas methods .mean(), .median(), .std(), and .value_counts() to compute summary statistics on a Series or a DataFrame.
- Call .mean(), .median(), .std() on a numeric pandas Series
- Use .value_counts() to count occurrences of each category
- Apply column-level statistics to understand a DataFrame
The concepts from the previous lesson translate directly into pandas method calls.
Each statistic is one method call on a column (a Series). The code below builds
a small dataset of online course completions and asks several questions of it.
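A minimal sketch of such a dataset, assuming made-up values: the student names, track labels, scores, and hours below are hypothetical stand-ins, chosen so that there are 8 students and the standard deviation lands near 13, as discussed in this lesson. They are not the lesson's original data.

```python
import pandas as pd

# Hypothetical course-completion data; every value is invented for illustration.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay", "Gus", "Hana"],
    "track": ["data", "web", "data", "design", "web", "data", "web", "data"],
    "score": [55, 62, 70, 74, 78, 85, 90, 94],
    "hours": [10, 12, 15, 14, 18, 20, 22, 25],
})

# Each statistic is one method call on a column (a Series).
mean_score = df["score"].mean()            # arithmetic average
median_score = df["score"].median()        # middle value
std_score = df["score"].std()              # spread around the mean
track_counts = df["track"].value_counts()  # students per track

print("mean:", mean_score)
print("median:", median_score)
print("std:", round(std_score, 2))
print(track_counts)
```

With these particular numbers the mean and median are both 76.0 and the standard deviation is about 13.5.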
What each call returns
.mean() returns a single float — the arithmetic average. For score, this
is the sum of all scores divided by 8.
.median() returns the middle value. With 8 students (an even count), pandas
averages the 4th and 5th sorted values. Compare it to the mean: if they are close,
the distribution is roughly symmetric; a large gap suggests skew or outliers.
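The even-count rule can be checked by hand. The sketch below uses hypothetical scores (not the lesson's data): it sorts them, averages the 4th and 5th values, and then shows how a single very low score moves the mean while leaving the median unchanged.

```python
import pandas as pd

# Hypothetical scores, deliberately unsorted.
scores = pd.Series([78, 55, 94, 62, 85, 70, 90, 74], name="score")

# Sort, then average the 4th and 5th values (positions 3 and 4, zero-based).
ordered = scores.sort_values().reset_index(drop=True)
manual_median = (ordered.iloc[3] + ordered.iloc[4]) / 2

print(manual_median, scores.median())  # both 76.0
print(scores.mean())                   # 76.0, equal to the median here

# Same scores, but the lowest score is replaced by an extreme value.
skewed = pd.Series([78, 5, 94, 62, 85, 70, 90, 74], name="score")
print(skewed.median())  # still 76.0
print(skewed.mean())    # 69.75, pulled down by the outlier
```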
.std() returns the standard deviation. By default pandas computes the sample
standard deviation (ddof=1, which divides by n - 1). A value of around 13 here
means that scores typically fall about 13 points from the mean, so there is real
spread in the class.
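To make the calculation behind .std() concrete, here is a sketch that reproduces it step by step on hypothetical scores. It assumes the pandas default of ddof=1 (divide by n - 1) and contrasts it with ddof=0, the population formula.

```python
import pandas as pd

# Hypothetical scores for 8 students.
scores = pd.Series([55, 62, 70, 74, 78, 85, 90, 94], name="score")

# Step by step: deviations from the mean, squared, summed, divided by n - 1.
deviations = scores - scores.mean()
sample_variance = (deviations ** 2).sum() / (len(scores) - 1)
manual_std = sample_variance ** 0.5

print(round(manual_std, 2))    # about 13.53
print(round(scores.std(), 2))  # same value: .std() uses ddof=1 by default

# Population standard deviation divides by n instead, so it is slightly smaller.
population_std = scores.std(ddof=0)
print(round(population_std, 2))  # about 12.66
```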
.value_counts() counts how many times each distinct value appears in a
Series. It is essential for categorical columns: here it tells you which track
has the most students. By default it returns the counts sorted in descending
order, so the most common category comes first.
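A short sketch of .value_counts() on a categorical column. The track labels are hypothetical; normalize=True is an optional pandas parameter that returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical track assignments for 8 students.
tracks = pd.Series(
    ["data", "web", "data", "design", "web", "data", "web", "data"],
    name="track",
)

# Counts per category, most common first.
counts = tracks.value_counts()
print(counts)

# Because of the descending sort, the first index label is the top category.
most_common = counts.index[0]
print(most_common)  # data

# Proportions instead of counts.
shares = tracks.value_counts(normalize=True)
print(shares)
```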
You can call .describe() to get count, mean, std, min, quartiles, and max all at
once on a column: df["score"].describe(). Use individual methods when you want to
embed a specific number in a calculation or comparison; use .describe() when
you want a quick overview.
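The two styles side by side, as a sketch on hypothetical scores: .describe() for the overview, and a single .mean() call embedded in a comparison to filter rows.

```python
import pandas as pd

# Hypothetical scores for 8 students.
df = pd.DataFrame({"score": [55, 62, 70, 74, 78, 85, 90, 94]})

# Quick overview: one call returns a Series of summary statistics.
summary = df["score"].describe()
print(summary)

# Specific number inside a comparison: keep students scoring above the mean.
above_average = df[df["score"] > df["score"].mean()]
print(len(above_average))  # 4
```

The result of .describe() is itself a Series, so a single statistic can be pulled out by label, for example summary["50%"] for the median.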
Applying stats to the whole DataFrame
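Call .mean() or .median() on a DataFrame (not a single column) and pandas
computes the statistic for each column at once. The columns must be numeric: in
pandas 2.0 and later, a text column such as track raises a TypeError, so select
the numeric columns first or pass numeric_only=True: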
    df[["score", "hours"]].mean()
This produces a Series with one value per column, which is useful for a quick overview when your DataFrame has many numeric columns.
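A sketch of both ways to get column-level statistics from a DataFrame that mixes text and numeric columns. The data is hypothetical; numeric_only=True is a standard parameter of DataFrame.mean() that skips non-numeric columns.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame({
    "track": ["data", "web", "data", "design", "web", "data", "web", "data"],
    "score": [55, 62, 70, 74, 78, 85, 90, 94],
    "hours": [10, 12, 15, 14, 18, 20, 22, 25],
})

# Option 1: select the numeric columns explicitly.
column_means = df[["score", "hours"]].mean()
print(column_means)

# Option 2: let pandas skip the non-numeric columns.
numeric_means = df.mean(numeric_only=True)
print(numeric_means)
```

Selecting columns by name is more explicit about what is being summarized; numeric_only=True is convenient when there are many numeric columns.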
Where to go next
Next: grouping concepts — the split-apply-combine pattern that lets you calculate statistics within groups, which is where most real analysis starts.