Lab: explore a dataset

Apply inspection and cleaning end-to-end on a new dataset — no step-by-step instructions, just prompts and a starter block.

Lab · optional · Data Science · Beginner · 20 min
By the end of this lesson you will be able to:
  • Apply .head(), .shape, .dtypes, and .describe() to a new dataset
  • Identify and fix missing values, duplicates, and wrong types without guidance

This is an optional lab. No new concepts — just practice applying everything from the Data Fundamentals module to a dataset you have not seen before. Work through the prompts, run the code, and check that your cleaned DataFrame makes sense.

The dataset below is a small record of library book loans. It has all four data quality problems from the previous lesson: missing values, duplicate rows, a wrongly typed column, and an outlier. Your job is to find them and fix them.

The dataset
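
As a stand-in for the starter block, here is a hypothetical dataset consistent with the details this lab mentions: 8 rows, loan 103 entered twice, 999 days on loan 105, one missing member_id, one missing returned value, and days_on_loan stored as text. The loan_id column name and every individual value are assumptions for illustration, not the lab's real data.

```python
import pandas as pd

# Hypothetical stand-in data. The column name loan_id and all values
# are assumed; only the kinds of problems come from the lab text.
df = pd.DataFrame({
    "loan_id":      [101, 102, 103, 103, 104, 105, 106, 107],
    "member_id":    ["M11", "M12", "M13", "M13", None, "M14", "M15", "M16"],
    "days_on_loan": ["14", "7", "21", "21", "10", "999", "5", "30"],
    "returned":     ["yes", "yes", "no", "no", "yes", "yes", None, "yes"],
})

# Look at the first few rows before changing anything.
print(df.head())
```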

Before changing anything, answer these questions by adding print() calls above:

  1. What is the shape of the DataFrame?
  2. What are the dtypes? Which columns are the wrong type?
  3. Does .describe() reveal anything suspicious?
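
One possible way to answer the three questions is sketched below. The DataFrame is a hypothetical stand-in with assumed values and an assumed loan_id column, so the exact output will differ from the lab's data.

```python
import pandas as pd

# Hypothetical stand-in for the starter data (all values are assumed).
df = pd.DataFrame({
    "loan_id":      [101, 102, 103, 103, 104, 105, 106, 107],
    "member_id":    ["M11", "M12", "M13", "M13", None, "M14", "M15", "M16"],
    "days_on_loan": ["14", "7", "21", "21", "10", "999", "5", "30"],
    "returned":     ["yes", "yes", "no", "no", "yes", "yes", None, "yes"],
})

# 1. Shape is a (rows, columns) tuple.
print(df.shape)

# 2. One dtype per column. Here days_on_loan shows a text dtype,
#    not an integer one, which marks it as wrongly typed.
print(df.dtypes)

# 3. By default .describe() summarizes numeric columns only, so a
#    numeric-looking column that is absent from the summary is a clue.
print(df.describe())
```

In this sketch days_on_loan does not appear in the .describe() output at all, because it holds strings. A missing column in the summary can be as informative as a suspicious value.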

Step 1 — check for missing values

Find out where data is missing. df.isnull().sum() counts the nulls in each column, and df[df.isnull().any(axis=1)] shows the rows that contain them. Then use .dropna() to remove those rows.
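
A minimal sketch of this step, run on a hypothetical stand-in DataFrame (the loan_id column name and all values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the starter data (all values are assumed).
df = pd.DataFrame({
    "loan_id":      [101, 102, 103, 103, 104, 105, 106, 107],
    "member_id":    ["M11", "M12", "M13", "M13", None, "M14", "M15", "M16"],
    "days_on_loan": ["14", "7", "21", "21", "10", "999", "5", "30"],
    "returned":     ["yes", "yes", "no", "no", "yes", "yes", None, "yes"],
})

# Count the nulls in each column.
print(df.isnull().sum())

# Show the rows that contain at least one null.
print(df[df.isnull().any(axis=1)])

# Drop those rows and compare the shape before and after.
shape_before = df.shape
df = df.dropna()
print(shape_before, "->", df.shape)
```

Note that .dropna() returns a new DataFrame rather than changing df in place, which is why the result is assigned back to df.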

Step 2 — remove duplicates

Loan 103 appears twice. Use .drop_duplicates() to keep only the first occurrence. Print the shape before and after.
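
A sketch of the duplicate check, using hypothetical stand-in rows as they might look once the missing values are gone (the loan_id column name and all values are assumed):

```python
import pandas as pd

# Hypothetical stand-in rows with no nulls left (all values are assumed).
df = pd.DataFrame({
    "loan_id":      [101, 102, 103, 103, 105, 107],
    "member_id":    ["M11", "M12", "M13", "M13", "M14", "M16"],
    "days_on_loan": ["14", "7", "21", "21", "999", "30"],
    "returned":     ["yes", "yes", "no", "no", "yes", "yes"],
})

# keep=False flags every copy of a repeated row, so both are visible.
print(df[df.duplicated(keep=False)])

print("before:", df.shape)
# keep="first" is the default: the first occurrence stays, later copies go.
df = df.drop_duplicates()
print("after:", df.shape)
```

By default .drop_duplicates() only treats rows as duplicates when every column matches. If two loan 103 rows differed in some column, a subset argument such as subset=["loan_id"] would be needed instead.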

Step 3 — fix types and investigate the outlier

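Convert days_on_loan to int with .astype(int). Then check the maximum: loan 105 has 999 days, which looks like a sentinel value (a placeholder entered in place of an unknown number) rather than a real loan length. Filter it out with boolean indexing: df[df["days_on_loan"] < 100].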
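
A sketch of the conversion and filter on hypothetical stand-in rows (the loan_id column name and all values except the 999 on loan 105 are assumed):

```python
import pandas as pd

# Hypothetical stand-in rows after the earlier steps (values are assumed).
df = pd.DataFrame({
    "loan_id":      [101, 102, 103, 105, 107],
    "days_on_loan": ["14", "7", "21", "999", "30"],
})

# Convert the text column to integers so numeric comparisons work.
df["days_on_loan"] = df["days_on_loan"].astype(int)

# Inspect the largest value to spot the sentinel.
print(df["days_on_loan"].max())

# Boolean indexing: keep only the rows where the condition is True.
df = df[df["days_on_loan"] < 100]
print(df)
```

The conversion has to come before the comparison. While the column still holds strings, comparing it to the integer 100 raises a TypeError.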

After cleaning you have 4 rows from 8. You dropped 1 row with a missing member_id, 1 with a missing returned value, 1 duplicate, and 1 sentinel outlier. Every step was deliberate: you looked at the data, diagnosed the problem, and chose the appropriate fix.
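
To check a tally like this one, the three steps can be chained while recording the row count after each. The sketch below does that on a hypothetical stand-in dataset. The clean_loans helper, the loan_id column name, and all values are assumptions for illustration.

```python
import pandas as pd

def clean_loans(raw: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, int]]:
    """Apply the three lab steps and record the row count after each one."""
    counts = {"start": len(raw)}

    df = raw.dropna()                      # Step 1: rows with missing values
    counts["after dropna"] = len(df)

    df = df.drop_duplicates()              # Step 2: repeated rows
    counts["after drop_duplicates"] = len(df)

    # Step 3: fix the type, then filter out the sentinel value.
    df = df.assign(days_on_loan=df["days_on_loan"].astype(int))
    df = df[df["days_on_loan"] < 100]
    counts["after outlier filter"] = len(df)

    return df, counts

# Hypothetical stand-in for the starter data (all values are assumed).
raw = pd.DataFrame({
    "loan_id":      [101, 102, 103, 103, 104, 105, 106, 107],
    "member_id":    ["M11", "M12", "M13", "M13", None, "M14", "M15", "M16"],
    "days_on_loan": ["14", "7", "21", "21", "10", "999", "5", "30"],
    "returned":     ["yes", "yes", "no", "no", "yes", "yes", None, "yes"],
})

cleaned, counts = clean_loans(raw)
for stage, n in counts.items():
    print(f"{stage}: {n} rows")
```

Recording the count at every stage makes it easy to see which step removed which rows, and to notice when a step drops more than expected.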

Done?

You just ran a complete data inspection and cleaning pass — the same workflow a data scientist applies to every new dataset. The next module, Exploring Data, picks up here: with clean data in hand, you will calculate statistics and group your results to find patterns.
