Code of the Day

Reading data files

Use Python's csv module and io.StringIO to open a CSV and read its rows as dictionaries.

Data Science · Beginner · 8 min read
By the end of this lesson you will be able to:
  • Open a CSV with Python's csv module
  • Read rows as dictionaries using csv.DictReader
  • Understand what a file handle is and why io.StringIO substitutes for one

A CSV (comma-separated values) file is one of the simplest forms of structured data: plain text, one row per line, fields separated by commas. It is the lingua franca of data exchange: spreadsheets, databases, and many APIs can export to it. Python has a built-in csv module that handles the parsing so you don't have to split on commas yourself (which breaks the moment a field contains a comma inside quotes).

The key class is csv.DictReader. It takes a file-like object and returns an iterator of dictionaries, one per row, where the keys are the column headers from the first line.

File handles and io.StringIO

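Normally you would pass open("data.csv", newline="") to DictReader. That call returns a file handle: an object that reads the file from disk a chunk at a time and, in the default text mode, hands the contents over as lines of text (the csv documentation recommends newline="" so the module can handle line endings itself). In a browser-based runner there is no real file system, so this lesson uses io.StringIO instead. StringIO wraps a plain string and gives it the same interface as a text-mode file handle. From DictReader's point of view, the two are interchangeable.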
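
Here is a minimal sketch of that setup. The sample data is hypothetical and made up for illustration; only the column names (date, product, amount) are taken from this lesson.

```python
import csv
import io

# Hypothetical sample data; the first line holds the column headers.
csv_text = """date,product,amount
2024-01-05,Widget,3
2024-01-06,Gadget,5
"""

# StringIO wraps the string so DictReader can read it like an open file.
handle = io.StringIO(csv_text)
reader = csv.DictReader(handle)

rows = []
for row in reader:
    rows.append(row)   # keep a copy so the rows can be inspected later
    print(row)         # each row is a dict keyed by the header names
```

With a real file on disk, the only line that should need to change is the one that creates handle.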


Run it and look at what prints. Each row is a plain Python dict. The keys — "date", "product", "amount" — come from the first line of the CSV. The values are always str at this point; the csv module does not guess types.

Working with the rows

Because each row is a dict, you access fields by name:

for row in reader:
    product = row["product"]
    amount  = int(row["amount"])   # convert to int yourself
    print(f"{product}: {amount}")

You can collect all rows into a list with list(reader) if you need to pass them around or iterate more than once (the iterator is exhausted after one pass).
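
The one-pass behaviour is easy to see in a small sketch. The data below is hypothetical; the point is only that a second loop over the same reader yields nothing, while a list can be reused freely.

```python
import csv
import io

# Hypothetical sample data for illustration.
csv_text = "product,amount\nWidget,3\nGadget,5\n"

reader = csv.DictReader(io.StringIO(csv_text))
first_pass = [row["product"] for row in reader]
second_pass = [row["product"] for row in reader]  # reader is already exhausted

# Materialise the rows once, then iterate as often as needed.
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(int(row["amount"]) for row in rows)    # first pass over the list
products = [row["product"] for row in rows]        # second pass still works

print(first_pass, second_pass)
print(total, products)
```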

csv.DictReader respects quoted fields. If a field contains a comma — for example "Smith, Jane" — the module handles it correctly as a single value. This is the main reason to use the module rather than line.split(",").
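
To make the difference concrete, the sketch below parses one hypothetical row both ways. The column names name and team are invented for this example.

```python
import csv
import io

# Hypothetical data: the name field contains a comma inside quotes.
csv_text = 'name,team\n"Smith, Jane",Sales\n'

# Naive approach: split the data line on every comma.
naive = csv_text.splitlines()[1].split(",")

# csv module: the quoted field stays in one piece.
row = next(csv.DictReader(io.StringIO(csv_text)))

print(len(naive))    # the naive split produces one field too many
print(row["name"])
print(row["team"])
```

The naive split cuts the name in two and leaves the quote characters in the data, while DictReader returns the name as a single value with the quotes removed.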

Where to go next

Now you can get data into Python as a list of dicts. Next: data shapes — understanding the row-and-column structure that underpins every table, and why the orientation of your data matters.
