Structured data concepts

CSV, JSON, and plain text each have a job. Knowing which format fits which problem — and what parsing actually means — is where data work starts.

By the end of this lesson you will be able to:

Distinguish CSV, JSON, and plain text as data formats and state when each is appropriate
Explain what parsing means in the context of data processing
Identify the structural difference between flat (tabular) and nested (hierarchical) data

Before you write a single line of data-processing code, you need to know what kind of data you are dealing with. Three formats cover the majority of automation work: CSV, JSON, and plain text. Each has a job, and using a format for a job it does not fit causes friction.

CSV — tabular data

CSV (Comma-Separated Values) represents data as a table: rows and columns. Each row is a line; each column is a comma-separated field. The first row is usually headers.

name,age,city
Alice,30,London
Bob,25,Berlin

CSV is the right choice when your data is flat and uniform: every record has the same set of fields, and none of them contain nested structure. Spreadsheets, database exports, and sensor readings are natural fits for CSV.

The limitation: CSV does not represent hierarchy. A customer record that has multiple addresses cannot be cleanly expressed in a single CSV row without inventing awkward encoding conventions.
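
As a minimal sketch of what reading this table looks like in Python, the example below parses the sample above with the standard library's csv module. The data is held in a string (the variable name raw is just for illustration) so the example runs without a file on disk.

```python
import csv
import io

# The sample table from above, held in a string for the sake of the example.
raw = "name,age,city\nAlice,30,London\nBob,25,Berlin\n"

# DictReader treats the first row as headers and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    # Every value arrives as a string; convert explicitly where a number is needed.
    print(row["name"], int(row["age"]), row["city"])
```

Note that CSV carries no type information: the age column comes back as the string "30", and it is up to your code to convert it.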

JSON — nested and hierarchical data

JSON (JavaScript Object Notation) uses key-value pairs and arrays, and it nests freely:

{
  "name": "Alice",
  "age": 30,
  "addresses": [
    { "type": "home", "city": "London" },
    { "type": "work", "city": "Oxford" }
  ]
}

JSON is the right choice when your data has hierarchy, optional fields, or variable structure. API responses are almost always JSON. Configuration files are often JSON (or its cousin YAML).

The limitation: JSON is more verbose than CSV for flat data, and it is harder to open in a spreadsheet.
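
To make the nesting concrete, here is a short sketch that parses the record above with Python's json module and reaches into the nested list. The "phone" key is a hypothetical optional field, used only to show how a missing key can be handled.

```python
import json

# The sample record from above, as raw text.
raw = """
{
  "name": "Alice",
  "age": 30,
  "addresses": [
    { "type": "home", "city": "London" },
    { "type": "work", "city": "Oxford" }
  ]
}
"""

record = json.loads(raw)

# Nested data is reached by chaining keys and indexes.
cities = [address["city"] for address in record["addresses"]]
print(cities)

# Optional fields may be absent; .get() returns None instead of raising KeyError.
# "phone" is a hypothetical field that this record does not contain.
phone = record.get("phone")
print(phone)
```

Unlike CSV, JSON keeps basic types: age comes back as the integer 30, not a string.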

Plain text — logs and freeform content

Plain text has no imposed structure. Each line might be a log entry, a command, a paragraph — whatever the producer decided to write. Plain text is appropriate when the data is not uniform enough for CSV or JSON, or when the consumer is a human reading a log rather than a program parsing records.

Working with plain text usually means writing code to find patterns: splitting on whitespace, searching for keywords, extracting fields with regular expressions.

What "parsing" means

Parsing is converting raw text (or bytes) into a data structure your program can work with. A CSV file is just a string until you parse it into a list of dictionaries. A JSON string is just text until you parse it into a Python dict.

raw text  →  parser  →  Python data structure

Parsers handle the messy details: quoting rules in CSV, escape sequences in JSON, line endings, encoding. You almost never write a parser from scratch — Python's standard library includes csv and json for exactly this.
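
A small illustration of why the quoting rules matter: the sketch below compares a naive split on commas with the csv module on a line whose last field contains a comma. The sample line is made up for this example.

```python
import csv

# A field containing a comma must be quoted in CSV.
line = 'Alice,30,"London, UK"'

# Naive approach: splitting on commas breaks the quoted field in two.
naive = line.split(",")
print(naive)

# csv.reader accepts any iterable of lines and applies the quoting rules.
parsed = next(csv.reader([line]))
print(parsed)
```

The naive split produces four pieces with stray quote characters, while the parser returns the three intended fields.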

"Serialisation" is the reverse of parsing: converting a Python data structure back into text for storage or transmission. json.dumps() serialises; json.loads() parses. The pair appears constantly in automation work.
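
As a quick sketch of the round trip, the example below serialises a dictionary to text and parses it back. The record itself is invented for illustration.

```python
import json

# An illustrative record; the field names are made up for this example.
record = {"name": "Alice", "age": 30, "tags": ["admin", "beta"]}

# Serialise: Python data structure -> text.
text = json.dumps(record)
print(type(text).__name__, text)

# Parse: text -> Python data structure.
restored = json.loads(text)
print(restored == record)
```

The round trip is lossless here because the record uses only types JSON supports directly (strings, numbers, lists, and dictionaries with string keys). Other Python types, such as tuples or dates, do not survive unchanged.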

Where to go next

Next: parsing JSON and CSV — using Python's json and csv modules to turn raw data strings into dictionaries you can filter and transform.
