Structured data concepts
CSV, JSON, and plain text each have a job. Knowing which format fits which problem — and what parsing actually means — is where data work starts.
- Distinguish CSV, JSON, and plain text as data formats and state when each is appropriate
- Explain what parsing means in the context of data processing
- Identify the structural difference between flat (tabular) and nested (hierarchical) data
Before you write a single line of data-processing code, you need to know what kind of data you are dealing with. Three formats cover the majority of automation work: CSV, JSON, and plain text. Each has a job, and forcing data into a format that does not fit it causes friction.
CSV — tabular data
CSV (Comma-Separated Values) represents data as a table: rows and columns. Each row is a line; each column is a comma-separated field. The first row is usually headers.
name,age,city
Alice,30,London
Bob,25,Berlin
CSV is the right choice when your data is flat and uniform: every record has the same set of fields, and none of them contain nested structure. Spreadsheets, database exports, and sensor readings are natural CSV.
The limitation: CSV does not represent hierarchy. A customer record that has multiple addresses cannot be cleanly expressed in a single CSV row without inventing awkward encoding conventions.
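As a minimal sketch of both points, the snippet below reads the sample table with Python's csv module and then shows one possible workaround for hierarchy. The `cities` column and its `;` separator are hypothetical, invented here only to illustrate the kind of ad hoc convention the limitation forces on you.

```python
import csv
import io

# The sample table from this section, held in a string for illustration.
raw = "name,age,city\nAlice,30,London\nBob,25,Berlin\n"

# DictReader treats the first row as headers and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["city"])  # London
print(rows[0]["age"])   # 30, but as the string '30': CSV carries no types

# Hypothetical workaround: pack several cities into one cell, separated by ';'.
# The csv module knows nothing about this convention, so every consumer
# has to split the cell again by hand.
packed = "name,cities\nAlice,London;Oxford\n"
alice = next(csv.DictReader(io.StringIO(packed)))
cities = alice["cities"].split(";")
print(cities)  # ['London', 'Oxford']
```

Note that every value comes back as a string; converting `age` to an integer is a separate step you have to write yourself.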
JSON — nested and hierarchical data
JSON (JavaScript Object Notation) uses key-value pairs and arrays, and it nests freely:
{
"name": "Alice",
"age": 30,
"addresses": [
{ "type": "home", "city": "London" },
{ "type": "work", "city": "Oxford" }
]
}
JSON is the right choice when your data has hierarchy, optional fields, or variable structure. API responses are almost always JSON. Configuration files are often JSON (or its cousin YAML).
The limitation: JSON is more verbose than CSV for flat data, and it is harder to open in a spreadsheet.
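To make the nesting concrete, here is a small sketch that parses the sample record above with Python's json module and walks into the nested list. The `phone` key is a hypothetical optional field, used only to show how a missing key can be handled.

```python
import json

# The sample record from this section, as raw text.
raw = """
{
  "name": "Alice",
  "age": 30,
  "addresses": [
    { "type": "home", "city": "London" },
    { "type": "work", "city": "Oxford" }
  ]
}
"""

customer = json.loads(raw)

# JSON numbers arrive as Python numbers, so arithmetic works directly.
print(customer["age"] + 1)  # 31

# The nested array becomes a list of dicts that you can filter.
work_cities = [a["city"] for a in customer["addresses"] if a["type"] == "work"]
print(work_cities)  # ['Oxford']

# Optional field (hypothetical): .get() returns a default instead of raising.
print(customer.get("phone", "no phone on record"))  # no phone on record
```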
Plain text — logs and freeform content
Plain text has no imposed structure. Each line might be a log entry, a command, a paragraph — whatever the producer decided to write. Plain text is appropriate when the data is not uniform enough for CSV or JSON, or when the consumer is a human reading a log rather than a program parsing records.
Working with plain text usually means writing code to find patterns: splitting on whitespace, searching for keywords, extracting fields with regular expressions.
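The sketch below shows one way this pattern-finding might look with a regular expression. The log lines and their layout (date, time, level, message) are invented for illustration; real logs will need a pattern written for their own format.

```python
import re

# Hypothetical log content; the field layout is an assumption for this example.
log = """\
2024-05-01 12:00:03 INFO service started
2024-05-01 12:00:07 ERROR disk full on /dev/sda1
not a log line at all
2024-05-01 12:01:15 ERROR connection refused
"""

# Four groups: date, time, level, and the rest of the line as the message.
pattern = re.compile(r"^(\S+) (\S+) (INFO|WARNING|ERROR) (.*)$")

errors = []
for line in log.splitlines():
    match = pattern.match(line)
    if match is None:
        continue  # plain text gives no guarantees, so skip lines that do not fit
    date, timestamp, level, message = match.groups()
    if level == "ERROR":
        errors.append((timestamp, message))

print(errors)
# [('12:00:07', 'disk full on /dev/sda1'), ('12:01:15', 'connection refused')]
```

The `continue` branch matters: because nothing enforces the structure, code that reads plain text should expect lines that do not match.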
What "parsing" means
Parsing is converting raw text (or bytes) into a data structure your program can work with. A CSV file is just a string until you parse it into a list of dictionaries. A JSON string is just text until you parse it into a Python dict.
raw text → parser → Python data structure
Parsers handle the messy details: quoting rules in CSV, escape sequences in JSON, line endings, encoding. You almost never write a parser from scratch; Python's standard library includes csv and json for exactly this.
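A short sketch of why the quoting rules matter: the hypothetical row below has a comma inside a quoted field. Splitting on commas by hand breaks the field apart, while csv.reader applies the quoting rules and returns it whole.

```python
import csv
import io

# Hypothetical data: the name field contains a comma, so it is quoted.
raw = 'name,city\n"Smith, Alice",London\n'

# Naive approach: split the data row on every comma.
naive = raw.splitlines()[1].split(",")
print(naive)  # ['"Smith', ' Alice"', 'London']  (three broken pieces)

# The csv parser understands the quotes and keeps the field intact.
parsed = list(csv.reader(io.StringIO(raw)))[1]
print(parsed)  # ['Smith, Alice', 'London']
```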
"Serialisation" is the reverse of parsing: converting a Python data structure
back into text for storage or transmission. json.dumps() serialises; json.loads()
parses. The pair appears constantly in automation work.
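The pair can be seen in a minimal round trip. The `record` dictionary is made up for illustration.

```python
import json

# A hypothetical record held as a Python data structure.
record = {"name": "Alice", "age": 30, "cities": ["London", "Oxford"]}

# Serialise: Python data structure -> text.
text = json.dumps(record)
print(type(text).__name__)  # str

# Parse: text -> Python data structure.
restored = json.loads(text)
print(restored == record)  # True
```

The round trip is lossless here because the record uses only types JSON can represent directly: strings, numbers, lists, and dicts with string keys.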
Check your understanding
- 1. You need to store a list of customers, where each customer can have zero or more phone numbers. Which format fits best?
- 2. What does parsing mean in data processing?
- 3. True or false: CSV is well suited for storing deeply nested, hierarchical data.
Where to go next
Next: parsing JSON and CSV — using Python's json and csv modules to turn
raw data strings into dictionaries you can filter and transform.