Lab: Scheduled job
Build a configured, logged script that fetches data from a public API, writes a JSON report, and is wired to run on a schedule — end-to-end practice for the scheduling and configuration module.
- Read a TOML config with tomllib and override with environment variables
- Fetch data from a public API using urllib
- Write a JSON report to a file (simulated with io.StringIO)
- Log activity with Python's logging module
- Define a schedule with the schedule library
This lab ties together the whole intermediate tier: external data, configuration, logging, and scheduling. You will build a script that reads its configuration from an inline TOML block, fetches todo data from a public API, writes a JSON report, and logs every step. At the end you will see how the script would be wired to a recurring schedule.
The public API is the same one used in the external integrations lab:
jsonplaceholder.typicode.com/todos.
Checkpoint 1: Configuration
Start with the config layer. The TOML config controls where output goes, which log level to use, and which user's todos to fetch:
The force=True in basicConfig is needed in the runner to re-initialise logging
between checkpoints. In a real script you call basicConfig once at startup and omit
force.
Checkpoint 2: Fetch the data
Add the fetch step, using urllib (available in Pyodide without installation):
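One way the fetch step might look, using the URL pattern from the complete script at the end of the lab. The timeout value and the opener parameter are assumptions of this sketch: the timeout keeps a stalled connection from hanging the job, and opener lets the function be exercised without network access.

```python
import json
import logging
import urllib.request

BASE_URL = "https://jsonplaceholder.typicode.com/todos"


def fetch_todos(user_id, opener=urllib.request.urlopen, timeout=10):
    """Return the list of todo dicts for one user.

    `opener` and `timeout` are illustrative additions, not part of the
    lab's own function signature.
    """
    url = f"{BASE_URL}?userId={user_id}"
    logging.info("Fetching todos for user %d", user_id)
    with opener(url, timeout=timeout) as resp:
        # The API returns a JSON array of todo objects.
        todos = json.loads(resp.read().decode("utf-8"))
    logging.info("Fetched %d todos", len(todos))
    return todos
```

In the runner, todos = fetch_todos(config["user_id"]) performs the real request against the public API.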
User 1 has 20 todos. If you change user_id in the TOML config at the top, the fetch
uses the new value — no code change needed.
Checkpoint 3: Build and write the report
Aggregate the todos and write a JSON report:
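A sketch of the aggregation and write steps, with build_report matching the complete script at the end of the lab. The three sample todos are made up for illustration, and write_report taking an open stream (rather than a path) is an assumption of this sketch so that io.StringIO can stand in for a file:

```python
import io
import json
import logging


def build_report(todos, user_id):
    """Summarise a list of todo dicts into counts and a completion percentage."""
    completed = [t for t in todos if t["completed"]]
    return {
        "user_id": user_id,
        "total": len(todos),
        "completed": len(completed),
        "incomplete": len(todos) - len(completed),
        # Guard against division by zero when the user has no todos.
        "completion_pct": round(100 * len(completed) / len(todos), 1) if todos else 0.0,
    }


def write_report(report, stream):
    """Serialise the report as indented JSON to any writable text stream."""
    json.dump(report, stream, indent=2)
    logging.info("Report written (%d todos summarised)", report["total"])


# Hand-made sample standing in for the fetched todos.
sample_todos = [
    {"id": 1, "completed": True},
    {"id": 2, "completed": False},
    {"id": 3, "completed": True},
]

report = build_report(sample_todos, user_id=1)
buffer = io.StringIO()  # in-memory stand-in for the output file
write_report(report, buffer)
print(buffer.getvalue())
```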
The report structure is clean JSON that another script or API could consume. The
completion_pct field adds a derived value — computed once here so consumers do not
have to recalculate it.
In a production script, replace io.StringIO() with open(config["output_path"], "w").
Everything else stays the same. This is why testing with StringIO is valuable:
the report logic is identical, only the output destination changes.
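To make the point concrete, the sketch below sends one report through the same function twice, once into a StringIO buffer and once into a real file. The write_report signature taking an open stream, the report values, and the temporary directory standing in for output_path are all assumptions of this sketch.

```python
import io
import json
import os
import tempfile


def write_report(report, stream):
    """Serialise the report to any writable text stream."""
    json.dump(report, stream, indent=2)


# Illustrative values only, not real API results.
report = {"user_id": 1, "total": 4, "completed": 1, "incomplete": 3, "completion_pct": 25.0}

# In memory, as in the lab runner.
buffer = io.StringIO()
write_report(report, buffer)

# On disk, as in production; a temporary directory stands in for output_path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "todo_report.json")
    with open(path, "w", encoding="utf-8") as f:
        write_report(report, f)
    with open(path, encoding="utf-8") as f:
        on_disk = f.read()

# The serialised text is the same whichever destination was used.
same_output = buffer.getvalue() == on_disk
print(same_output)
```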
Checkpoint 4: Wire the schedule
Finally, add the schedule call that would drive the job in a real long-running
process:
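A sketch of the wiring, using the same every().day.at("00:00").do(...) chain as the complete script at the end of the lab. The register_daily_job helper and its scheduler parameter are hypothetical names introduced here: passing the schedule module in as an argument keeps the example importable even where the third-party library is not installed. The run_job body is a placeholder for the real fetch, aggregate, and write steps.

```python
import logging


def run_job(config):
    """Placeholder job body; the real one fetches, aggregates, and writes."""
    logging.info("Running report job for user %d", config["user_id"])


def register_daily_job(scheduler, job, config, at="00:00"):
    """Register `job` to run once a day at `at` (HH:MM, local time).

    `scheduler` is expected to be the `schedule` module or a
    `schedule.Scheduler` instance; it is a parameter only for this sketch.
    """
    return scheduler.every().day.at(at).do(job, config=config)


# Usage with the real library (requires `pip install schedule`):
#
#   import schedule
#   register_daily_job(schedule, run_job, {"user_id": 1})
#   schedule.run_all()   # run every registered job once, immediately
```

Registering a job does not run it. Nothing executes until run_pending() or run_all() is called.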
In a deployed script the last block becomes:
if __name__ == "__main__":
    logging.info("Scheduler started")
    while True:
        schedule.run_pending()
        time.sleep(60)

The 60-second sleep is deliberately coarse: the job runs daily, so checking for
pending work once a minute is more than sufficient.
Putting it all together
The complete, production-ready script:
import json
import logging
import os
import time
import tomllib  # standard library from Python 3.11
import urllib.request

import schedule  # third-party: pip install schedule

TOML_CONFIG = """
log_level = "INFO"
output_path = "/tmp/todo_report.json"
user_id = 1
"""


def load_config():
    """Parse the inline TOML, then apply environment overrides."""
    config = tomllib.loads(TOML_CONFIG)
    if os.environ.get("LOG_LEVEL"):
        config["log_level"] = os.environ["LOG_LEVEL"]
    if os.environ.get("USER_ID"):
        # Environment values are strings; keep user_id an int.
        config["user_id"] = int(os.environ["USER_ID"])
    return config


def fetch_todos(user_id):
    """Fetch one user's todos from the public API."""
    url = f"https://jsonplaceholder.typicode.com/todos?userId={user_id}"
    logging.info("Fetching todos for user %d", user_id)
    # A timeout keeps a stalled connection from blocking the scheduler loop.
    with urllib.request.urlopen(url, timeout=10) as resp:
        todos = json.loads(resp.read().decode())
    logging.info("Fetched %d todos", len(todos))
    return todos


def build_report(todos, user_id):
    """Aggregate the todos into summary counts."""
    completed = [t for t in todos if t["completed"]]
    return {
        "user_id": user_id,
        "total": len(todos),
        "completed": len(completed),
        "incomplete": len(todos) - len(completed),
        "completion_pct": round(100 * len(completed) / len(todos), 1) if todos else 0.0,
    }


def write_report(report, path):
    """Write the report as indented JSON to the given path."""
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
    logging.info("Report written to %s", path)


def run_job(config):
    """Run one fetch, aggregate, write cycle; log failures so the loop survives."""
    try:
        todos = fetch_todos(config["user_id"])
        report = build_report(todos, config["user_id"])
        write_report(report, config["output_path"])
    except Exception:
        # logging.exception records the traceback along with the message.
        logging.exception("Job failed")


def main():
    config = load_config()
    level = getattr(logging, config["log_level"].upper(), logging.INFO)
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
    schedule.every().day.at("00:00").do(run_job, config=config)
    logging.info("Scheduler started; job runs daily at midnight")
    while True:
        schedule.run_pending()
        time.sleep(60)


if __name__ == "__main__":
    main()

Every layer of the intermediate tier is present: a public API call, TOML configuration with environment overrides, structured JSON output, logged activity, and a schedule that drives the whole thing.
Where to go next
You have completed the intermediate workflow tier. The patterns here — external integrations, subprocess pipelines, configuration, and scheduling — combine into the foundation of any serious automation system. The advanced tier builds on these with error handling strategies, distributed job queues, and observability tooling.
Config and logging
Read a TOML config, override values with environment variables, and wire the result into Python's logging module — the complete pattern for observable, configurable scripts.
Idempotency in practice
A script is production-ready only when running it twice leaves the world in the same state as running it once. Learn what idempotency means and the two patterns that enforce it.