Lab: Scheduled job

Build a configured, logged script that fetches data from a public API, writes a JSON report, and is wired to run on a schedule — end-to-end practice for the scheduling and configuration module.

This lab ties together the whole intermediate tier: external data, configuration, logging, and scheduling. You will build a script that reads its configuration from an inline TOML block, fetches todo data from a public API, writes a JSON report, and logs every step. At the end you will see how the script would be wired to a recurring schedule.

The public API is the same one used in the external integrations lab: jsonplaceholder.typicode.com/todos.

Checkpoint 1: Configuration

Start with the config layer. The TOML config controls where output goes, which log level to use, and which user's todos to fetch:


basicConfig does nothing if the root logger already has handlers, so the runner needs force=True to re-initialise logging between checkpoints. In a real script, call basicConfig once at startup and omit force.

Checkpoint 2: Fetch the data

Add the fetch step, using urllib (available in Pyodide without installation):

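
A sketch of the fetch step on its own, following the fetch_todos function from the Checkpoint 3 listing. Two details are assumptions added here rather than taken from the lab: the opener parameter, a hypothetical hook that lets a test substitute a stand-in for urlopen, and the 10-second timeout.

```python
import json
import logging
import urllib.request

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s", force=True)

API_URL = "https://jsonplaceholder.typicode.com/todos"

def fetch_todos(user_id, opener=urllib.request.urlopen):
    """Return the list of todo dicts for one user.

    `opener` is a hypothetical hook, not part of the lab: passing a
    stand-in for urlopen lets the function be exercised without a network.
    """
    url = f"{API_URL}?userId={user_id}"
    logging.info("Fetching todos for user %d", user_id)
    # Assumed timeout value; without one a stalled request can hang indefinitely
    with opener(url, timeout=10) as resp:
        todos = json.loads(resp.read().decode())
    logging.info("Fetched %d todos", len(todos))
    return todos
```

With network access, calling fetch_todos(config["user_id"]) performs the real request.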

User 1 has 20 todos. If you change user_id in the TOML config at the top, the fetch uses the new value — no code change needed.

Checkpoint 3: Build and write the report

Aggregate the todos and write a JSON report:


import os
import tomllib
import logging
import urllib.request
import json
import io

TOML_CONFIG = """
log_level = "INFO"
output_path = "/tmp/todo_report.json"
user_id = 1
"""

config = tomllib.loads(TOML_CONFIG)
level = getattr(logging, config["log_level"].upper(), logging.INFO)
logging.basicConfig(level=level, format="%(levelname)s %(message)s", force=True)

def fetch_todos(user_id):
    url = f"https://jsonplaceholder.typicode.com/todos?userId={user_id}"
    logging.info("Fetching todos for user %d", user_id)
    with urllib.request.urlopen(url) as resp:
        todos = json.loads(resp.read().decode())
    logging.info("Fetched %d todos", len(todos))
    return todos

def build_report(todos, user_id):
    completed = [t for t in todos if t["completed"]]
    incomplete = [t for t in todos if not t["completed"]]
    return {
        "user_id": user_id,
        "total": len(todos),
        "completed": len(completed),
        "incomplete": len(incomplete),
        "completion_pct": round(100 * len(completed) / len(todos), 1) if todos else 0,
    }

def write_report(report, path):
    # Use io.StringIO to simulate writing to a file in the browser
    buf = io.StringIO()
    json.dump(report, buf, indent=2)
    logging.info("Report written to %s", path)
    return buf.getvalue()

todos = fetch_todos(config["user_id"])
report = build_report(todos, config["user_id"])
output = write_report(report, config["output_path"])
print(output)

The report structure is clean JSON that another script or API could consume. The completion_pct field adds a derived value — computed once here so consumers do not have to recalculate it.
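
To see how completion_pct behaves, here is build_report run against a small hand-made list instead of the API response. The three sample todos are invented for illustration; the empty-list call shows the guard against division by zero.

```python
def build_report(todos, user_id):
    completed = [t for t in todos if t["completed"]]
    return {
        "user_id": user_id,
        "total": len(todos),
        "completed": len(completed),
        "incomplete": len(todos) - len(completed),
        "completion_pct": round(100 * len(completed) / len(todos), 1) if todos else 0,
    }

# Invented sample data: two completed todos out of three
sample = [
    {"id": 1, "completed": True},
    {"id": 2, "completed": True},
    {"id": 3, "completed": False},
]

report = build_report(sample, user_id=1)
print(report["completion_pct"])  # 66.7 (100 * 2 / 3, rounded to one decimal)

empty = build_report([], user_id=1)
print(empty["completion_pct"])   # 0 (the `if todos` guard avoids dividing by zero)
```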

In a production script, write_report opens the real file instead: replace the io.StringIO() buffer with a with open(path, "w") block and drop the return value. The report-building logic is untouched and only the output destination changes, which is why a StringIO stand-in is useful for testing.
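
One way to make that swap explicit, offered as a sketch rather than the lab's own design: a hypothetical dump_report helper that accepts any writable text stream, so the same call serves an in-memory buffer and a real file. The report values and the temporary path are illustrative, not real API output.

```python
import io
import json
import os
import tempfile

def dump_report(report, stream):
    """Serialise the report to any writable text stream (hypothetical helper)."""
    json.dump(report, stream, indent=2)

# Illustrative values only
report = {"user_id": 1, "total": 4, "completed": 3, "incomplete": 1, "completion_pct": 75.0}

# Destination 1: in-memory buffer, as in the browser runner
buf = io.StringIO()
dump_report(report, buf)
in_memory = buf.getvalue()

# Destination 2: a real file, here in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "todo_report.json")
with open(path, "w", encoding="utf-8") as f:
    dump_report(report, f)

with open(path, encoding="utf-8") as f:
    on_disk = f.read()

print(in_memory == on_disk)  # True: both destinations receive identical JSON
```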

Checkpoint 4: Wire the schedule

Finally, add the schedule call that would drive the job in a real long-running process:


In a deployed script the last block becomes:

if __name__ == "__main__":
    logging.info("Scheduler started")
    while True:
        schedule.run_pending()
        time.sleep(60)

Sleeping 60 seconds between checks is frequent enough: the job runs once a day, so polling every minute is more than sufficient, and the job starts at most about a minute after its scheduled time.
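
The effect of the polling interval can be checked with a small standard-library simulation. This is not how the schedule package is implemented. It only models a loop that wakes every 60 seconds and fires once the due time has passed. The helper names and the start time are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

def next_midnight(now):
    """Return the first midnight strictly after `now`."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, datetime.min.time())

def first_poll_at_or_after(start, due, interval_s=60):
    """Simulate a loop that wakes every `interval_s` seconds from `start`
    and return the wake-up time at which `due` has first been reached."""
    tick = start
    while tick < due:
        tick += timedelta(seconds=interval_s)
    return tick

start = datetime(2024, 1, 1, 23, 58, 30)    # arbitrary start time for the loop
due = next_midnight(start)                  # 2024-01-02 00:00:00
fired = first_poll_at_or_after(start, due)  # 2024-01-02 00:00:30
delay = (fired - due).total_seconds()

print(delay)  # 30.0 seconds late; the delay is always under one polling interval
```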

Putting it all together

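The complete script for deployment. It requires Python 3.11 or later (for tomllib) and the third-party schedule package (pip install schedule):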

import os
import tomllib
import logging
import urllib.request
import json
import schedule
import time

TOML_CONFIG = """
log_level = "INFO"
output_path = "/tmp/todo_report.json"
user_id = 1
"""

def load_config():
    config = tomllib.loads(TOML_CONFIG)
    if os.environ.get("LOG_LEVEL"):
        config["log_level"] = os.environ["LOG_LEVEL"]
    if os.environ.get("USER_ID"):
        config["user_id"] = int(os.environ["USER_ID"])
    return config

def fetch_todos(user_id):
    url = f"https://jsonplaceholder.typicode.com/todos?userId={user_id}"
    logging.info("Fetching todos for user %d", user_id)
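    # Timeout so a stalled request cannot block the scheduler loop (10 s is an assumed value)
    with urllib.request.urlopen(url, timeout=10) as resp: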
        todos = json.loads(resp.read().decode())
    logging.info("Fetched %d todos", len(todos))
    return todos

def build_report(todos, user_id):
    completed = [t for t in todos if t["completed"]]
    return {
        "user_id": user_id,
        "total": len(todos),
        "completed": len(completed),
        "incomplete": len(todos) - len(completed),
        "completion_pct": round(100 * len(completed) / len(todos), 1) if todos else 0,
    }

def write_report(report, path):
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
    logging.info("Report written to %s", path)

def run_job(config):
    try:
        todos = fetch_todos(config["user_id"])
        report = build_report(todos, config["user_id"])
        write_report(report, config["output_path"])
    except Exception:
        # Log the full traceback and keep the scheduler loop alive for the next run
        logging.exception("Job failed")

def main():
    config = load_config()
    level = getattr(logging, config["log_level"].upper(), logging.INFO)
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")

    schedule.every().day.at("00:00").do(run_job, config=config)
    logging.info("Scheduler started — job runs daily at midnight")

    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == "__main__":
    main()

Every layer of the intermediate tier is present: a public API call, TOML configuration with environment overrides, structured JSON output, logged activity, and a schedule that drives the whole thing.

Where to go next

You have completed the intermediate workflow tier. The patterns here — external integrations, subprocess pipelines, configuration, and scheduling — combine into the foundation of any serious automation system. The advanced tier builds on these with error handling strategies, distributed job queues, and observability tooling.
