
Week 20 - Observability, FastAPI, Production Service Shape

20.1 Conceptual Core

A production Python service has, at minimum: structured logs, metrics, distributed traces, health checks, graceful shutdown, configuration via env, secrets via a vault, and a dependency-injection seam for tests. None are optional.

20.2 Mechanical Detail

  • Logging: logging stdlib, configured once at startup, JSON formatter (python-json-logger or structlog). Attach request_id, user_id, trace_id via ContextVar.
  • Metrics: prometheus_client for pull; OpenTelemetry metrics for push. Histograms over averages - averages lie about tail latency.
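  • Traces: opentelemetry-api + opentelemetry-sdk + opentelemetry-instrumentation-fastapi/-httpx/-sqlalchemy. Auto-instrumentation covers most inbound-request, outbound-HTTP, and query spans without code changes; add manual spans only around domain logic.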
  • FastAPI: ASGI app, Pydantic-typed request/response, dependencies as Annotated[T, Depends(...)], lifespan context manager for startup/shutdown, BackgroundTasks only for fire-and-forget (use a real queue for durable work).
  • Configuration: pydantic-settings reading .env + env vars. Never hardcode. Never read env vars directly in domain code.
  • Graceful shutdown: SIGTERM → drain in-flight requests → close DB pools → exit. Bound the drain with uvicorn --timeout-graceful-shutdown (gunicorn's equivalent flag is --graceful-timeout).
  • Health: /healthz (liveness, returns 200 if the process is up) vs. /readyz (readiness, returns 200 only if dependencies are reachable). Distinct, not interchangeable.
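
The logging bullet above can be sketched with only the standard library. This is a minimal illustration, not a replacement for python-json-logger or structlog: the names request_id_var, ContextFilter, JsonFormatter, and configure_logging are hypothetical, and a real service would set the context variable in request middleware rather than inline.

```python
import io
import json
import logging
from contextvars import ContextVar

# Hypothetical context variable; middleware would normally set it per request.
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


class ContextFilter(logging.Filter):
    """Copy the current request_id onto every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Tiny stand-in for a real JSON formatter library."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        })


def configure_logging(stream) -> None:
    """Call once at startup; replaces any handlers already on the root logger."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(ContextFilter())
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO)


# Usage: log one line inside a simulated request and parse it back.
buffer = io.StringIO()
configure_logging(buffer)
token = request_id_var.set("req-123")
try:
    logging.getLogger("jobs").info("job %s accepted", 42)
finally:
    request_id_var.reset(token)  # restore the previous value
entry = json.loads(buffer.getvalue())
```

Domain code here never passes request_id explicitly; the filter reads it from the ContextVar when the record is handled, and the same approach extends to user_id and trace_id.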

20.3 Lab - "Production-Shaped Service"

Build a FastAPI service that:

  1. Accepts a POST /jobs, persists to SQLite, returns a job ID.
  2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency.
  3. Emits structured JSON logs with trace correlation.
  4. Exposes /metrics (Prometheus), /healthz, and /readyz.
  5. Handles SIGTERM by draining in-flight jobs.
  6. Runs under uvicorn with --workers 4 (multi-process). Document why more than one worker still pays off for a CPU-light, I/O-bound service on stock CPython.
  7. Has a docker-compose stack including Prometheus, Grafana, and Jaeger.
  8. Has a k6 or locust load test in loadtest/ reproducing the latency SLO.

20.4 Idiomatic & Linter Drill

  • Enable ruff's LOG (flake8-logging) and G (flake8-logging-format) rule sets. G004 catches logger.info(f"..."); pass values as arguments to %-style placeholders so interpolation stays lazy.
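
Why the linter cares can be shown with a small experiment. The Expensive class below is a hypothetical helper that counts how often it is rendered to a string; the logger is set to WARNING so INFO records are dropped.

```python
import logging


class Expensive:
    """Counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "payload"


logger = logging.getLogger("drill")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)  # INFO records are dropped

eager = Expensive()
# The pattern the drill tells you to catch: the f-string is built
# before logging checks whether INFO is enabled.
logger.info(f"got {eager}")

lazy = Expensive()
# %-style arguments are formatted only if the record is actually emitted.
logger.info("got %s", lazy)

eager_renders, lazy_renders = eager.renders, lazy.renders
```

The eager call pays the rendering cost even though nothing is logged, while the lazy call never touches the object. Keeping the template constant also lets log aggregators group messages by template.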

20.5 Production Hardening Slice

  • Deploy to a free-tier cloud (Fly.io, Render, or a Hetzner VM with Caddy). Run for a week, watch the dashboard, write a one-page postmortem of what the dashboard taught you.

Month-5 Exit Criteria

Before starting Month 6:

  1. Translate any GoF pattern to its Pythonic form, or argue it doesn't apply.
  2. Pick the right data structure from the menu without defaulting to dict/list.
  3. Ship a FastAPI service with full observability and graceful shutdown in under a day.
  4. Defend a hexagonal architecture in a code review.
