Week 20 - Observability, FastAPI, Production Service Shape
20.1 Conceptual Core
A production Python service has, at minimum: structured logs, metrics, distributed traces, health checks, graceful shutdown, configuration via env, secrets via a vault, and a dependency-injection seam for tests. None are optional.
20.2 Mechanical Detail
- Logging: logging stdlib, configured once at startup, JSON formatter (python-json-logger or structlog). Attach request_id, user_id, trace_id via ContextVar.
- Metrics: prometheus_client for pull; OpenTelemetry metrics for push. Histograms over averages - averages lie about tail latency.
- Traces: opentelemetry-api + opentelemetry-sdk + opentelemetry-instrumentation-fastapi / -httpx / -sqlalchemy. Auto-instrumentation gets you 80% for free.
- FastAPI: ASGI app, Pydantic-typed request/response, dependencies as Annotated[T, Depends(...)], lifespan context manager for startup/shutdown, BackgroundTasks only for fire-and-forget (use a real queue for durable work).
- Configuration: pydantic-settings reading .env + env vars. Never hardcode. Never read env vars directly in domain code.
- Graceful shutdown: SIGTERM → drain in-flight → close DB pools → exit. Bound the drain with uvicorn --timeout-graceful-shutdown (gunicorn's equivalent flag is --graceful-timeout).
- Health: /healthz (liveness, returns 200 if the process is up) vs. /readyz (readiness, returns 200 only if dependencies are reachable). Distinct, not interchangeable.
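The logging bullet above can be sketched with the standard library alone. This is a minimal illustration, not a production setup: JsonFormatter is a hand-rolled stand-in for python-json-logger or structlog, and the names request_id_var, ContextFilter, configure_logging, and handle_request are hypothetical. A real service would set the context variable in request middleware rather than in the handler itself.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variable; real code would set it in ASGI middleware.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


class ContextFilter(logging.Filter):
    """Copy the current request_id onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Minimal stand-in for python-json-logger or structlog."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", "-"),
        }
        return json.dumps(payload)


def configure_logging(stream) -> logging.Logger:
    """Configure the service logger once, at startup."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(ContextFilter())
    logger = logging.getLogger("svc")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, request_id: str) -> None:
    """Bind the request ID for the duration of one request."""
    token = request_id_var.set(request_id)
    try:
        logger.info("job accepted: %s", "job-1")
    finally:
        request_id_var.reset(token)


# Write to an in-memory buffer so the emitted JSON line can be inspected.
buffer = io.StringIO()
log = configure_logging(buffer)
handle_request(log, "req-123")
record = json.loads(buffer.getvalue())
```

Because the ID lives in a ContextVar, each asyncio task sees its own value, so concurrent requests do not overwrite each other's correlation fields. user_id and trace_id could be attached the same way.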
20.3 Lab - "Production-Shaped Service"
Build a FastAPI service that:
1. Accepts a POST /jobs, persists to SQLite, returns a job ID.
2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency.
3. Emits structured JSON logs with trace correlation.
4. Exposes /metrics (Prometheus), /healthz, and /readyz.
5. Handles SIGTERM by draining in-flight jobs.
6. Runs under uvicorn with --workers 4 (multi-process). Document why running more than one worker process still pays off for CPU-light, I/O-bound services on stock CPython.
7. Has a docker-compose stack including Prometheus, Grafana, and Jaeger.
8. Has a k6 or locust load test in loadtest/ reproducing the latency SLO.
20.4 Idiomatic & Linter Drill
- Add the ruff LOG (flake8-logging) and G (flake8-logging-format) rules. G004 catches logger.info(f"...") (use % formatting for lazy interpolation).
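A small experiment shows what lazy interpolation buys. The Expensive class and the lazy_demo logger name are hypothetical. The class counts how often it is rendered to a string, which makes the difference between the two call styles observable when the log level is disabled.

```python
import logging


class Expensive:
    """Counts how many times it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "payload"


logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.WARNING)  # INFO records are dropped

eager = Expensive()
logger.info(f"processing {eager}")  # the f-string is built before the call

lazy = Expensive()
logger.info("processing %s", lazy)  # formatting is skipped: level is disabled
```

After this runs, eager.renders is 1 and lazy.renders is 0: the f-string pays the formatting cost even though the record is discarded, while the %-style call defers formatting until a handler actually emits the record.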
20.5 Production Hardening Slice
- Deploy to a free-tier cloud (Fly.io, Render, or a Hetzner VM with Caddy). Run for a week, watch the dashboard, write a one-page postmortem of what the dashboard taught you.
Month-5 Exit Criteria
Before starting Month 6:
- Translate any GoF pattern to its Pythonic form, or argue it doesn't apply.
- Pick the right data structure from the menu without defaulting to dict/list.
- Ship a FastAPI service with full observability and graceful shutdown in under a day.
- Defend a hexagonal architecture in a code review.