Week 7 - Dataclasses, attrs, Pydantic, and the Validation Boundary
7.1 Conceptual Core
- The single most important architectural decision in a typed Python codebase: where is the validation boundary? Internal types should be cheap (@dataclass(slots=True, frozen=True)); boundary types (HTTP request bodies, message-bus payloads, LLM outputs) should validate (pydantic.BaseModel).
- "Parse, don't validate." Once a value is past the boundary, it should be a typed object that cannot be malformed; checks afterward are dead code.
7.2 Mechanical Detail
- dataclasses.dataclass parameters: frozen, slots, kw_only, eq, order, repr, match_args. Mutable defaults should be field(default_factory=list), never a bare [] (which raises ValueError at class definition).
- attrs (the original): faster validators, evolve, slots by default. Still relevant; dataclass won the stdlib slot but attrs keeps innovating.
- pydantic v2 (Rust core, ~10x faster than v1): BaseModel, Field(..., gt=0, le=100), model_validator, field_validator, discriminated unions, Annotated[..., AfterValidator(...)]. JSON schema export for free.
- TypedDict (PEP 589): for dict-shaped data with known keys (e.g., LLM tool-call payloads). Cheaper than Pydantic, no runtime validation. Pair with cast at the boundary when the input is trusted, or with pydantic.TypeAdapter when it needs runtime validation.
7.3 Lab - "The Three-Layer Cake"
- Build an HTTP service (FastAPI, but kept small):
  - Boundary layer: Pydantic RequestModel/ResponseModel.
  - Domain layer: @dataclass(slots=True, frozen=True) value objects.
  - Persistence layer: TypedDict rows from sqlite3.
- Write explicit converters between each layer. Resist the urge to make them the same type.
- Benchmark a 10k-request loop with Pydantic v1 (if installed) vs. v2. Document the speedup you measure and compare it with the ~10x claim.
7.4 Idiomatic & Linter Drill
- Enable the ruff D (pydocstyle) rules. Document every public class and function. Enforce Google or NumPy docstring style.
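For reference, a Google-style docstring might look like the sketch below; clamp is a hypothetical function used only to show the section layout. Ruff's pydocstyle settings include a convention option that selects which style the D rules enforce.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict a value to the closed interval [low, high].

    Args:
        value: The number to clamp.
        low: Lower bound of the interval.
        high: Upper bound of the interval.

    Returns:
        The value itself if it lies inside the interval, otherwise the
        nearest bound.

    Raises:
        ValueError: If low is greater than high.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))
```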
7.5 Production Hardening Slice
- Add schemathesis or property-based tests against your FastAPI app. Generate inputs from the OpenAPI schema; confirm 5xx never occurs on valid input shapes.