Week 22 - Retrieval-Augmented Generation: Doing It Properly
22.1 Conceptual Core
RAG fails in seven places, and a senior engineer must know each:
- Ingestion: garbage in, garbage out (bad PDFs, lost layout, OCR errors).
- Chunking: too big → diluted relevance; too small → loss of context. Try semantic / recursive / sentence-window strategies; benchmark them.
- Embedding: model choice (text-embedding-3-large, bge-large, nomic-embed, voyage-3), normalization, dimension, multilingual support.
- Indexing: HNSW (hnswlib, faiss, usearch), IVF-PQ for scale, keyword (bm25s, tantivy), hybrid (dense + sparse + reranker).
- Retrieval: top-K, MMR for diversity, query rewriting / HyDE, query routing.
- Reranking: a cross-encoder reranker (bge-reranker, Cohere rerank, Voyage rerank) on the top-50 → top-5. Often the single biggest quality win.
- Prompting: how the chunks are presented, the citation format, and instructions such as "don't answer if it's not in the context."
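One way to read the "recursive" chunking strategy: split on the coarsest separator that works (paragraphs, then lines, sentences, words) and fall back to finer ones only when a piece is still too long. The sketch below is a hypothetical helper, not any library's splitter; it measures size in characters for simplicity, whereas production pipelines usually budget in tokens and often add overlap between chunks.

```python
def recursive_chunk(text, max_chars=500, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` into chunks of at most `max_chars` characters.

    Hypothetical helper: tries paragraph breaks first, then lines, sentences,
    and words, and hard-cuts only when no separator is present. Sizes are in
    characters for simplicity (real pipelines usually count tokens).
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        parts = text.split(sep)
        if len(parts) == 1:
            continue  # separator absent; try a finer one
        chunks, current = [], ""
        for part in parts:
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate  # keep packing pieces into this chunk
                continue
            if current:
                chunks.append(current)
            if len(part) > max_chars:
                # One piece is still too big: recurse, which skips `sep`
                # (absent from `part`) and moves on to finer separators.
                chunks.extend(recursive_chunk(part, max_chars, separators))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]
    # No separator at all (e.g. one enormous token): fall back to a hard cut.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Benchmarking chunking then means running the same eval set against corpora produced with different `max_chars` values and separator orders.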
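The normalization and MMR items fit together: once embeddings are L2-normalized, a dot product is cosine similarity, and MMR greedily picks the candidate with the best trade-off between relevance to the query and similarity to what is already selected. A minimal pure-Python sketch, assuming embeddings arrive as plain lists of floats; `lam` (the relevance/diversity weight) and the function names are illustrative choices.

```python
import math

def normalize(vec):
    """L2-normalize so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mmr(query, docs, k=5, lam=0.7):
    """Maximal Marginal Relevance over embedding vectors.

    Returns indices into `docs`. Each step picks the candidate maximizing
    lam * sim(query, d) - (1 - lam) * max sim(d, already selected).
    """
    q = normalize(query)
    vecs = [normalize(v) for v in docs]
    relevance = [dot(q, v) for v in vecs]
    selected, candidates = [], list(range(len(vecs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((dot(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` this reduces to plain top-K by cosine similarity; lowering it trades relevance for diversity, which keeps near-duplicate chunks from filling the context window.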
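The reranking item is a two-stage pattern: a cheap retriever returns many candidates, then a more expensive model that reads query and passage together rescores them, and only the best few survive. The sketch keeps the scorer pluggable; `rerank` is a hypothetical helper name and `overlap_scorer` is a toy stand-in (word overlap) so the example runs offline. It is not a cross-encoder.

```python
from typing import Callable, Sequence

def rerank(query: str, candidates: Sequence[str],
           score_pairs: Callable[[list], Sequence[float]],
           top_n: int = 5) -> list:
    """Score every (query, candidate) pair jointly, keep the top_n best.

    `score_pairs` stands in for a cross-encoder: it receives the full list of
    pairs and returns one relevance score per pair, in order.
    """
    pairs = [(query, c) for c in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return ranked[:top_n]

def overlap_scorer(pairs):
    """Toy scorer (assumption, for offline demos only): shared-word count."""
    return [len(set(q.lower().split()) & set(d.lower().split())) for q, d in pairs]
```

With the sentence-transformers package installed, a real cross-encoder could be passed as `score_pairs`, e.g. the `predict` method of `CrossEncoder("BAAI/bge-reranker-base")`; treat the exact model name as an assumption and check its model card. In the "top-50 → top-5" setup, `candidates` would hold 50 first-stage hits and `top_n=5`.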
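For the prompting item, a minimal sketch of one way to present chunks: number each one, tag it with its source, ask for bracketed citations, and spell out the refusal behavior. The wording, the `[n]` citation style, and the `build_prompt` name are illustrative assumptions, not a standard format.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt from (source_id, text) pairs.

    Hypothetical format: numbered sources, [n] citations, and an explicit
    instruction to refuse when the context lacks the answer.
    """
    context = "\n\n".join(
        f"[{i}] (source: {source_id})\n{text.strip()}"
        for i, (source_id, text) in enumerate(chunks, start=1)
    )
    instructions = (
        "Answer using only the numbered context below. "
        "Cite every claim with its source number, e.g. [2]. "
        "If the context does not contain the answer, reply exactly: "
        "\"I don't know based on the provided documents.\""
    )
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

A fixed refusal string like this is easy to detect in evals, so "answered when it should have refused" can be counted directly.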
22.2 Mechanical Detail
- Vector DBs: pgvector (Postgres extension, the boring-and-correct choice), qdrant, weaviate, milvus, chroma (dev), lance/lancedb (good for local), turbopuffer (cheap, serverless).
- Hybrid search: RRF (reciprocal rank fusion) over dense + BM25.
- Embedding pipelines with backpressure: don't overwhelm your provider or run out of memory, batch carefully, retry idempotently.
- Evals for RAG: retrieval recall@K, answer faithfulness (LLM-as-judge), answer relevance, context precision (ragas, trulens, custom).
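The keyword side of hybrid search is usually BM25. Libraries such as bm25s or tantivy handle tokenization, indexing, and speed; the sketch below only shows the Okapi BM25 scoring formula over pre-tokenized documents, using the non-negative IDF variant Lucene uses and the typical values k1=1.5, b=0.75 (assumptions you would tune). The `BM25` class is a hypothetical teaching helper, not either library's API.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 over pre-tokenized documents (lists of terms)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.tfs = [Counter(d) for d in docs]  # term frequencies per doc
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs)
        self.k1, self.b = k1, b
        n = len(docs)
        df = Counter(term for d in docs for term in set(d))  # doc frequency
        # Lucene-style IDF: log(1 + (N - n + 0.5) / (n + 0.5)), never negative.
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in query:
            if term not in tf:
                continue
            f = tf[term]
            # Saturating TF, normalized by document length relative to average.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def search(self, query, top_k=10):
        scored = [(i, self.score(query, i)) for i in range(len(self.tfs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [i for i, s in scored[:top_k] if s > 0]
```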
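RRF is small enough to write inline: each ranked list contributes 1/(k + rank) to every document it contains, and documents are re-sorted by the summed score. Because only ranks are used, cosine scores and BM25 scores never need to be put on the same scale. A sketch, with k=60 as the commonly used default constant and a hypothetical function name:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids (best first) into one fused ranking.

    Each list adds 1 / (k + rank) per document, with 1-based ranks. A doc
    missing from a list simply gets no contribution from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a dense ranking `["a", "b", "c"]` with a BM25 ranking `["c", "a", "d"]` puts `a` first (ranks 1 and 2), then `c`, `b`, `d`; the fused top-50 would then go to the reranker.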
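The backpressure bullet can be sketched with asyncio: a semaphore caps how many embedding requests are in flight, failed batches are retried with exponential backoff and jitter, and results are keyed by a content hash so a rerun after a crash only embeds what is missing. `embed_batch` is a hypothetical async provider call, `store` is any dict-like cache, and retrying on every exception is a simplification (in practice you would retry only rate-limit and transient errors).

```python
import asyncio
import hashlib
import random

async def embed_corpus(texts, embed_batch, store, batch_size=64,
                       max_concurrency=4, max_retries=5, base_delay=1.0):
    """Embed `texts` into `store` (content hash -> vector) with backpressure.

    Assumption: `embed_batch` is a hypothetical async provider call taking a
    list of strings and returning one vector per string, in order.
    """
    # Idempotency: key by content hash and skip anything already stored,
    # so a rerun after a crash only embeds the missing texts.
    todo = {}
    for text in texts:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in store:
            todo[key] = text  # also dedupes identical texts
    items = list(todo.items())
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    # Backpressure: at most `max_concurrency` requests in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with sem:
            for attempt in range(max_retries):
                try:
                    vectors = await embed_batch([text for _, text in batch])
                    break
                except Exception:  # simplification: retry any error
                    if attempt == max_retries - 1:
                        raise
                    # Exponential backoff with jitter; the slot stays held,
                    # which deliberately slows the whole pipeline down.
                    delay = min(base_delay * 2 ** attempt, 30.0)
                    await asyncio.sleep(delay * random.uniform(0.5, 1.0))
        for (key, _), vector in zip(batch, vectors):
            store[key] = vector

    await asyncio.gather(*(run(batch) for batch in batches))
    return store
```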
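Retrieval recall@K needs no LLM judge, only gold labels: for each question, the share of its known-relevant chunk ids that show up in the top K results. A minimal sketch, assuming the eval set stores retrieved and relevant ids per question (the function names are hypothetical); faithfulness and relevance scoring are left to ragas, trulens, or a custom judge.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of gold-relevant doc ids that appear in the top-k retrieved ids."""
    gold = set(relevant)
    if not gold:
        raise ValueError("each eval question needs at least one relevant doc")
    return len(set(retrieved[:k]) & gold) / len(gold)

def mean_recall_at_k(eval_set, k):
    """eval_set: list of (retrieved_ids, relevant_ids) pairs, one per question."""
    return sum(recall_at_k(r, g, k) for r, g in eval_set) / len(eval_set)
```

Tracking this metric separately from faithfulness shows whether a bad answer came from retrieval (the right chunk never arrived) or from generation (it arrived and was ignored).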
22.3 Lab - "End-to-End RAG with Honest Evals"
- Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
- Stand up pgvector or qdrant. Index with two embedding models.
- Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker.
- Build a 50-question gold eval set with reference answers. Score with ragas. Iterate on retrieval until faithfulness exceeds 0.85.
- Record the impact of each pipeline change in a results table, and plot it. Resist the urge to tune blindly.
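For the results-table step, a small helper can print every run next to its delta from a baseline, which makes it obvious which change moved which metric. The `results_table` name, the run names, the metric keys, and the dict-of-dicts layout are hypothetical; swap in whatever your eval harness emits.

```python
def results_table(runs, baseline, metrics=("recall@5", "faithfulness")):
    """Render eval runs as an aligned text table with deltas vs. a baseline.

    `runs` maps run name -> {metric: score}; names and metric keys are
    hypothetical placeholders for whatever your eval harness produces.
    """
    base = runs[baseline]
    rows = [["run"] + [f"{m} (delta)" for m in metrics]]
    for name, scores in runs.items():
        row = [name]
        for m in metrics:
            row.append(f"{scores[m]:.3f} ({scores[m] - base[m]:+.3f})")
        rows.append(row)
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

Adding one pipeline change per run (new chunker, second embedding model, RRF, reranker) keeps each delta attributable to a single decision.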
22.4 Production Hardening Slice
- Add eval-on-CI: every PR runs the gold set against the changed pipeline; regressions block merge.
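The eval-on-CI gate comes down to comparing the PR's metrics with a stored baseline and failing the job when any metric drops by more than a tolerance. A sketch, assuming the eval run writes a JSON file of higher-is-better metrics; the file names, the 0.01 tolerance, and the `eval_gate.py` script name are assumptions.

```python
import json
import sys

def find_regressions(baseline, current, tolerance=0.01):
    """List metrics that fell more than `tolerance` below the baseline.

    Both arguments map metric name -> score, higher is better. A metric that
    is missing from `current` counts as a regression, so a broken eval run
    cannot pass silently.
    """
    failures = []
    for metric, base in baseline.items():
        new = current.get(metric)
        if new is None:
            failures.append(f"{metric}: missing from current run")
        elif new < base - tolerance:
            failures.append(f"{metric}: {base:.3f} -> {new:.3f}")
    return failures

def main(baseline_path, current_path):
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    failures = find_regressions(baseline, current)
    for failure in failures:
        print("REGRESSION", failure)
    # A non-zero exit code is what makes the CI job (and the merge) fail.
    return 1 if failures else 0

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python eval_gate.py baseline.json current.json
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

The tolerance absorbs run-to-run noise from LLM-as-judge metrics; set it from the observed variance of repeated runs rather than guessing.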