Week 22 - Retrieval-Augmented Generation: Doing It Properly

22.1 Conceptual Core

RAG fails in seven places, and a senior engineer must know each:

  1. Ingestion: garbage-in (bad PDFs, lost layout, OCR errors).
  2. Chunking: too big → diluted relevance; too small → loss of context. Try semantic / recursive / sentence-window strategies; benchmark them.
  3. Embedding: model choice (text-embedding-3-large, bge-large, nomic-embed, voyage-3), normalization, dimension, multilingual support.
  4. Indexing: HNSW (hnswlib, faiss, usearch), IVF-PQ for scale, keyword (bm25s, tantivy), hybrid (dense + sparse + reranker).
  5. Retrieval: top-K, MMR for diversity, query rewriting / HyDE, query routing.
  6. Reranking: run a cross-encoder reranker (bge-reranker, Cohere rerank, Voyage rerank) over the top-50 candidates and keep the top-5. Often the single biggest quality win.
  7. Prompting: how the chunks are presented, citation format, instructions for "don't answer if not in context."
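Item 5 mentions MMR (maximal marginal relevance) for diversity. MMR greedily picks the candidate that maximizes `lambda * sim(query, doc) - (1 - lambda) * max sim(doc, already_picked)`, trading relevance against redundancy. Below is a minimal NumPy sketch, assuming embeddings are already L2-normalized so a dot product equals cosine similarity. The function name `mmr` and the parameter name `lambda_mult` are illustrative choices, not a specific library's API.

```python
import numpy as np


def mmr(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Select k document indices with Maximal Marginal Relevance.

    Assumes query_vec has shape (d,) and doc_vecs has shape (n, d), both
    L2-normalized, so dot products are cosine similarities.
    lambda_mult=1.0 is pure relevance; lower values favor diversity.
    """
    relevance = doc_vecs @ query_vec   # (n,) similarity of each doc to the query
    doc_sim = doc_vecs @ doc_vecs.T    # (n, n) pairwise doc-doc similarity
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        if selected:
            # For each candidate, its highest similarity to anything already picked.
            redundancy = doc_sim[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lambda_mult * relevance[candidates] - (1 - lambda_mult) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical documents and one distinct one, a pure-relevance setting (`lambda_mult=1.0`) returns both duplicates, while a diversity-leaning setting skips the duplicate in favor of the distinct document. In practice MMR is usually applied to the top-N dense hits, not the whole corpus, because it builds an n × n similarity matrix.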

22.2 Mechanical Detail

  • Vector DBs: pgvector (Postgres extension, the boring-and-correct choice), qdrant, weaviate, milvus, chroma (fine for development), lance/lancedb (good for local use), turbopuffer (cheap, serverless).
  • Hybrid search: RRF (reciprocal rank fusion) over dense + BM25.
  • Embedding pipelines with backpressure: throttle requests so you don't overwhelm your provider's rate limits, batch carefully, and make retries idempotent.
  • Evals for RAG: retrieval recall@K, answer faithfulness (LLM-as-judge), answer relevance, context precision (ragas, trulens, custom).
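Reciprocal rank fusion needs only the rank of each document in each list, not the raw scores, which is why it works for combining dense similarities and BM25 scores that live on different scales. Each document's fused score is the sum of `1 / (k + rank)` over every list it appears in, with ranks starting at 1. A minimal pure-Python sketch, assuming each retriever returns a ranked list of document IDs; the function name is hypothetical, and `k=60` is the constant commonly used in practice rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of doc IDs with RRF.

    rankings: iterable of lists, each ordered best-first (e.g. dense hits, BM25 hits).
    Returns doc IDs sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=lambda d: scores[d], reverse=True)
    return fused if top_n is None else fused[:top_n]


dense_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

Here `b` and `c` appear in both lists, so they outrank `a` (first in dense only) and `d` (last in BM25 only). A typical pipeline fuses to a top-50 and hands that list to the cross-encoder reranker.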

22.3 Lab - "End-to-End RAG with Honest Evals"

  1. Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
  2. Stand up pgvector or qdrant. Index with two embedding models.
  3. Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker.
  4. Build a 50-question gold eval set with reference answers. Score with ragas. Iterate retrieval until faithfulness > 0.85.
  5. Record the impact of each pipeline change in a results table. Resist the urge to tune blindly.
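For step 5, it helps to score every pipeline variant on the same gold set and print one row per variant. The sketch below computes retrieval recall@K only. It assumes each gold question is labeled with the IDs of the chunks that contain its answer, and that each variant is wrapped as a function mapping a question to a ranked list of chunk IDs. All names are hypothetical. Answer-level metrics such as faithfulness would come from ragas, whose API is not reproduced here.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def evaluate(pipelines, gold, k=5):
    """Score each pipeline variant on the gold set, best first.

    pipelines: {variant_name: retrieve_fn}, retrieve_fn(question) -> ranked chunk IDs.
    gold: non-empty list of (question, relevant_chunk_ids) pairs.
    """
    rows = []
    for name, retrieve in pipelines.items():
        scores = [recall_at_k(retrieve(q), rel, k) for q, rel in gold]
        rows.append((name, sum(scores) / len(scores)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_table(rows, k):
    """Render (variant, mean recall@k) rows as a plain-text results table."""
    lines = [f"{'pipeline':<24} recall@{k}"]
    lines += [f"{name:<24} {score:.3f}" for name, score in rows]
    return "\n".join(lines)
```

Keeping every variant behind the same `retrieve_fn` signature means that adding a chunking strategy, embedding model, or reranker adds one row to the table, which makes it harder to tune blindly.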

22.4 Production Hardening Slice

  • Add eval-on-CI: every PR runs the gold set against the changed pipeline; regressions block merge.
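The simplest version of the merge gate is a script that compares the current run's metrics with a committed baseline and exits non-zero on a regression. This is a minimal sketch, assuming the eval job writes a flat JSON object of `{metric_name: score}` where higher is better for every metric. The file names, the `tolerance` value, and the function names are hypothetical.

```python
import json
import sys


def check_regressions(baseline, current, tolerance=0.01):
    """Return a description of each metric that dropped by more than tolerance."""
    failures = []
    for metric, base_score in baseline.items():
        new_score = current.get(metric)
        if new_score is None:
            failures.append(f"{metric}: missing from current run")
        elif new_score < base_score - tolerance:
            failures.append(f"{metric}: {base_score:.3f} -> {new_score:.3f}")
    return failures


def main(baseline_path, current_path):
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    failures = check_regressions(baseline, current)
    for failure in failures:
        print("REGRESSION", failure)
    # A non-zero exit status fails the CI job; with the job marked as a
    # required check, that blocks the merge.
    return 1 if failures else 0


# Usage: python eval_gate.py baseline.json current.json
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

The small tolerance absorbs run-to-run noise from LLM-as-judge metrics; without it, a flaky judge would block unrelated PRs. Updating `baseline.json` then becomes an explicit, reviewable change.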
