RAG Pipeline Health Check

You shipped a RAG pipeline. Answers are hit-or-miss. You want someone who's done this before to walk through your setup (embedding model, chunk size, retrieval strategy, eval harness), call out what's broken, and tell you what to fix first.

Book the hour

What gets covered

Embedding model

Token cap vs your real chunk distribution (e.g. a Nomic setup silently truncating at 2048 tokens), model choice (OpenAI text-embedding-ada-002 vs text-embedding-3-large vs the open-source alternatives), domain fit, dimension/storage trade-offs.
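A rough sketch of the token-cap check: count how many of your chunks exceed the embedding model's cap, since anything over it may be truncated without warning. The whitespace counter here is only a stand-in; swap in your embedding model's real tokenizer. The function names and the report shape are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): counts words."""
    return len(text.split())


def over_cap_report(
    chunks: Iterable[str],
    cap: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Dict[str, float]:
    """Summarize how many chunks are longer than the model's token cap."""
    lengths = [count_tokens(chunk) for chunk in chunks]
    over = [n for n in lengths if n > cap]
    return {
        "chunks": len(lengths),
        "max_tokens": max(lengths, default=0),
        "over_cap": len(over),
        "fraction_over_cap": len(over) / len(lengths) if lengths else 0.0,
    }
```

If fraction_over_cap is more than a rounding error, part of your corpus is probably being embedded from truncated text.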

Chunking strategy

Token-vs-character, semantic vs fixed-size, overlap, table/code handling. The chunker is the biggest unforced error in most RAG pipelines.
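For reference, a minimal fixed-size chunker with overlap, working on tokens rather than characters. It assumes the text is already tokenized into a list; the function name and parameters are illustrative, and it does nothing special for tables or code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], size: int, overlap: int) -> List[Sequence[T]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens
    with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: List[Sequence[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so the tail is not
        # repeated as a tiny chunk made only of overlap tokens.
        if start + size >= len(tokens):
            break
    return chunks
```

With 10 tokens, size 4 and overlap 1, this yields three windows starting at positions 0, 3 and 6.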

Retrieval method

Vector-only vs hybrid (BM25 + vector + rerank), top-k tuning, metadata filtering. Whether you actually need a reranker or whether your top-1 is fine.
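One common way to combine BM25 and vector results is reciprocal rank fusion, sketched below. It is only one option for the hybrid step, not a claim about what your pipeline should use. It assumes each retriever returns an ordered list of document ids; k=60 is a conventional default, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank)
    to a document's score, with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Documents that both retrievers rank highly float to the top; a reranker, if you need one, would run on the fused list.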

Vector store

pgvector vs Pinecone vs Qdrant vs Azure AI Search. HNSW parameters. Index type for your scale. When you should switch and when you should stop fiddling.

Eval harness

Whether you have one. If not, the lightest-weight one that catches regressions: a question set, ground-truth chunks, recall@k. If yes, whether it's measuring what you actually care about.
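The lightest-weight harness described above can be sketched in a few lines: a question set, ground-truth chunk ids per question, and recall@k averaged over the set. The `retrieve` callable stands in for your own retriever, and the eval-set shape is an assumption.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of ground-truth chunk ids that appear in the top k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("each question needs at least one ground-truth chunk")
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def mean_recall_at_k(
    eval_set: Sequence[Tuple[str, List[str]]],
    retrieve: Callable[[str], Sequence[str]],
    k: int,
) -> float:
    """Average recall@k over (question, ground_truth_chunk_ids) pairs."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    total = sum(recall_at_k(retrieve(q), truth, k) for q, truth in eval_set)
    return total / len(eval_set)
```

Run it before and after every chunking or retrieval change; a drop in the number is the regression signal.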

Generation glue

Prompt template, citation enforcement, refusal handling when retrieval is weak, hallucination guardrails.
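A sketch of refusal handling when retrieval is weak: if too few chunks clear a score threshold, skip generation and refuse instead of letting the model improvise. The `Hit` shape, the threshold values, and the prompt wording are all hypothetical; the right threshold depends on your embedding model and score scale.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    chunk_id: str
    text: str
    score: float  # similarity score; assumes higher means more relevant


def build_prompt(
    question: str,
    hits: List[Hit],
    min_score: float = 0.5,  # hypothetical threshold, tune on your eval set
    min_hits: int = 1,
) -> Optional[str]:
    """Return a citation-enforcing prompt, or None when retrieval is too
    weak to answer and the caller should refuse."""
    strong = [hit for hit in hits if hit.score >= min_score]
    if len(strong) < min_hits:
        return None
    context = "\n".join(f"[{hit.chunk_id}] {hit.text}" for hit in strong)
    return (
        "Answer using only the sources below. Cite chunk ids in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

When build_prompt returns None, return a fixed refusal message to the user rather than calling the model.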

Deliverable

Punch list

Written, ordered by impact

Each item names the symptom, the cause, the fix, and the rough effort. Sent within 24 hours of the call.

Eval-set recommendation

If you don't have one

A minimum eval set sized to your domain: question count, ground-truth shape, what to measure first.

Start

Engage: contact@althor.dev
Price: $300 fixed · invoiced on booking
Timeline: Booked within 5 business days · punch list delivered within 24 hours of call