displayName
RAG Pipeline Evaluation
workflowKind
data
triggerType
event-driven
typicalCadence
per-release
complexity
cross-team
description
Evaluates retrieval-augmented generation pipeline quality end-to-end —
measuring retrieval precision@k and recall@k against ground-truth relevance
judgments, scoring generation faithfulness and hallucination rates via LLM
judges, benchmarking chunking and embedding strategies, profiling latency
across retrieval and generation stages, and running regression suites before
promoting pipeline changes. Excludes RAG architecture design and embedding
model training.