Record
Agentic AI Atlas · LLM Evaluation
SkillArea overview

skill-area:llm-evaluation

Reference · live

LLM Evaluation overview

Techniques for evaluating large language model quality, including automated benchmarks, human evaluation, and domain-specific metrics. Covers BLEU/ROUGE, LLM-as-judge, Elo rating, and eval harness design.
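To make two of the listed techniques concrete, below is a minimal sketch of Elo rating updates driven by pairwise LLM-as-judge verdicts. The battles list, model names, and verdict labels are hypothetical placeholders; in a real harness the verdicts would come from a judge model comparing two answers to the same prompt.

  from collections import defaultdict

  K = 32  # Elo step size; larger K reacts faster to new results

  def expected_score(r_a, r_b):
      # Probability that A beats B under the Elo model (400-point scale).
      return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

  def update_elo(ratings, a, b, verdict):
      # Apply one pairwise verdict ("A", "B", or "tie") to both ratings.
      s_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[verdict]
      e_a = expected_score(ratings[a], ratings[b])
      ratings[a] += K * (s_a - e_a)
      ratings[b] += K * ((1.0 - s_a) - (1.0 - e_a))

  # Hypothetical judge verdicts; stand-ins for real LLM-as-judge output.
  battles = [("model-a", "model-b", "A"),
             ("model-a", "model-b", "tie"),
             ("model-b", "model-c", "B")]

  ratings = defaultdict(lambda: 1000.0)  # common starting baseline
  for a, b, verdict in battles:
      update_elo(ratings, a, b, verdict)

  print(sorted(ratings.items(), key=lambda kv: -kv[1]))

Elo's incremental pairwise updates tolerate sparse, incomplete comparison schedules, which is one reason pairwise-judged model leaderboards favor it over metrics that require a fixed reference answer.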

SkillArea · Outgoing: 4 · Incoming: 2

Attributes

displayName
LLM Evaluation
description
Techniques for evaluating large language model quality, including automated benchmarks, human evaluation, and domain-specific metrics. Covers BLEU/ROUGE, LLM-as-judge, Elo rating, and eval harness design.
expertiseLevels
  • intermediate
  • expert

Outgoing edges

applies_to · 2
prerequisite_for_learning · 2

Incoming edges

prerequisite_for_learning · 2