II. SkillArea overview
Reference · skill-area:llm-evaluation
LLM Evaluation overview
Techniques for evaluating large language model quality, including automated benchmarks, human evaluation, and domain-specific metrics. Covers BLEU/ROUGE, LLM-as-judge, Elo rating, and eval harness design.
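Of the techniques listed, Elo rating is the most compact to illustrate: each pairwise comparison (from a human rater or an LLM judge) nudges two models' ratings toward the observed outcome. Below is a minimal sketch, not part of this SkillArea's source material; the K-factor of 32 and the starting rating of 1000 are illustrative defaults, not fixed conventions.

```python
# Minimal Elo update for pairwise model comparisons. K=32 and the
# starting rating of 1000 are illustrative defaults, not fixed conventions.

def expected_score(r_a: float, r_b: float) -> float:
    """Predicted probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """score_a: 1.0 if A won the comparison, 0.0 if B won, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

# One judged comparison: model A's answer was preferred over model B's.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], score_a=1.0
)
print(ratings)  # {'model_a': 1016.0, 'model_b': 984.0}
```

With equal ratings the expected score is 0.5, so a single win moves each rating by K/2; as a rating gap opens, upsets move the ratings more than expected results do.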
Attributes
displayName
LLM Evaluation
description
Techniques for evaluating large language model quality, including automated benchmarks, human evaluation, and domain-specific metrics. Covers BLEU/ROUGE, LLM-as-judge, Elo rating, and eval harness design.
expertiseLevels
- intermediate
- expert
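The description attribute above names BLEU/ROUGE among the covered metrics. As a minimal sketch of the idea behind ROUGE-1 recall (clipped unigram overlap between a candidate and a reference), the toy version below omits the tokenization, stemming, and F-measure variants that real ROUGE implementations include:

```python
# Toy ROUGE-1 recall: clipped unigram overlap divided by reference length.
# Real ROUGE implementations add tokenization, stemming, and F-measure
# variants; this sketch only shows the core counting step.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each reference unigram is matched at most as often as it
    # appears in the candidate (clipping).
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat on the mat",
                    "the cat lay on the mat"))  # 5 of 6 reference unigrams match -> ~0.833
```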
Outgoing edges
applies_to (2)
- specialization:ai-agents-conversational · Specialization
- domain:ml-ai · Domain · ML/AI
prerequisite_for_learning (2)
- skill-area:eval-driven-development · SkillArea · Eval-Driven LLM Development
- skill-area:bias-fairness-analysis · SkillArea · Bias and Fairness Analysis
Incoming edges
prerequisite_for_learning (2)
- skill-area:synthetic-data-generation · SkillArea · Synthetic Data Generation
- skill-area:AI-agent-evaluation · SkillArea · AI Agent Evaluation