SkillArea overview
Reference · skill-area:ai-evaluation
AI Evaluation overview
Systematic evaluation of AI model outputs — benchmark design, human preference collection, automated scoring pipelines, and red-teaming for quality, safety, and alignment assessment.
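The automated scoring pipelines this skill area covers can be illustrated with a minimal sketch: run a set of evaluation cases through one or more scorers and aggregate per-metric averages. The `EvalCase` type and the `exact_match` / `keyword_coverage` scorers are hypothetical stand-ins, not the API of any tool listed below (Skillachi, LangSmith, Langfuse, Ragas).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalCase:
    """One evaluation example: prompt, reference answer, model output."""
    prompt: str
    reference: str
    output: str


def exact_match(case: EvalCase) -> float:
    # Strict scorer: 1.0 only if output equals the reference (case-insensitive).
    return 1.0 if case.output.strip().lower() == case.reference.strip().lower() else 0.0


def keyword_coverage(case: EvalCase) -> float:
    # Lenient scorer: fraction of reference words present in the output.
    ref_words = set(case.reference.lower().split())
    out_words = set(case.output.lower().split())
    return len(ref_words & out_words) / len(ref_words) if ref_words else 0.0


def run_eval(cases: list[EvalCase], scorers: dict) -> dict:
    # Aggregate each scorer's mean across all cases into a report.
    return {name: mean(fn(c) for c in cases) for name, fn in scorers.items()}


cases = [
    EvalCase("Capital of France?", "Paris", "Paris"),
    EvalCase("2+2?", "4", "The answer is 4"),
]
report = run_eval(cases, {"exact_match": exact_match,
                          "keyword_coverage": keyword_coverage})
print(report)  # → {'exact_match': 0.5, 'keyword_coverage': 1.0}
```

In practice the rule-based scorers would be swapped for benchmark-specific metrics, LLM-as-judge calls, or aggregated human preference labels, but the case → scorer → report shape stays the same.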
Attributes
displayName
AI Evaluation
description
Systematic evaluation of AI model outputs — benchmark design, human preference collection, automated scoring pipelines, and red-teaming for quality, safety, and alignment assessment.
domains
expertiseLevels
- intermediate
- expert
Outgoing edges
applies_to (1)
- domain:ml-ops · Domain · MLOps
Incoming edges
prerequisite_for_learning (1)
- skill-area:ai-agent-development · SkillArea · AI Agent Development
requires_skill_area (1)
- stack-profile:prompt-engineering-workbench · StackProfile · Prompt Engineering Workbench (TypeScript, React, PostgreSQL, LLM APIs, Redis)
tool_used_by (4)
- tool:skillachi · Tool · Skillachi
- tool:langsmith · Tool · LangSmith
- tool:langfuse · Tool · Langfuse
- tool:ragas · Tool · Ragas
used_for (6)
- tool:jupyter · Tool · Jupyter
- tool:vllm · Tool · vLLM
- tool:tensorrt · Tool · TensorRT
- tool:triton-inference · Tool · Triton Inference Server
- tool:onnx-runtime · Tool · ONNX Runtime
- tool:ragas · Tool · Ragas