SkillArea overview
Reference · skill-area:ai-evaluation
AI Evaluation overview
Systematic evaluation of AI model outputs — benchmark design, human preference collection, automated scoring pipelines, and red-teaming for quality, safety, and alignment assessment.
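The automated scoring pipelines this skill area covers can be illustrated with a minimal sketch: run a set of evaluation cases through one or more scorers and aggregate per-metric averages. The `EvalCase` type and the `exact_match` / `keyword_coverage` scorers are hypothetical stand-ins, not the API of any tool listed below (Skillachi, LangSmith, Langfuse, Ragas).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalCase:
    """One evaluation example: prompt, reference answer, model output."""
    prompt: str
    reference: str
    output: str


def exact_match(case: EvalCase) -> float:
    # Strict scorer: 1.0 only if output equals the reference (case-insensitive).
    return 1.0 if case.output.strip().lower() == case.reference.strip().lower() else 0.0


def keyword_coverage(case: EvalCase) -> float:
    # Lenient scorer: fraction of reference words present in the output.
    ref_words = set(case.reference.lower().split())
    out_words = set(case.output.lower().split())
    return len(ref_words & out_words) / len(ref_words) if ref_words else 0.0


def run_eval(cases: list[EvalCase], scorers: dict) -> dict:
    # Aggregate each scorer's mean across all cases into a report.
    return {name: mean(fn(c) for c in cases) for name, fn in scorers.items()}


cases = [
    EvalCase("Capital of France?", "Paris", "Paris"),
    EvalCase("2+2?", "4", "The answer is 4"),
]
report = run_eval(cases, {"exact_match": exact_match,
                          "keyword_coverage": keyword_coverage})
print(report)  # → {'exact_match': 0.5, 'keyword_coverage': 1.0}
```

In practice the rule-based scorers would be swapped for benchmark-specific metrics, LLM-as-judge calls, or aggregated human preference labels, but the case → scorer → report shape stays the same.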
Attributes
displayName
AI Evaluation
description
Systematic evaluation of AI model outputs — benchmark design, human preference collection, automated scoring pipelines, and red-teaming for quality, safety, and alignment assessment.
domains
expertiseLevels
- intermediate
- expert
Outgoing edges
applies_to (1)
- domain:ml-ops · Domain · MLOps
Incoming edges
prerequisite_for_learning (1)
- skill-area:ai-agent-development · SkillArea · AI Agent Development
requires_skill_area (1)
- stack-profile:prompt-engineering-workbench · StackProfile · Prompt Engineering Workbench (TypeScript, React, PostgreSQL, LLM APIs, Redis)
tool_used_by (4)
- tool:skillachi · Tool · Skillachi
- tool:langsmith · Tool · LangSmith
- tool:langfuse · Tool · Langfuse
- tool:ragas · Tool · Ragas
used_for (6)
- tool:jupyter · Tool · Jupyter
- tool:vllm · Tool · vLLM
- tool:tensorrt · Tool · TensorRT
- tool:triton-inference · Tool · Triton Inference Server
- tool:onnx-runtime · Tool · ONNX Runtime
- tool:ragas · Tool · Ragas