Agentic AI Atlas

II.

LibrarySkill overview

lib-skill:shared--eval-harness

Reference · live

eval-harness overview

Evaluation harness for testing agent and skill quality through structured benchmarks, regression tests, and quality scoring.

LibrarySkill · Outgoing 7 · Incoming 0

Attributes

displayName

eval-harness

description

Evaluation harness for testing agent and skill quality through structured benchmarks, regression tests, and quality scoring.

libraryPath

library/methodologies/everything-claude-code/skills/eval-harness/SKILL.md

contentSummary

- Define test cases with known-correct outputs
- Run agent against each test case
- Score: accuracy, completeness, relevance
- Compare against baseline performance
- Track performance over time

### 2. Skill Quality Testing

- Verify skill instructions produce expected outcomes
- Test edge cases and
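The first steps in the content summary (define cases with known-correct outputs, run the agent, score, compare against a baseline) can be sketched roughly as follows. This is a minimal illustration only, not the skill's actual implementation: `EvalCase`, `accuracy`, `compare_to_baseline`, and `toy_agent` are hypothetical names, scoring is reduced to exact-match accuracy (completeness and relevance are left out), and the baseline value is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One test case: an input prompt and its known-correct output (hypothetical type)."""
    prompt: str
    expected: str


def accuracy(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Run the agent on every case and return the fraction of exact matches."""
    if not cases:
        raise ValueError("at least one test case is required")
    correct = sum(
        1 for case in cases if agent(case.prompt).strip() == case.expected.strip()
    )
    return correct / len(cases)


def compare_to_baseline(score: float, baseline: float, tolerance: float = 0.0) -> Dict[str, object]:
    """Report the change against a stored baseline and flag a regression."""
    delta = score - baseline
    return {
        "score": score,
        "baseline": baseline,
        "delta": delta,
        "regressed": delta < -tolerance,
    }


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent call (assumption: an agent maps a prompt to a string)."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("3 * 3", "9"),  # the toy agent has no answer for this one
]

# Baseline of 0.5 is an invented figure for illustration.
report = compare_to_baseline(accuracy(toy_agent, CASES), baseline=0.5)
print(report)
```

Tracking performance over time would then amount to storing each `report` with a timestamp and using the previous run's score as the next baseline.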

Outgoing edges

lib_applies_to_domain · 1

domain:software-engineering · Domain · Software Engineering

lib_covers_topic · 1

topic:developer-experience · Topic · Developer Experience (DX)

lib_implements_workflow · 1

workflow:feature-development · Workflow · Feature Development

lib_involves_role · 2

role:tech-lead · Role · Tech Lead
role:backend-engineer · Role · Backend Engineer

lib_requires_skill_area · 2

skill-area:agentic-loops · SkillArea · Agentic Loops
skill-area:orchestration-loop · SkillArea · Orchestration Loop Engineering

Incoming edges

None.