II. LibrarySkill overview
Reference · livelib-skill:shared--eval-harness
eval-harness overview
Evaluation harness for testing agent and skill quality through structured benchmarks, regression tests, and quality scoring.
Attributes
displayName
eval-harness
description
Evaluation harness for testing agent and skill quality through structured benchmarks, regression tests, and quality scoring.
libraryPath
library/methodologies/everything-claude-code/skills/eval-harness/SKILL.md
contentSummary
- Define test cases with known-correct outputs
- Run agent against each test case
- Score: accuracy, completeness, relevance
- Compare against baseline performance
- Track performance over time
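
The steps listed above can be illustrated with a minimal sketch. It is not the skill's actual implementation: `EvalCase`, `run_eval`, `compare_to_baseline`, and `toy_agent` are hypothetical names chosen for illustration, and scoring is reduced to exact-match accuracy (completeness and relevance would need their own scorers).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One benchmark item: an input prompt and its known-correct output."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Run the agent on every case and return exact-match accuracy in [0, 1]."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = sum(1 for c in cases if agent(c.prompt).strip() == c.expected)
    return passed / len(cases)


def compare_to_baseline(score: float, baseline: float,
                        tolerance: float = 0.0) -> Dict[str, object]:
    """Flag a regression when the score falls below baseline minus tolerance."""
    return {
        "score": score,
        "baseline": baseline,
        "delta": score - baseline,
        "regressed": score < baseline - tolerance,
    }


def toy_agent(prompt: str) -> str:
    """Stand-in agent for illustration only: upper-cases its input."""
    return prompt.upper()


cases = [
    EvalCase("abc", "ABC"),
    EvalCase("hello", "HELLO"),
    EvalCase("x", "y"),  # deliberately failing case
    EvalCase("ok", "OK"),
]

history: List[float] = []  # one score per run, to track performance over time
score = run_eval(toy_agent, cases)
history.append(score)
report = compare_to_baseline(score, baseline=0.5)
print(report)
```

Keeping the agent as a plain callable means the same cases can be re-run against any agent or skill version, and appending each score to `history` gives a simple record for spotting regressions across runs.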
### 2. Skill Quality Testing
- Verify skill instructions produce expected outcomes
- Test edge ca
Outgoing edges
lib_applies_to_domain (1)
- domain:software-engineering · Domain · Software Engineering
lib_covers_topic (1)
- topic:developer-experience · Topic · Developer Experience (DX)
lib_implements_workflow (1)
- workflow:feature-development · Workflow
lib_involves_role (2)
- role:tech-lead · Role · Tech Lead
- role:backend-engineer · Role · Backend Engineer
lib_requires_skill_area (2)
- skill-area:agentic-loops · SkillArea · Agentic Loops
- skill-area:orchestration-loop · SkillArea · Orchestration Loop Engineering
Incoming edges
None.