subjectId
inScope
Cybersecurity capability evaluations: CTF-style tasks (CTF-Bench / CyberSecEval) that test offensive and defensive reasoning, scored by per-task success.
outOfScope
General software-engineering tasks, code-generation quality, and non-security agentic workflows.
outOfScopeReasonIds