subjectId
inScope
EvalPlus harness: augments the HumanEval and MBPP test suites (as HumanEval+ and MBPP+) with thousands of additional generated tests to catch solutions that overfit the original test sets.
outOfScope
Repository-scale tasks, multi-file refactoring, agentic tool use, and benchmarks other than HumanEval and MBPP.
outOfScopeReasonIds