subjectId
inScope
Self-contained Python function-completion tasks scored by
pass@k against unit tests. 164 hand-crafted problems, each
given as a function signature plus docstring.
outOfScope
Multi-file projects, languages other than Python (use MultiPL-E or
HumanEval-X for cross-language), repository-scale tasks (use SWE-bench),
multi-turn or agentic tasks, and natural-language reasoning benchmarks.
outOfScopeReasonIds