subjectId
inScope
Practical-programming benchmark exercising library use across 1,140 tasks; scored by pass rate against curated unit tests.
outOfScope
Repository-scale tasks, agentic tool use, rolling contamination-resistant task collection (use LiveCodeBench), and non-Python languages.
outOfScopeReasonIds