subjectId
inScope
Multi-language code-generation benchmark: translates HumanEval and MBPP problems into 18+ programming languages and scores models by per-language pass rate.
outOfScope
Repository-scale tasks, agentic tool use, and contamination-resistant rolling problem collection (for the last, use LiveCodeBench).
outOfScopeReasonIds