subjectId
inScope
Four-option multiple-choice academic-knowledge questions across 57
subjects (STEM, humanities, social sciences, professional), scored
by accuracy on a held-out test set of ~14k questions. English only.
outOfScope
Code-generation tasks, agentic/tool-use evaluations, multilingual
reasoning (use MMMLU or Global-MMLU instead), free-form generation
quality, and tasks requiring image or video input.
outOfScopeReasonIds