subjectId
inScope
Hendrycks MATH dataset: 12.5k competition-level mathematics problems across algebra, geometry, number theory, precalculus, etc., scored by exact match on a normalized final answer.
outOfScope
Grade-school arithmetic (use GSM8K), proof-style mathematical reasoning, and code-generation tasks.
outOfScopeReasonIds