Atlas Graph Explorer
MMLU
benchmark:mmlu
Benchmark
benchmarks/benchmarks/mmlu.yaml
Attributes
displayName: MMLU
homepageUrl: https://github.com/hendrycks/test
kind: knowledge
targetsKind: ModelVersion
description: Massive Multitask Language Understanding (57-subject knowledge benchmark).
Outgoing edges (1)
covers (1)
  skill-area:general-knowledge-reasoning · SkillArea · General Knowledge Reasoning
Incoming edges (18)
bounds_subject (1)
  scope-boundary:mmlu.scope · ScopeBoundary
for_benchmark (16)
  eval-run:mmlu.qwen-2-5-72b.2024-09 · EvalRun
  eval-run:gpqa.claude-haiku-4-5.2025-10 · EvalRun
  eval-run:mmlu.claude-sonnet-4-6.2025-11 · EvalRun
  eval-run:mmlu.deepseek-v3.2024-12 · EvalRun
  eval-run:mmlu.deepseek-r1.2025-01 · EvalRun
  eval-run:gpqa.gemini-2-5-pro.2025-06 · EvalRun
  eval-run:mmlu.llama-4-405b.2024-07 · EvalRun
  eval-run:mmlu.llama-3-1-405b.2024-07 · EvalRun
  eval-run:mmlu.llama-3-3-70b.2024-12 · EvalRun
  eval-run:mmlu.mistral-large-2.2024-07 · EvalRun
  eval-run:gpqa.gpt-5.2025-08 · EvalRun
  eval-run:mmlu.o1.2024-12 · EvalRun
  eval-run:mmlu.phi-3-medium.2024-05 · EvalRun
  eval-run:mmlu.gemma-2-27b.2024-06 · EvalRun
  eval-run:mmlu.command-r-plus.2024-08 · EvalRun
  eval-run:gpqa.claude-sonnet-4-5.2025-09 · EvalRun
scored_against (1)
  eval-result:mmlu.qwen-2-5-72b.001 · EvalResult