II. Benchmark graph
Neighborhood · live · benchmark:mmlu
MMLU graph
View the immediate incoming and outgoing neighborhood around this record without leaving the record detail surface.
Outgoing · 1
Incoming · 29
benchmark:mmlu · Benchmark
skill-area:general-knowled… · SkillArea
eval-result:mmlu.qwen-2-5-… · EvalResult
eval-result:gpqa.claude-ha… · EvalResult
eval-result:mmlu.claude-so… · EvalResult
eval-result:mmlu.deepseek-… · EvalResult
eval-result:mmlu.deepseek-… · EvalResult
eval-result:gpqa.gemini-2-… · EvalResult
eval-result:mmlu.llama-4-4… · EvalResult
eval-result:mmlu.llama-3-1… · EvalResult
eval-result:mmlu.llama-3-3… · EvalResult
eval-result:mmlu.mistral-l… · EvalResult
eval-result:gpqa.gpt-5.001 · EvalResult
eval-result:mmlu.o1.001 · EvalResult
eval-result:mmlu.phi-3-med… · EvalResult
eval-result:mmlu.gemma-2-2… · EvalResult
eval-result:mmlu.command-r… · EvalResult
eval-result:gpqa.claude-so… · EvalResult
eval-run:mmlu.qwen-2-5-72b… · EvalRun
eval-run:mmlu.claude-sonne… · EvalRun
eval-run:mmlu.deepseek-v3.… · EvalRun
eval-run:mmlu.deepseek-r1.… · EvalRun
eval-run:mmlu.llama-4-405b… · EvalRun
eval-run:mmlu.llama-3-1-40… · EvalRun
eval-run:mmlu.llama-3-3-70… · EvalRun
eval-run:mmlu.mistral-larg… · EvalRun
eval-run:mmlu.o1.2024-12 · EvalRun
eval-run:mmlu.phi-3-medium… · EvalRun
eval-run:mmlu.gemma-2-27b.… · EvalRun
eval-run:mmlu.command-r-pl… · EvalRun
scope-boundary:mmlu.scope · ScopeBoundary
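As a rough illustration of what an immediate (one-hop) neighborhood with separate outgoing and incoming counts means, the sketch below splits a directed edge list around one record. It is not this tool's implementation: the `one_hop` function, the `Neighborhood` type, the edge directions, and the `skill-area:example` identifier are all assumptions made for the example, and only a handful of the untruncated record ids shown above are reused.

```python
from typing import Iterable, NamedTuple


class Neighborhood(NamedTuple):
    """One-hop neighbors of a record, split by edge direction (hypothetical type)."""
    outgoing: list[str]  # targets of edges that start at the record
    incoming: list[str]  # sources of edges that end at the record


def one_hop(edges: Iterable[tuple[str, str]], record: str) -> Neighborhood:
    """Collect the direct successors and predecessors of `record`."""
    outgoing: set[str] = set()
    incoming: set[str] = set()
    for source, target in edges:
        if source == record:
            outgoing.add(target)
        if target == record:
            incoming.add(source)
    return Neighborhood(sorted(outgoing), sorted(incoming))


# Hypothetical edge list. The directions are assumed, and
# "skill-area:example" is a placeholder, not a real record id.
EDGES = [
    ("benchmark:mmlu", "skill-area:example"),
    ("eval-result:mmlu.o1.001", "benchmark:mmlu"),
    ("eval-run:mmlu.o1.2024-12", "benchmark:mmlu"),
    ("scope-boundary:mmlu.scope", "benchmark:mmlu"),
    # This edge does not touch benchmark:mmlu, so it is outside the one-hop view.
    ("eval-run:mmlu.o1.2024-12", "eval-result:mmlu.o1.001"),
]

hood = one_hop(EDGES, "benchmark:mmlu")
print(f"Outgoing · {len(hood.outgoing)}")
print(f"Incoming · {len(hood.incoming)}")
```

With this sample edge list the sketch reports one outgoing and three incoming neighbors. The last edge is ignored because neither endpoint is the selected record, which is what restricts the view to the immediate neighborhood.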