III.
Node kind ledger
Page 2 of 2
EvalRun
EvalRun records
Browse all EvalRun records in the current atlas snapshot.
Filters & facets (7 groups)
configHash
- sha256:placeholder-gpt-5-evalplus · 2
- sha256:placeholder-qwen-2-5-72b-mmlu · 1
- sha256:placeholder-qwen-2-5-72b-humaneval · 1
- sha256:placeholder-qwen-2-5-coder-32b-humaneval · 1
- sha256:placeholder-qwen-2-5-coder-32b-lcb · 1
- sha256:placeholder-qwen-2-5-coder-32b-mbpp · 1
- sha256:placeholder-claude-haiku-4-5-swe-bench-verified · 1
- sha256:placeholder-claude-haiku-4-5-gpqa · 1
- sha256:placeholder-claude-sonnet-4-6-human-eval · 1
- sha256:placeholder-claude-sonnet-4-6-mmlu · 1
- sha256:placeholder-claude-sonnet-4-5-bfcl-v3 · 1
- sha256:placeholder-claude-opus-4-5-gpqa-diamond · 1
target
- model:gpt-5@current · 9
- model:claude-sonnet-4-5@current · 8
- model:gemini-2-5-pro@current · 6
- model:claude-opus-4-5@current · 5
- model:qwen-2-5-coder-32b@current · 3
- model:deepseek-v3@current · 3
- model:deepseek-r1@current · 3
- model:llama-4-405b-instruct@current · 3
- model:llama-3-1-405b-instruct@current · 3
- model:qwen-2-5-72b-instruct@current · 2
- model:claude-haiku-4-5@current · 2
- model:claude-sonnet-4-6@current · 2
targetId
- model:gpt-5@current · 9
- model:claude-sonnet-4-5@current · 8
- model:gemini-2-5-pro@current · 6
- model:claude-opus-4-5@current · 5
- model:qwen-2-5-coder-32b@current · 3
- model:deepseek-v3@current · 3
- model:deepseek-r1@current · 3
- model:llama-4-405b-instruct@current · 3
- model:llama-3-1-405b-instruct@current · 3
- model:qwen-2-5-72b-instruct@current · 2
- model:claude-haiku-4-5@current · 2
- model:claude-sonnet-4-6@current · 2
runAt
benchmarkId
runBy
| id | displayName | cluster |
|---|---|---|
| eval-run:mmlu.mistral-large-2.2024-07 | eval-run:mmlu.mistral-large-2.2024-07 | benchmarks |
| eval-run:mmlu.o1.2024-12 | eval-run:mmlu.o1.2024-12 | benchmarks |
| eval-run:mmlu.phi-3-medium.2024-05 | eval-run:mmlu.phi-3-medium.2024-05 | benchmarks |
| eval-run:mmlu.qwen-2-5-72b.2024-09 | eval-run:mmlu.qwen-2-5-72b.2024-09 | benchmarks |
| eval-run:multipl-e.codestral-25-01.2025-01 | eval-run:multipl-e.codestral-25-01.2025-01 | benchmarks |
| eval-run:os-world.claude-sonnet-4-5.2025-09 | eval-run:os-world.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:swe-bench-verified.claude-haiku-4-5.2025-10 | eval-run:swe-bench-verified.claude-haiku-4-5.2025-10 | benchmarks |
| eval-run:swe-bench-verified.claude-opus-4-5.2025-09 | eval-run:swe-bench-verified.claude-opus-4-5.2025-09 | benchmarks |
| eval-run:swe-bench-verified.claude-opus-4-7.2026-01 | eval-run:swe-bench-verified.claude-opus-4-7.2026-01 | benchmarks |
| eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 | eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:swe-bench-verified.gemini-2-5-flash.2025-06 | eval-run:swe-bench-verified.gemini-2-5-flash.2025-06 | benchmarks |
| eval-run:swe-bench-verified.gemini-2-5-pro.2025-06 | eval-run:swe-bench-verified.gemini-2-5-pro.2025-06 | benchmarks |
| eval-run:swe-bench-verified.gpt-5.2025-08 | eval-run:swe-bench-verified.gpt-5.2025-08 | benchmarks |
| eval-run:swe-bench-verified.llama-4-405b.2024-07 | eval-run:swe-bench-verified.llama-4-405b.2024-07 | benchmarks |
| eval-run:swe-bench-verified.o3.2025-04 | eval-run:swe-bench-verified.o3.2025-04 | benchmarks |
| eval-run:swe-bench.claude-code@1.x.2025-04-29 | eval-run:swe-bench.claude-code@1.x.2025-04-29 | benchmarks |
| eval-run:swe-bench.deepseek-v3.2024-12 | eval-run:swe-bench.deepseek-v3.2024-12 | benchmarks |
| eval-run:swe-bench.llama-3-1-405b.2024-07 | eval-run:swe-bench.llama-3-1-405b.2024-07 | benchmarks |
| eval-run:terminal-bench.claude-sonnet-4-5.2025-09 | eval-run:terminal-bench.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:truthful-qa.claude-opus-4-5.2025-09 | eval-run:truthful-qa.claude-opus-4-5.2025-09 | benchmarks |
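
The ids in the table above appear to follow an `eval-run:<benchmark>.<target>.<date>` pattern. That pattern is inferred from the listed rows, not from any documented schema, so the Python sketch below is illustrative only; the names `EvalRunId` and `parse_eval_run_id` are hypothetical. Because a target segment can itself contain a dot (as in `claude-code@1.x`), the sketch takes the benchmark from the first dot and the date from the last dot.

```python
from typing import NamedTuple

# Prefix observed on every id in the table; assumed to be constant.
PREFIX = "eval-run:"


class EvalRunId(NamedTuple):
    """Hypothetical container for the three apparent parts of an EvalRun id."""
    benchmark: str
    target: str
    run_date: str


def parse_eval_run_id(raw: str) -> EvalRunId:
    """Split an id assumed to look like 'eval-run:<benchmark>.<target>.<date>'."""
    if not raw.startswith(PREFIX):
        raise ValueError(f"missing {PREFIX!r} prefix: {raw!r}")
    body = raw[len(PREFIX):]
    # Benchmark ends at the first dot; the date starts after the last dot.
    benchmark, _, rest = body.partition(".")
    target, _, run_date = rest.rpartition(".")
    if not (benchmark and target and run_date):
        raise ValueError(f"expected three dot-separated parts: {raw!r}")
    return EvalRunId(benchmark, target, run_date)


if __name__ == "__main__":
    print(parse_eval_run_id("eval-run:swe-bench.claude-code@1.x.2025-04-29"))
    print(parse_eval_run_id("eval-run:mmlu.o1.2024-12"))
```

Splitting from both ends keeps dotted target names intact, but it would misparse a benchmark name that contains a dot; none of the rows listed here do.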