| eval-run:arc-challenge.claude-sonnet-4-5.2025-09 | eval-run:arc-challenge.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:bfcl.claude-sonnet-4-5.2025-09 | eval-run:bfcl.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:gpqa.claude-sonnet-4-5.2025-09 | eval-run:gpqa.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:gsm8k.claude-sonnet-4-5.2025-09 | eval-run:gsm8k.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:human-eval-plus.claude-sonnet-4-5.2025-09 | eval-run:human-eval-plus.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:os-world.claude-sonnet-4-5.2025-09 | eval-run:os-world.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 | eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 | benchmarks |
| eval-run:terminal-bench.claude-sonnet-4-5.2025-09 | eval-run:terminal-bench.claude-sonnet-4-5.2025-09 | benchmarks |