| eval-result:arc-challenge.claude-sonnet-4-5.001 | eval-result:arc-challenge.claude-sonnet-4-5.001 | benchmarks |
| eval-result:bfcl.claude-sonnet-4-5.001 | eval-result:bfcl.claude-sonnet-4-5.001 | benchmarks |
| eval-result:gpqa-diamond.claude-opus-4-5.001 | eval-result:gpqa-diamond.claude-opus-4-5.001 | benchmarks |
| eval-result:gpqa.claude-sonnet-4-5.001 | eval-result:gpqa.claude-sonnet-4-5.001 | benchmarks |
| eval-result:gsm8k.claude-sonnet-4-5.001 | eval-result:gsm8k.claude-sonnet-4-5.001 | benchmarks |
| eval-result:harmbench.claude-opus-4-5.001 | eval-result:harmbench.claude-opus-4-5.001 | benchmarks |
| eval-result:hellaswag.claude-opus-4-5.001 | eval-result:hellaswag.claude-opus-4-5.001 | benchmarks |
| eval-result:human-eval-plus.claude-sonnet-4-5.001 | eval-result:human-eval-plus.claude-sonnet-4-5.001 | benchmarks |
| eval-result:os-world.claude-sonnet-4-5.001 | eval-result:os-world.claude-sonnet-4-5.001 | benchmarks |
| eval-result:swe-bench-verified.claude-opus-4-5.001 | eval-result:swe-bench-verified.claude-opus-4-5.001 | benchmarks |
| eval-result:swe-bench-verified.claude-sonnet-4-5.001 | eval-result:swe-bench-verified.claude-sonnet-4-5.001 | benchmarks |
| eval-result:swe-bench-verified.claude-sonnet-4-5.high-compute.001 | eval-result:swe-bench-verified.claude-sonnet-4-5.high-compute.001 | benchmarks |
| eval-result:terminal-bench.claude-sonnet-4-5.001 | eval-result:terminal-bench.claude-sonnet-4-5.001 | benchmarks |
| eval-result:truthful-qa.claude-opus-4-5.001 | eval-result:truthful-qa.claude-opus-4-5.001 | benchmarks |