| eval-run:human-eval.claude-sonnet-4-6.2025-11 | eval-run:human-eval.claude-sonnet-4-6.2025-11 | benchmarks |
| eval-run:human-eval.codestral-25-01.2025-01 | eval-run:human-eval.codestral-25-01.2025-01 | benchmarks |
| eval-run:human-eval.deepseek-v3.2024-12 | eval-run:human-eval.deepseek-v3.2024-12 | benchmarks |
| eval-run:human-eval.gpt-5.2025-08 | eval-run:human-eval.gpt-5.2025-08 | benchmarks |
| eval-run:human-eval.llama-3-1-405b.2024-07 | eval-run:human-eval.llama-3-1-405b.2024-07 | benchmarks |
| eval-run:human-eval.llama-3-3-70b.2024-12 | eval-run:human-eval.llama-3-3-70b.2024-12 | benchmarks |
| eval-run:human-eval.llama-4-405b.2024-07 | eval-run:human-eval.llama-4-405b.2024-07 | benchmarks |
| eval-run:human-eval.mistral-large-2.2024-07 | eval-run:human-eval.mistral-large-2.2024-07 | benchmarks |
| eval-run:human-eval.qwen-2-5-72b.2024-09 | eval-run:human-eval.qwen-2-5-72b.2024-09 | benchmarks |
| eval-run:human-eval.qwen-2-5-coder-32b.2024-11 | eval-run:human-eval.qwen-2-5-coder-32b.2024-11 | benchmarks |