| eval-result:human-eval-plus.claude-sonnet-4-5.001 | eval-result:human-eval-plus.claude-sonnet-4-5.001 | benchmarks |
| eval-result:human-eval-plus.gpt-5.001 | eval-result:human-eval-plus.gpt-5.001 | benchmarks |
| eval-result:human-eval.codestral-25-01.001 | eval-result:human-eval.codestral-25-01.001 | benchmarks |
| eval-result:human-eval.deepseek-v3.001 | eval-result:human-eval.deepseek-v3.001 | benchmarks |
| eval-result:human-eval.llama-3-1-405b.001 | eval-result:human-eval.llama-3-1-405b.001 | benchmarks |
| eval-result:human-eval.llama-3-3-70b.001 | eval-result:human-eval.llama-3-3-70b.001 | benchmarks |
| eval-result:human-eval.mistral-large-2.001 | eval-result:human-eval.mistral-large-2.001 | benchmarks |
| eval-result:human-eval.qwen-2-5-72b.001 | eval-result:human-eval.qwen-2-5-72b.001 | benchmarks |
| eval-result:human-eval.qwen-2-5-coder-32b.001 | eval-result:human-eval.qwen-2-5-coder-32b.001 | benchmarks |
| eval-result:livecodebench.qwen-2-5-coder-32b.001 | eval-result:livecodebench.qwen-2-5-coder-32b.001 | benchmarks |
| eval-result:math.deepseek-r1.001 | eval-result:math.deepseek-r1.001 | benchmarks |
| eval-result:mbpp.qwen-2-5-coder-32b.001 | eval-result:mbpp.qwen-2-5-coder-32b.001 | benchmarks |
| eval-result:multipl-e.codestral-25-01.001 | eval-result:multipl-e.codestral-25-01.001 | benchmarks |