| eval-run:gpqa-diamond.claude-opus-4-5.2025-09 | eval-run:gpqa-diamond.claude-opus-4-5.2025-09 | benchmarks |
| eval-run:gpqa-diamond.gemini-2-5-pro.2025-06 | eval-run:gpqa-diamond.gemini-2-5-pro.2025-06 | benchmarks |
| eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19 | eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19 | benchmarks |
| eval-run:gpqa-diamond.gemini-3-pro.2025-11-18 | eval-run:gpqa-diamond.gemini-3-pro.2025-11-18 | benchmarks |
| eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17 | eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17 | benchmarks |
| eval-run:gpqa-diamond.gpt-5-4.2026-03-17 | eval-run:gpqa-diamond.gpt-5-4.2026-03-17 | benchmarks |
| eval-run:gpqa-diamond.gpt-5.2025-08 | eval-run:gpqa-diamond.gpt-5.2025-08 | benchmarks |
| eval-run:gpqa.deepseek-r1.2025-01 | eval-run:gpqa.deepseek-r1.2025-01 | benchmarks |