Atlas Graph Explorer
uses_test_set
An EvalRun uses a TestSet.
26 wired pairs · cardinality N:N
from | to | to kind
benchmark:flores-200 | test-set:flores-200-devtest | TestSet
benchmark:gpqa | test-set:gpqa-diamond | TestSet
eval-run:swe-bench-verified.claude-haiku-4-5.2025-10 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:bfcl.claude-sonnet-4-5.2025-09 | test-set:bfcl-v3 | TestSet
eval-run:gpqa-diamond.claude-opus-4-5.2025-09 | test-set:gpqa-diamond-2024 | TestSet
eval-run:truthful-qa.claude-opus-4-5.2025-09 | test-set:truthful-qa-mc | TestSet
eval-run:swe-bench.deepseek-v3.2024-12 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:gpqa.deepseek-r1.2025-01 | test-set:gpqa-diamond-2024 | TestSet
eval-run:swe-bench-verified.gemini-2-5-flash.2025-06 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:gpqa-diamond.gemini-2-5-pro.2025-06 | test-set:gpqa-diamond-2024 | TestSet
eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19 | test-set:gpqa-diamond-2024 | TestSet
eval-run:gpqa-diamond.gemini-3-pro.2025-11-18 | test-set:gpqa-diamond-2024 | TestSet
eval-run:swe-bench-verified.llama-4-405b.2024-07 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench.llama-3-1-405b.2024-07 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:bfcl.gpt-5.2025-08 | test-set:bfcl-v3 | TestSet
eval-run:gpqa-diamond.gpt-5.2025-08 | test-set:gpqa-diamond-2024 | TestSet
eval-run:gpqa-diamond.gpt-5-4.2026-03-17 | test-set:gpqa-diamond-2024 | TestSet
eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17 | test-set:gpqa-diamond-2024 | TestSet
eval-run:swe-bench-verified.claude-opus-4-5.2025-09 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench-verified.claude-opus-4-7.2026-01 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench-verified.o3.2025-04 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench-verified.gemini-2-5-pro.2025-06 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:gaia.claude-code.2025 | test-set:gaia-validation | TestSet
eval-run:swe-bench.claude-code@1.x.2025-04-29 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 | test-set:swe-bench-verified-2024-12 | TestSet
eval-run:swe-bench-verified.gpt-5.2025-08 | test-set:swe-bench-verified-2024-12 | TestSet
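The declared N:N cardinality means that, in principle, one source can use several test sets and one test set can be used by several sources. The Python sketch below shows one way to check which side of that is actually exercised by the 26 listed pairs. It assumes the edges are available as plain (from, to) string pairs. The names `USES_TEST_SET`, `group_by_target`, and `observed_degrees` are hypothetical illustrations, not part of any Atlas Graph Explorer API.

```python
from collections import defaultdict

# (from, to) pairs copied from the uses_test_set listing; every target is a TestSet.
# Note: the first two sources carry the "benchmark:" prefix rather than "eval-run:".
USES_TEST_SET = [
    ("benchmark:flores-200", "test-set:flores-200-devtest"),
    ("benchmark:gpqa", "test-set:gpqa-diamond"),
    ("eval-run:swe-bench-verified.claude-haiku-4-5.2025-10", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:bfcl.claude-sonnet-4-5.2025-09", "test-set:bfcl-v3"),
    ("eval-run:gpqa-diamond.claude-opus-4-5.2025-09", "test-set:gpqa-diamond-2024"),
    ("eval-run:truthful-qa.claude-opus-4-5.2025-09", "test-set:truthful-qa-mc"),
    ("eval-run:swe-bench.deepseek-v3.2024-12", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:gpqa.deepseek-r1.2025-01", "test-set:gpqa-diamond-2024"),
    ("eval-run:swe-bench-verified.gemini-2-5-flash.2025-06", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:gpqa-diamond.gemini-2-5-pro.2025-06", "test-set:gpqa-diamond-2024"),
    ("eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19", "test-set:gpqa-diamond-2024"),
    ("eval-run:gpqa-diamond.gemini-3-pro.2025-11-18", "test-set:gpqa-diamond-2024"),
    ("eval-run:swe-bench-verified.llama-4-405b.2024-07", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench.llama-3-1-405b.2024-07", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:bfcl.gpt-5.2025-08", "test-set:bfcl-v3"),
    ("eval-run:gpqa-diamond.gpt-5.2025-08", "test-set:gpqa-diamond-2024"),
    ("eval-run:gpqa-diamond.gpt-5-4.2026-03-17", "test-set:gpqa-diamond-2024"),
    ("eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17", "test-set:gpqa-diamond-2024"),
    ("eval-run:swe-bench-verified.claude-opus-4-5.2025-09", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench-verified.claude-opus-4-7.2026-01", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench-verified.o3.2025-04", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench-verified.gemini-2-5-pro.2025-06", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:gaia.claude-code.2025", "test-set:gaia-validation"),
    ("eval-run:swe-bench.claude-code@1.x.2025-04-29", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09", "test-set:swe-bench-verified-2024-12"),
    ("eval-run:swe-bench-verified.gpt-5.2025-08", "test-set:swe-bench-verified-2024-12"),
]


def group_by_target(pairs):
    """Map each TestSet id to the sorted list of source nodes that use it."""
    users = defaultdict(list)
    for src, dst in pairs:
        users[dst].append(src)
    return {dst: sorted(srcs) for dst, srcs in users.items()}


def observed_degrees(pairs):
    """Return (max out-degree over sources, max in-degree over targets)."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst in pairs:
        out_deg[src] += 1
        in_deg[dst] += 1
    return max(out_deg.values()), max(in_deg.values())


if __name__ == "__main__":
    by_target = group_by_target(USES_TEST_SET)
    # Print test sets from most-used to least-used.
    for dst, srcs in sorted(by_target.items(), key=lambda kv: -len(kv[1])):
        print(f"{dst}: {len(srcs)} source(s)")
    print("max out-degree, max in-degree:", observed_degrees(USES_TEST_SET))
```

On the listed pairs, `test-set:swe-bench-verified-2024-12` is used by 12 sources and `test-set:gpqa-diamond-2024` by 8, and `observed_degrees` returns `(1, 12)`: every listed source uses exactly one test set, so only the "many sources per test set" half of the N:N cardinality is exercised by this data.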