Agentic AI Atlas by a5c.ai
Agentic AI Atlas · scored_against


scored_against

scored_against ledger

An eval result is scored against a benchmark.

Pairs · 73
Cardinality · N:1

from | to | to kind
eval-result:mmlu.qwen-2-5-72b.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.qwen-2-5-72b.001 | benchmark:human-eval | Benchmark
eval-result:human-eval.qwen-2-5-coder-32b.001 | benchmark:human-eval | Benchmark
eval-result:livecodebench.qwen-2-5-coder-32b.001 | benchmark:livecodebench | Benchmark
eval-result:mbpp.qwen-2-5-coder-32b.001 | benchmark:mbpp | Benchmark
eval-result:swe-bench-verified.claude-haiku-4-5.001 | benchmark:swe-bench-verified | Benchmark
eval-result:gpqa.claude-haiku-4-5.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.claude-sonnet-4-6.001 | benchmark:human-eval | Benchmark
eval-result:mmlu.claude-sonnet-4-6.001 | benchmark:mmlu | Benchmark
eval-result:bfcl.claude-sonnet-4-5.001 | benchmark:berkeley-function-calling | Benchmark
eval-result:gpqa-diamond.claude-opus-4-5.001 | benchmark:gpqa | Benchmark
eval-result:os-world.claude-sonnet-4-5.001 | benchmark:os-world | Benchmark
eval-result:truthful-qa.claude-opus-4-5.001 | benchmark:truthful-qa | Benchmark
eval-result:human-eval-plus.claude-sonnet-4-5.001 | benchmark:bigcode-evalplus | Benchmark
eval-result:harmbench.claude-opus-4-5.001 | benchmark:harmbench | Benchmark
eval-result:arc-challenge.claude-sonnet-4-5.001 | benchmark:arc-challenge | Benchmark
eval-result:mmlu.deepseek-v3.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.deepseek-v3.001 | benchmark:human-eval | Benchmark
eval-result:swe-bench.deepseek-v3.001 | benchmark:swe-bench-verified | Benchmark
eval-result:mmlu.deepseek-r1.001 | benchmark:mmlu | Benchmark
eval-result:math.deepseek-r1.001 | benchmark:math | Benchmark
eval-result:gpqa.deepseek-r1.001 | benchmark:gpqa | Benchmark
eval-result:gpqa.gemini-2-5-pro.001 | benchmark:mmlu | Benchmark
eval-result:livecodebench.gemini-2-5-pro.001 | benchmark:livecodebench | Benchmark
eval-result:swe-bench-verified.gemini-2-5-flash.001 | benchmark:swe-bench-verified | Benchmark
eval-result:gpqa-diamond.gemini-2-5-pro.001 | benchmark:gpqa | Benchmark
eval-result:android-world.gemini-2-5-pro.001 | benchmark:android-world | Benchmark
eval-result:mgsm.gemini-2-5-pro.001 | benchmark:mgsm | Benchmark
eval-result:gpqa-diamond.gemini-3-1-pro.2026-02-19.accuracy | benchmark:gpqa | Benchmark
eval-result:gpqa-diamond.gemini-3-pro.2025-11-18.accuracy | benchmark:gpqa | Benchmark
eval-result:swe-bench-verified.llama-4-405b.001 | benchmark:swe-bench-verified | Benchmark
eval-result:human-eval.llama-4-405b.001 | benchmark:human-eval | Benchmark
eval-result:mmlu.llama-4-405b.001 | benchmark:mmlu | Benchmark
eval-result:swe-bench.llama-3-1-405b.001 | benchmark:swe-bench-verified | Benchmark
eval-result:mmlu.llama-3-1-405b.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.llama-3-1-405b.001 | benchmark:human-eval | Benchmark
eval-result:mmlu.llama-3-3-70b.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.llama-3-3-70b.001 | benchmark:human-eval | Benchmark
eval-result:mmlu.mistral-large-2.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.mistral-large-2.001 | benchmark:human-eval | Benchmark
eval-result:human-eval.codestral-25-01.001 | benchmark:human-eval | Benchmark
eval-result:multipl-e.codestral-25-01.001 | benchmark:multipl-e | Benchmark
eval-result:gpqa.gpt-5.001 | benchmark:mmlu | Benchmark
eval-result:human-eval.gpt-5.001 | benchmark:human-eval | Benchmark
eval-result:mmlu.o1.001 | benchmark:mmlu | Benchmark
eval-result:math.o3.001 | benchmark:math | Benchmark
eval-result:bfcl.gpt-5.001 | benchmark:berkeley-function-calling | Benchmark
eval-result:gpqa-diamond.gpt-5.001 | benchmark:gpqa | Benchmark
eval-result:human-eval-plus.gpt-5.001 | benchmark:bigcode-evalplus | Benchmark
eval-result:gpqa-diamond.gpt-5-4.2026-03-17.accuracy | benchmark:gpqa | Benchmark
eval-result:gpqa-diamond.gpt-5-4-mini.2026-03-17.accuracy | benchmark:gpqa | Benchmark
eval-result:mmlu.phi-3-medium.001 | benchmark:mmlu | Benchmark
eval-result:mmlu.gemma-2-27b.001 | benchmark:mmlu | Benchmark
eval-result:gsm8k.gemma-2-27b.001 | benchmark:gsm8k | Benchmark
eval-result:mmlu.command-r-plus.001 | benchmark:mmlu | Benchmark
eval-result:swe-bench-verified.claude-opus-4-5.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.claude-opus-4-7.001 | benchmark:swe-bench-verified | Benchmark
eval-result:gpqa.claude-sonnet-4-5.001 | benchmark:mmlu | Benchmark
eval-result:swe-bench-verified.gpt-5.headline | benchmark:swe-bench-verified | Benchmark
eval-result:livecodebench.gpt-5.001 | benchmark:livecodebench | Benchmark
eval-result:swe-bench-verified.o3.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.gemini-2-5-pro.001 | benchmark:swe-bench-verified | Benchmark
eval-result:gsm8k.claude-sonnet-4-5.001 | benchmark:gsm8k | Benchmark
eval-result:hellaswag.claude-opus-4-5.001 | benchmark:hellaswag | Benchmark
eval-result:math.gpt-5.001 | benchmark:math | Benchmark
eval-result:evalplus.gpt-5.001 | benchmark:bigcode-evalplus | Benchmark
eval-result:terminal-bench.claude-sonnet-4-5.001 | benchmark:terminal-bench | Benchmark
eval-result:gaia.claude-code.001 | benchmark:gaia | Benchmark
eval-result:swe-bench.claude-code.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.claude-sonnet-4-5.high-compute.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.claude-sonnet-4-5.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.gpt-5.headline.001 | benchmark:swe-bench-verified | Benchmark
eval-result:swe-bench-verified.gpt-5.001 | benchmark:swe-bench-verified | Benchmark

Definition

Source · EvalResult

Target · Benchmark

Cardinality · N:1
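
The definition above (source EvalResult, target Benchmark, cardinality N:1) can be illustrated with a small Python sketch. It is not the Atlas implementation: the `ScoredAgainst` type and the helpers `find_cardinality_violations` and `group_by_benchmark` are hypothetical names introduced here for illustration, and the sketch assumes edges are available as plain (source id, target id) pairs like those in the ledger.

```python
from collections import defaultdict
from typing import NamedTuple


class ScoredAgainst(NamedTuple):
    """One scored_against edge (hypothetical type, not part of the Atlas)."""
    source: str  # EvalResult id, e.g. "eval-result:mmlu.o1.001"
    target: str  # Benchmark id, e.g. "benchmark:mmlu"


def find_cardinality_violations(edges):
    """Return sources that point at more than one benchmark.

    N:1 means many eval results may share a benchmark, but each
    eval result has exactly one benchmark.
    """
    targets = defaultdict(set)
    for edge in edges:
        targets[edge.source].add(edge.target)
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


def group_by_benchmark(edges):
    """Collect the eval results scored against each benchmark."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge.target].append(edge.source)
    return dict(grouped)


# A few pairs copied from the ledger.
edges = [
    ScoredAgainst("eval-result:mmlu.o1.001", "benchmark:mmlu"),
    ScoredAgainst("eval-result:mmlu.deepseek-v3.001", "benchmark:mmlu"),
    ScoredAgainst("eval-result:math.o3.001", "benchmark:math"),
]

violations = find_cardinality_violations(edges)
grouped = group_by_benchmark(edges)
```

With these three pairs, `violations` is empty and `grouped` maps `benchmark:mmlu` to two eval results. Adding a second edge from the same eval result to a different benchmark would surface that source in `violations`.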
