Agentic AI Atlas by a5c.ai
Agentic AI Atlas · EvalRun

EvalRun

EvalRun records

Browse all EvalRun records in the current atlas snapshot.

Cluster · benchmarks
Total · 70
Visible · 0
Filters & facets · 3 active · 7 groups

configHash

sha256:placeholder-gpt-5-evalplus · 2
sha256:placeholder-qwen-2-5-72b-mmlu · 1
sha256:placeholder-qwen-2-5-72b-humaneval · 1
sha256:placeholder-qwen-2-5-coder-32b-humaneval · 1
sha256:placeholder-qwen-2-5-coder-32b-lcb · 1
sha256:placeholder-qwen-2-5-coder-32b-mbpp · 1
sha256:placeholder-claude-haiku-4-5-swe-bench-verified · 1
sha256:placeholder-claude-haiku-4-5-gpqa · 1
sha256:placeholder-claude-sonnet-4-6-human-eval · 1
sha256:placeholder-claude-sonnet-4-6-mmlu · 1
sha256:placeholder-claude-sonnet-4-5-bfcl-v3 · 1
sha256:placeholder-claude-opus-4-5-gpqa-diamond · 1

target

model:gpt-5@current · 9
model:claude-sonnet-4-5@current · 8
model:gemini-2-5-pro@current · 6
model:claude-opus-4-5@current · 5
model:qwen-2-5-coder-32b@current · 3
model:deepseek-v3@current · 3
model:deepseek-r1@current · 3
model:llama-4-405b-instruct@current · 3
model:llama-3-1-405b-instruct@current · 3
model:qwen-2-5-72b-instruct@current · 2
model:claude-haiku-4-5@current · 2
model:claude-sonnet-4-6@current · 2

targetId

model:gpt-5@current · 9
model:claude-sonnet-4-5@current · 8
model:gemini-2-5-pro@current · 6
model:claude-opus-4-5@current · 5
model:qwen-2-5-coder-32b@current · 3
model:deepseek-v3@current · 3
model:deepseek-r1@current · 3
model:llama-4-405b-instruct@current · 3
model:llama-3-1-405b-instruct@current · 3
model:qwen-2-5-72b-instruct@current · 2
model:claude-haiku-4-5@current · 2
model:claude-sonnet-4-6@current · 2

runAt

2025-09-29T00:00:00Z · 13
2025-08-07T00:00:00Z · 9
2025-06-17T00:00:00Z · 7
2024-07-23T00:00:00Z · 6
2024-11-12T00:00:00Z · 3
2024-12-26T00:00:00Z · 3
2025-01-20T00:00:00Z · 3
2024-09-19T00:00:00Z · 2
2025-10-15T00:00:00Z · 2
2025-11-15T00:00:00Z · 2
2024-12-06T00:00:00Z · 2
2024-07-24T00:00:00Z · 2

benchmarkId

benchmark:mmlu · 12
benchmark:swe-bench-verified · 12
benchmark:gpqa · 12
benchmark:human-eval · 10
benchmark:livecodebench · 3
benchmark:bigcode-evalplus · 3
benchmark:math · 3
benchmark:berkeley-function-calling · 2
benchmark:gsm8k · 2
benchmark:mbpp · 1
benchmark:os-world · 1
benchmark:truthful-qa · 1

runBy

anthropic · 16
openai · 11
google-deepmind · 9
meta · 7
deepseek · 6
qwen-team · 5
mistral · 4
evalplus-leaderboard · 3
berkeley-gorilla · 2
google · 2
@a5c-ai/team · 2
artificial-analysis · 1

testSetId

test-set:swe-bench-verified-2024-12 · 26
test-set:gpqa-diamond-2024 · 12
test-set:bfcl-v3 · 2
test-set:truthful-qa-mc · 1
test-set:gaia-validation · 1
id · displayName · cluster
No records match the current filters.

Active filters

benchmarkId: benchmark:truthful-qa
runAt: 2025-01-20T00:00:00Z
testSetId: test-set:bfcl-v3
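
The page does not say how active filters combine. One plausible reading, assumed here, is that they are ANDed together, which would explain why three individually valid facet values leave zero visible records. The sketch below illustrates that assumption only: the `EvalRun` dataclass, the `apply_filters` helper, and both sample records are hypothetical and are not taken from the atlas snapshot or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    # Hypothetical record shape; field names mirror the facet names on this page.
    id: str
    benchmarkId: str
    runAt: str
    testSetId: str


def apply_filters(records, filters):
    """Keep records whose fields equal every active filter value (logical AND)."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in filters.items())
    ]


# Illustrative records only; these are not real atlas entries.
records = [
    EvalRun("run-a", "benchmark:truthful-qa",
            "2025-01-20T00:00:00Z", "test-set:truthful-qa-mc"),
    EvalRun("run-b", "benchmark:berkeley-function-calling",
            "2025-08-07T00:00:00Z", "test-set:bfcl-v3"),
]

# The three active filters shown on this page.
active_filters = {
    "benchmarkId": "benchmark:truthful-qa",
    "runAt": "2025-01-20T00:00:00Z",
    "testSetId": "test-set:bfcl-v3",
}

visible = apply_filters(records, active_filters)
print(len(visible))  # 0
```

Under this assumption, dropping any single conflicting filter (for example `testSetId`) would let matching records reappear, since each remaining condition can still be satisfied on its own.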

Sort

id-asc
id-desc
name-asc
name-desc