Agentic AI Atlas by a5c.ai
Agentic AI Atlas · Benchmark
0 records · a5c.ai

Current kind and facets

III. Benchmark
homepageUrl: https://www.swebench.com/
homepageUrl: https://github.com/THUDM/AgentBench
kind: model-only
kind: code-generation
description: General AI Assistants benchmark — real-world agent reasoning tasks.
description: Hand-written programming problems for evaluating code generation.
targetsKind: ModelVersion
targetsKind: AgentVersion
III. Node kind ledger

Benchmark


Benchmark records

Browse all Benchmark records in the current atlas snapshot.

Cluster · benchmarks
Total · 65
Visible · 0
Filters & facets
2 active · 4 groups

homepageUrl

https://www.swebench.com/ · 2
https://github.com/THUDM/AgentBench · 1
https://github.com/hendrycks/apps · 1
https://os-world.github.io/ · 1
https://google-research.github.io/android_world/ · 1
https://metr.org/AI_R_D_Evaluation_Report.pdf · 1
https://appworld.dev/ · 1
https://assistantbench.github.io/ · 1
https://the-agent-company.com/ · 1
https://agentclinic.github.io/ · 1
https://osu-nlp-group.github.io/TravelPlanner/ · 1
https://openai.com/index/browsecomp/ · 1

kind

model-only · 14
code-generation · 7
full-stack · 7
web-agent · 7
reasoning · 5
math · 4
tool-use · 3
domain-specific · 2
agent-leaderboard · 2
knowledge · 2
research-engineering · 1
planning · 1

description

General AI Assistants benchmark — real-world agent reasoning tasks. · 1
Hand-written programming problems for evaluating code generation. · 1
MBPP+ from EvalPlus — augmented MBPP with substantially expanded test suites. · 1
Machine learning engineering tasks drawn from Kaggle competitions. · 1
Massive Multitask Language Understanding — 57-subject knowledge benchmark. · 1
Real-world software engineering issues from open-source Python repos. · 1

targetsKind

ModelVersion · 37
AgentVersion · 28
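
The facet groups above pair each distinct field value with the number of records carrying it; for targetsKind the two counts (37 and 28) add up to the snapshot total of 65. A minimal Python sketch of how such counts could be derived is shown here. The records and the helper names `facet_counts` and `format_facet` are hypothetical, and nothing on the page confirms that the atlas computes its facets this way.

```python
from collections import Counter

# Hypothetical records: only the field name "targetsKind" and its two
# values come from the page; the ids are made up for illustration.
records = [
    {"id": "bench-a", "targetsKind": "ModelVersion"},
    {"id": "bench-b", "targetsKind": "AgentVersion"},
    {"id": "bench-c", "targetsKind": "ModelVersion"},
]

def facet_counts(records, field):
    """Return (value, count) pairs for one field, most frequent first."""
    counts = Counter(r[field] for r in records if field in r)
    return counts.most_common()

def format_facet(pairs):
    """Render pairs in the page's 'value · count' style, one per line."""
    return "\n".join(f"{value} · {count}" for value, count in pairs)

print(format_facet(facet_counts(records, "targetsKind")))
```

Records that lack the field are skipped rather than counted under an empty value; that is a design choice of this sketch, not something the page documents.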
id · displayName · cluster
No records match the current filters.

Active filters

description: Hand-written programming problems for evaluating code generation.
kind: knowledge
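
One plausible reading of the empty result is that active filters are combined with AND: a record must match every filter, so a description belonging to a code-generation benchmark cannot also satisfy kind: knowledge. The Python sketch below illustrates that assumption. The `Benchmark` class, the `apply_filters` helper, the record ids, the second description, and the kind assigned to each record are all hypothetical; the page does not state how filters are combined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Benchmark:
    id: str
    kind: str
    description: str

# Hypothetical records; the first description and both kind values appear
# on the page, but this pairing of kind with description is assumed.
RECORDS = [
    Benchmark("bench-a", "code-generation",
              "Hand-written programming problems for evaluating code generation."),
    Benchmark("bench-b", "knowledge", "Example knowledge benchmark."),
]

def apply_filters(records, filters):
    """Keep records whose fields equal every active filter value (AND)."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in filters.items())
    ]

active = {
    "description": "Hand-written programming problems for evaluating code generation.",
    "kind": "knowledge",
}
print(len(apply_filters(RECORDS, active)))  # no record satisfies both filters
```

Under this assumption, clearing either filter brings the matching record back into view.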

Sort

id-asc
id-desc
name-asc
name-desc
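
The four sort options each combine a field with a direction. A rough Python sketch of how such a key could be applied follows. It assumes that "name" refers to the displayName column; the `sort_records` helper and the sample rows are hypothetical.

```python
# Assumed mapping from the sort prefix to a record column.
SORT_COLUMNS = {"id": "id", "name": "displayName"}

def sort_records(records, sort_key):
    """Sort by a key such as 'id-asc' or 'name-desc'."""
    field, _, direction = sort_key.rpartition("-")
    if field not in SORT_COLUMNS or direction not in ("asc", "desc"):
        raise ValueError(f"unknown sort key: {sort_key!r}")
    column = SORT_COLUMNS[field]
    return sorted(records, key=lambda r: r[column], reverse=(direction == "desc"))

# Hypothetical rows for illustration only.
rows = [
    {"id": "b2", "displayName": "Alpha"},
    {"id": "b1", "displayName": "Beta"},
]
print([r["id"] for r in sort_records(rows, "id-asc")])
print([r["displayName"] for r in sort_records(rows, "name-desc")])
```

The comparison is plain, case-sensitive string ordering; a real listing might normalize case or use locale-aware collation.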