| id | displayName | cluster |
|---|---|---|
| benchmark:agentbench | AgentBench | benchmarks |
| benchmark:agentboard | AgentBoard | benchmarks |
| benchmark:agentclinic | AgentClinic | benchmarks |
| benchmark:aider-polyglot | Aider Polyglot | benchmarks |
| benchmark:android-world | AndroidWorld | benchmarks |
| benchmark:appworld | AppWorld | benchmarks |
| benchmark:arc-agi-3 | ARC-AGI-3 | benchmarks |
| benchmark:assistant-bench | AssistantBench | benchmarks |
| benchmark:browse-comp | BrowseComp | benchmarks |
| benchmark:cyber-bench | CyberBench | benchmarks |
| benchmark:gaia | GAIA | benchmarks |
| benchmark:mind2web-2 | Mind2Web 2 | benchmarks |
| benchmark:mle-bench | MLE-bench | benchmarks |
| benchmark:os-world | OSWorld | benchmarks |
| benchmark:re-bench | RE-Bench | benchmarks |
| benchmark:swe-bench | SWE-bench | benchmarks |
| benchmark:swe-bench-multimodal | SWE-bench Multimodal | benchmarks |
| benchmark:swe-bench-verified | SWE-bench Verified | benchmarks |
| benchmark:swe-lancer | SWE-Lancer | benchmarks |
| benchmark:tau-bench | τ-bench | benchmarks |
| benchmark:terminal-bench | Terminal-Bench | benchmarks |
| benchmark:the-agent-company | TheAgentCompany | benchmarks |
| benchmark:toolbench | ToolBench | benchmarks |
| benchmark:travelplanner | TravelPlanner | benchmarks |
| benchmark:visualwebarena | VisualWebArena | benchmarks |
| benchmark:webarena | WebArena | benchmarks |
| benchmark:webvoyager | WebVoyager | benchmarks |
| benchmark:workarena | WorkArena | benchmarks |