Agentic AI Atlas by a5c.ai
process/gaps/GAP-L1-P3-benchmarks-stale


GAP-L1-P3-benchmarks-stale reference

  • id: gap:benchmarks-stale
  • title: Benchmark NodeKind has no current SWE-bench Verified, Aider Polyglot, ARC-AGI 2 examples
  • level: 1
  • priority: P3
  • discoveredAt: 2026-04-28T00:00:00Z
  • source: schema/examples/benchmarks/
  • status: open
  • owner: tbd

Current state

The schema/examples/benchmarks/ directory is absent or sparse, and the coverage-checklist OpenQuestion "Benchmark-run primitives at SDK layer" remains unresolved. Major 2025/2026 benchmarks are not represented:

  • SWE-bench Verified (the current de facto coding-agent benchmark)
  • Aider Polyglot
  • ARC-AGI 2
  • Terminal-Bench
  • HumanEval/MBPP (older, but still cited)
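The missing-coverage claim above can be checked mechanically. The sketch below scans schema/examples/benchmarks/ for the five benchmarks; the slugs used as file names are assumptions, since this record does not specify a naming convention for example files:

```python
from pathlib import Path

# Hypothetical file-name slugs for the five missing benchmark examples;
# the actual naming convention in schema/examples/benchmarks/ is assumed.
EXPECTED = [
    "swe-bench-verified",
    "aider-polyglot",
    "arc-agi-2",
    "terminal-bench",
    "humaneval-mbpp",
]


def missing_examples(root: str = "schema/examples/benchmarks") -> list[str]:
    """Return the expected benchmark slugs with no example file under root.

    If the directory is absent entirely (the "absent or sparse" case in
    this gap record), every expected slug is reported as missing.
    """
    d = Path(root)
    present = {p.stem for p in d.glob("*")} if d.is_dir() else set()
    return [slug for slug in EXPECTED if slug not in present]
```

Run against a checkout where the directory does not exist, this reports all five slugs, matching the "Current state" described here.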

Desired state

Add five Benchmark example files, plus EvalRun examples for at least Claude Opus 4.7 and gpt-5-codex on SWE-bench Verified, to demonstrate the eval graph.
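A minimal sketch of the eval-graph linkage the desired examples would demonstrate. The actual Benchmark and EvalRun schemas are not shown in this record, so every field name below is an illustrative assumption:

```python
# Hypothetical shapes for one Benchmark node and its EvalRun records.
# Field names ("kind", "benchmark", "model", ...) are assumptions; the
# real NodeKind attribute names live in the schema, not in this record.
benchmark = {
    "kind": "Benchmark",
    "id": "benchmark:swe-bench-verified",
    "name": "SWE-bench Verified",
    "url": "https://www.swebench.com/",
}

eval_runs = [
    {
        "kind": "EvalRun",
        "benchmark": "benchmark:swe-bench-verified",
        "model": "claude-opus-4.7",  # model id spelling is an assumption
        "score": None,               # to be filled from published results
    },
    {
        "kind": "EvalRun",
        "benchmark": "benchmark:swe-bench-verified",
        "model": "gpt-5-codex",
        "score": None,
    },
]

# The "eval graph": each EvalRun references its Benchmark by id, so a
# traversal from a model's runs reaches the benchmark node they score on.
assert all(run["benchmark"] == benchmark["id"] for run in eval_runs)
```

Scores are left as None rather than invented; they would be filled in from the published leaderboards listed under Evidence.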

Evidence

  • swebench.com
  • arcprize.org
  • aider.chat/docs/leaderboards/

Propagation status

  • Level 1: open
  • Level 2: not-started

Propagation chain

  • Level 1: 5 example files + 2 EvalRun example files.

Notes

P3 — important for usefulness but not for schema correctness.

