Page overview
Reference · livepage:process-gaps-GAP-L1-P3-benchmarks-stale
GAP-L1-P3-benchmarks-stale overview
Inspect the raw attributes, linked wiki pages, and inbound or outbound graph edges for page:process-gaps-GAP-L1-P3-benchmarks-stale.
Attributes
nodeKind
Page
title
GAP-L1-P3-benchmarks-stale
displayName
GAP-L1-P3-benchmarks-stale
slug
process/gaps/GAP-L1-P3-benchmarks-stale
articlePath
wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md
article
# GAP-L1-P3-benchmarks-stale
| Field | Value |
|---|---|
| id | gap:benchmarks-stale |
| title | Benchmark NodeKind has no current SWE-bench Verified, Aider Polyglot, ARC-AGI 2 examples |
| level | 1 |
| priority | P3 |
| discoveredAt | 2026-04-28T00:00:00Z |
| source | schema/examples/benchmarks/ |
| status | open |
| owner | tbd |
## Current state
The `schema/examples/benchmarks/` directory is absent or sparse. The coverage-checklist OpenQuestion "Benchmark-run primitives at SDK layer" is unresolved. Major 2025/2026 benchmarks are not represented:
- SWE-bench Verified (the current de facto coding-agent benchmark)
- Aider Polyglot
- ARC-AGI 2
- Terminal-Bench
- HumanEval/MBPP (older, but still cited)
## Desired state
Add 5 `Benchmark` example files; add `EvalRun` examples for at least Claude Opus 4.7 and gpt-5-codex on SWE-bench Verified to demonstrate the eval graph.
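A minimal sketch of what one such example pair might look like. Every field name and file path below is an illustrative guess; the actual `Benchmark` and `EvalRun` schemas are not shown on this page, so this only demonstrates the intended shape of the eval graph (an `EvalRun` node referencing both a benchmark and a model).

```yaml
# Hypothetical file: schema/examples/benchmarks/swe-bench-verified.yaml
# All field names are assumptions, not the real Benchmark schema.
nodeKind: Benchmark
id: benchmark:swe-bench-verified
title: SWE-bench Verified
url: https://swebench.com
---
# Hypothetical companion EvalRun node tying a model to the benchmark,
# demonstrating the eval-graph edge described above.
nodeKind: EvalRun
id: evalrun:claude-opus-4.7--swe-bench-verified
benchmark: benchmark:swe-bench-verified
model: Claude Opus 4.7
```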
## Evidence
- swebench.com
- arcprize.org
- aider.chat/docs/leaderboards/
## Propagation status
- Level 1: open
- Level 2: not-started
## Propagation chain
- Level 1: 5 example files + 2 EvalRun example files.
## Notes
P3 — important for usefulness but not for schema correctness.
documents
[]
Outgoing edges
None.
Incoming edges
contains_page (1)
- page:process-gaps · Gap Tracker