Agentic AI Atlas

Page JSON

page:process-gaps-GAP-L1-P3-benchmarks-stale

Structured · live

GAP-L1-P3-benchmarks-stale json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md

Cluster · wiki

Record JSON

{
  "id": "page:process-gaps-GAP-L1-P3-benchmarks-stale",
  "_kind": "Page",
  "_file": "wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "title": "GAP-L1-P3-benchmarks-stale",
    "displayName": "GAP-L1-P3-benchmarks-stale",
    "slug": "process/gaps/GAP-L1-P3-benchmarks-stale",
    "articlePath": "wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md",
    "article": "# GAP-L1-P3-benchmarks-stale\n\n| Field | Value |\n|---|---|\n| id | gap:benchmarks-stale |\n| title | Benchmark NodeKind has no current SWE-bench Verified, Aider Polyglot, ARC-AGI 2 examples |\n| level | 1 |\n| priority | P3 |\n| discoveredAt | 2026-04-28T00:00:00Z |\n| source | schema/examples/benchmarks/ |\n| status | open |\n| owner | tbd |\n\n## Current state\n`schema/examples/benchmarks/` directory absent or sparse. Coverage-checklist OpenQuestion \"Benchmark-run primitives at SDK layer\" is unresolved. Major 2025/2026 benchmarks not represented:\n- SWE-bench Verified (current de-facto coding agent benchmark)\n- Aider Polyglot\n- ARC-AGI 2\n- Terminal-Bench\n- HumanEval/MBPP (older, but still cited)\n\n## Desired state\nAdd 5 `Benchmark` example files; add `EvalRun` examples for at least Claude Opus 4.7 and gpt-5-codex on SWE-bench Verified to demonstrate the eval graph.\n\n## Evidence\n- swebench.com\n- arcprize.org\n- aider.chat/docs/leaderboards/\n\n## Propagation status\n- Level 1: open\n- Level 2: not-started\n\n## Propagation chain\n- Level 1: 5 example files + 2 EvalRun example files.\n\n## Notes\nP3 — important for usefulness but not for schema correctness.\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": []
}
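
The gap's metadata (id, level, priority, status, owner) is not exposed as structured attributes in the record above; it sits in a markdown table inside the `attributes.article` string. The following is a minimal sketch, not part of the atlas tooling, of how a script might recover those fields from the payload. The helper name `parse_field_table` is hypothetical, and `RECORD_JSON` is a trimmed copy of the record that keeps only the keys and table rows used here.

```python
import json

# Trimmed copy of the record above (assumption: only these keys are needed).
# The raw string keeps the JSON "\n" escapes intact for json.loads to decode.
RECORD_JSON = r'''
{
  "id": "page:process-gaps-GAP-L1-P3-benchmarks-stale",
  "_kind": "Page",
  "attributes": {
    "article": "# GAP-L1-P3-benchmarks-stale\n\n| Field | Value |\n|---|---|\n| id | gap:benchmarks-stale |\n| level | 1 |\n| priority | P3 |\n| status | open |\n| owner | tbd |\n\n## Current state\n"
  },
  "outgoingEdges": [],
  "incomingEdges": []
}
'''


def parse_field_table(article: str) -> dict[str, str]:
    """Return the first two-column markdown table in `article` as a dict.

    Hypothetical helper: it assumes a 'Field | Value' table whose cells
    contain no '|' characters, as in the record shown above.
    """
    fields: dict[str, str] = {}
    for line in article.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if fields:
                break  # first non-table line after the rows ends the table
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column row
        key, value = cells
        if key == "Field" or set(key) <= {"-", ":"}:
            continue  # skip the header row and the |---|---| separator
        fields[key] = value
    return fields


record = json.loads(RECORD_JSON)
fields = parse_field_table(record["attributes"]["article"])
print(fields["priority"], fields["status"])  # prints "P3 open"
```

Reading the table this way leaves the record itself untouched; if the atlas later promotes these fields to first-class attributes, a helper like this would no longer be needed.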