II. Page JSON
Below is the normalized record payload, exactly as the atlas UI reads it.
{
  "id": "page:process-gaps-GAP-L1-P3-benchmarks-stale",
  "_kind": "Page",
  "_file": "wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "title": "GAP-L1-P3-benchmarks-stale",
    "displayName": "GAP-L1-P3-benchmarks-stale",
    "slug": "process/gaps/GAP-L1-P3-benchmarks-stale",
    "articlePath": "wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md",
    "article": "# GAP-L1-P3-benchmarks-stale\n\n| Field | Value |\n|---|---|\n| id | gap:benchmarks-stale |\n| title | Benchmark NodeKind has no current SWE-bench Verified, Aider Polyglot, ARC-AGI 2 examples |\n| level | 1 |\n| priority | P3 |\n| discoveredAt | 2026-04-28T00:00:00Z |\n| source | schema/examples/benchmarks/ |\n| status | open |\n| owner | tbd |\n\n## Current state\n`schema/examples/benchmarks/` directory absent or sparse. Coverage-checklist OpenQuestion \"Benchmark-run primitives at SDK layer\" is unresolved. Major 2025/2026 benchmarks not represented:\n- SWE-bench Verified (current de-facto coding agent benchmark)\n- Aider Polyglot\n- ARC-AGI 2\n- Terminal-Bench\n- HumanEval/MBPP (older, but still cited)\n\n## Desired state\nAdd 5 `Benchmark` example files; add `EvalRun` examples for at least Claude Opus 4.7 and gpt-5-codex on SWE-bench Verified to demonstrate the eval graph.\n\n## Evidence\n- swebench.com\n- arcprize.org\n- aider.chat/docs/leaderboards/\n\n## Propagation status\n- Level 1: open\n- Level 2: not-started\n\n## Propagation chain\n- Level 1: 5 example files + 2 EvalRun example files.\n\n## Notes\nP3 — important for usefulness but not for schema correctness.\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": []
}
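As a minimal sketch of how a client might consume this record: the top-level keys (`id`, `_kind`, `_file`, `attributes`, `outgoingEdges`, `incomingEdges`) come directly from the payload above, but the helper name `load_page_record` and the validation rules are assumptions for illustration, not the atlas UI's actual loader.

```python
import json

def load_page_record(raw: str) -> dict:
    """Parse a normalized record and check the top-level shape.

    The required keys mirror the payload shown above; this validator
    is a hypothetical sketch, not the atlas UI's real implementation.
    """
    record = json.loads(raw)
    for key in ("id", "_kind", "_file", "attributes",
                "outgoingEdges", "incomingEdges"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    return record

# A trimmed-down copy of the record above, used here as sample input.
raw = json.dumps({
    "id": "page:process-gaps-GAP-L1-P3-benchmarks-stale",
    "_kind": "Page",
    "_file": "wiki/process/gaps/GAP-L1-P3-benchmarks-stale.md",
    "_cluster": "wiki",
    "attributes": {"nodeKind": "Page", "title": "GAP-L1-P3-benchmarks-stale"},
    "outgoingEdges": [],
    "incomingEdges": [],
})
record = load_page_record(raw)
print(record["attributes"]["title"])  # GAP-L1-P3-benchmarks-stale
```

A reader would typically render `attributes.article` as markdown and walk `outgoingEdges`/`incomingEdges` (both empty for this page) to build the graph view.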