Agentic AI Atlas

GAP-L1-P3-benchmarks-stale

Field	Value
id	gap:benchmarks-stale
title	Benchmark NodeKind has no current SWE-bench Verified, Aider Polyglot, ARC-AGI 2 examples
level	1
priority	P3
discoveredAt	2026-04-28T00:00:00Z
source	schema/examples/benchmarks/
status	open
owner	tbd

Current state

The schema/examples/benchmarks/ directory is absent or sparse, and the coverage-checklist OpenQuestion "Benchmark-run primitives at SDK layer" is still unresolved. The following major 2025/2026 benchmarks are not represented:

SWE-bench Verified (the current de facto benchmark for coding agents)
Aider Polyglot
ARC-AGI 2
Terminal-Bench
HumanEval/MBPP (older, but still cited)

Desired state

Add five Benchmark example files, one per entry in the list above. Add EvalRun examples for at least Claude Opus 4.7 and gpt-5-codex on SWE-bench Verified to demonstrate the eval graph.
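The target eval graph can be sketched as a small in-memory model. This is a minimal sketch under stated assumptions: the real Benchmark and EvalRun schemas are not shown on this page, so the field names, the `benchmark:` / `evalrun:` id prefixes, and the helpers `dangling_edges` and `runs_by_benchmark` are all hypothetical. HumanEval/MBPP is kept as a single node to match the five-file target. Scores are deliberately left empty because no results are recorded here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical node shapes; the actual schema files may use different fields.
@dataclass(frozen=True)
class Benchmark:
    id: str
    name: str
    homepage: Optional[str] = None


@dataclass(frozen=True)
class EvalRun:
    id: str
    model: str          # model identifier as written on this page
    benchmark_id: str   # edge: EvalRun -> Benchmark
    score: Optional[float] = None  # intentionally empty: no real results recorded


BENCHMARKS = [
    Benchmark("benchmark:swe-bench-verified", "SWE-bench Verified", "https://swebench.com"),
    Benchmark("benchmark:aider-polyglot", "Aider Polyglot", "https://aider.chat/docs/leaderboards/"),
    Benchmark("benchmark:arc-agi-2", "ARC-AGI 2", "https://arcprize.org"),
    Benchmark("benchmark:terminal-bench", "Terminal-Bench"),
    Benchmark("benchmark:humaneval-mbpp", "HumanEval/MBPP"),
]

EVAL_RUNS = [
    EvalRun("evalrun:claude-opus-4-7-swe-bench-verified", "Claude Opus 4.7",
            "benchmark:swe-bench-verified"),
    EvalRun("evalrun:gpt-5-codex-swe-bench-verified", "gpt-5-codex",
            "benchmark:swe-bench-verified"),
]


def dangling_edges(benchmarks, runs):
    """Return ids of EvalRuns whose benchmark_id matches no Benchmark node."""
    known = {b.id for b in benchmarks}
    return [r.id for r in runs if r.benchmark_id not in known]


def runs_by_benchmark(benchmarks, runs):
    """Map each benchmark id to the models that have an EvalRun against it."""
    index = {b.id: [] for b in benchmarks}
    for r in runs:
        index.setdefault(r.benchmark_id, []).append(r.model)
    return index


if __name__ == "__main__":
    assert dangling_edges(BENCHMARKS, EVAL_RUNS) == []
    print(runs_by_benchmark(BENCHMARKS, EVAL_RUNS))
```

Keeping the EvalRun-to-Benchmark link as an id reference (rather than embedding the Benchmark object) mirrors how separate example files would point at each other, and makes a dangling-reference check trivial.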

Evidence

swebench.com
arcprize.org
aider.chat/docs/leaderboards/

Propagation status

Level 1: open
Level 2: not-started

Propagation chain

Level 1: five Benchmark example files plus two EvalRun example files.
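A closure check for the Level 1 target might look like the following. This is a hedged sketch: the page does not specify file naming, so the `.benchmark.yaml` / `.evalrun.yaml` suffixes are a hypothetical convention, as is the assumption that EvalRun examples sit in the same directory as the Benchmark examples. Only the two counts (5 and 2) come from the propagation chain above. An absent directory is reported as zero files, which covers the "absent or sparse" current state.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical naming convention; not defined anywhere on this page.
BENCHMARK_SUFFIX = ".benchmark.yaml"
EVALRUN_SUFFIX = ".evalrun.yaml"

# Targets taken from the Level 1 propagation chain.
REQUIRED_BENCHMARKS = 5
REQUIRED_EVALRUNS = 2


def classify(names: Iterable[str]) -> dict:
    """Count Benchmark and EvalRun example files among the given file names."""
    names = list(names)
    n_bench = sum(n.endswith(BENCHMARK_SUFFIX) for n in names)
    n_runs = sum(n.endswith(EVALRUN_SUFFIX) for n in names)
    return {
        "benchmarks": n_bench,
        "evalruns": n_runs,
        "done": n_bench >= REQUIRED_BENCHMARKS and n_runs >= REQUIRED_EVALRUNS,
    }


def level1_status(example_dir: Path) -> dict:
    """Report Level 1 progress for a directory such as schema/examples/benchmarks/."""
    if not example_dir.is_dir():
        return classify([])  # directory absent: nothing counted yet
    return classify(p.name for p in example_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    print(level1_status(Path("schema/examples/benchmarks")))
```

Separating the pure counting step (`classify`) from the filesystem walk keeps the target logic testable without creating files.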

Notes

P3: important for usefulness, but not for schema correctness.
