Agentic AI Atlasby a5c.ai
OverviewWikiGraphFor AgentsEdgesSearchWorkspace
/
GitHubDocsDiscord
i.3Wiki
Agentic AI Atlas · Implementation Roadmap
docs/testing/implementation-roadmapa5c.ai
Search the atlas/
Wiki · linked records

Article and nearby pages

I.Current articlepp. 1 - 1
Agent Mux And Runtime E2ECoverage And ReportingCurrent Test Command InventoryHarness And Plugin E2EMock And Fixture ContractsPipeline Integration
I.
Wiki article

docs/testing/implementation-roadmap

Reading · 4 min

Implementation Roadmap reference

This roadmap turns the strategy into implementation slices. Each slice must land with docs, package scripts or workflow wiring, and proof artifacts before the next slice depends on it. Status reflects the current unified Publish workflow, where live-stack scenario selection is owned by GitHub Actions rather than test code.

Page nodewiki/docs/testing/implementation-roadmap.mdNearby pages · 11Documents · 0

Continue reading

Nearby pages in the same section.

Agent Mux And Runtime E2ECoverage And ReportingCurrent Test Command InventoryHarness And Plugin E2EMock And Fixture ContractsPipeline IntegrationPrimary Flow Data PathsQuality GatesStack PermutationsTest LanesTrace Identifiers And Evidence

Implementation Roadmap

This roadmap turns the strategy into implementation slices. Each slice must land with docs, package scripts or workflow wiring, and proof artifacts before the next slice depends on it. Status reflects the current unified Publish workflow, where live-stack scenario selection is owned by GitHub Actions rather than test code.

Rollout Slices

SliceStatusPrimary ownerLaneExit criteriaPipeline target
0. Inventory and namingPartly current, needs reference cleanup as workflows evolveCI maintainersNo-modelCurrent command inventory maps every current test-like command to package, lane, scope, owner, artifact name, and pipeline placementDocumentation and publish.yml comments/artifact names
1. Mock and fixture contractsNext no-model implementation sliceRuntime maintainersNo-modelMock Codex, Claude Code, agent-core, transport-mux, and gateway transcripts are shared by unit, integration, and UI testsPR/push CI
2. SDK harness/plugin setup smokePlannedSDK and harness maintainersNo-modelharness:install --dry-run, harness:install-plugin --dry-run, and plugin discovery produce JSON evidence without claiming babysitter-agent runtime coveragePR/push CI
3. Mux integration coveragePartly covered by current mux validation; transport-mux fixtures remain the largest gapAgent-mux maintainersNo-modelTransport-mux route/runtime/env/launch-plan coverage, agent-mux gateway, adapters, and WebUI run against compatible fixture sessionsPR/push CI and publish.yml validation
4. Minimal live harness smokeImplemented as a publish.yml scenario/OS/install-mode matrix smoke, still needs quarantine history and richer assertionsHarness maintainersModel-backedThe currently valid Claude Code, Pi, and babysitter-agent-through-agent-mux live-stack scenarios run from GitHub Actions matrix entries in babysitter-plugin and vanilla modes, with Codex/Gemini and Windows lanes added only after their live install/launch prerequisites are green and upload redacted artifactspublish.yml live_stack_e2e
5. Split live E2E smokesPartly implemented; deeper path-specific jobs remainRuntime and mux maintainersModel-backedAgent-mux plugin/session smoke uses pipeline-selected plugin preconditions; transport-mux bridge smoke proves agent-core and external-harness proxy paths; babysitter-agent runtime smoke uses runtime path with no installer stepspublish.yml staging/release preflight
6. Coverage aggregationPartly implemented for live-stack scenario JSON onlyCI maintainersBothPackage coverage, Playwright traces, and live-run summaries merge into one workflow summaryPR/push for no-model, publish.yml for live

Live Install-Mode Axis

publish.yml owns the live-stack install_mode matrix dimension:

Install modeSetup pathLaunch pathRequired proof
babysitter-pluginGenerate plugin packages, install the target agent with amux install, install the local Babysitter SDK, then install the Babysitter plugin for the target harnessamux launch <agent> <provider> with a /babysitter:call ... promptAgent-mux IDs, Babysitter run/effect IDs, native hook evidence, hooks-mux event evidence, transport trace, redacted provider trace
vanillaInstall the target agent with amux install onlyamux launch <agent> <provider> with a normal non-Babysitter promptAgent-mux IDs, session ID, transport trace, redacted provider trace

Both modes use agent-mux for external agent installation and launch. The test code reads the selected mode from workflow env and must not enumerate scenario matrices internally.

Definition Of Done Per Slice

A slice is complete only when it includes:

  • package-local or workflow-level command entrypoint,
  • lane label (lane:no-model or lane:model-backed),
  • owner and trigger in pipeline docs,
  • artifact contract,
  • skip/failure semantics,
  • docs update in docs/testing/,
  • validation proof from local command or CI run.

Initial PR Sequence

1. Keep publish.yml as the owner of live-stack scenario, OS, and install-mode matrices; do not reintroduce code-side matrix runners. 2. Add no-model transport-mux local route/codec, env-injection, passthrough, metrics/cache, and launch-plan proxy-decision scenarios. 3. Add shared fixture transcript format and migrate one agent-mux or transport-mux test to consume it. 4. Add no-model harness dry-run tests for Codex and Claude Code. 5. Add transport-mux fixture and agent-core event stream replay scenarios. 6. Add credential-gated transport-mux bridge smoke for agent-core and one external harness through amux launch --with-proxy*. 7. Expand live-stack artifacts into a merged scenario checklist in the workflow summary. 8. Extract reusable testing workflows only if publish.yml grows too large or another workflow needs the same lane contract.

Stop Conditions

Pause rollout if any of these happen:

  • live tests flake for provider reasons more than twice in a week,
  • artifacts contain unredacted credential material,
  • no-model CI runtime grows beyond the agreed PR budget,
  • publish is blocked by a non-release-critical live test,
  • mocks diverge from live event fixtures without an explicit compatibility issue.

Trail

Wiki
Babysitter Docs
Testing Strategy

Implementation Roadmap

Continue reading

Agent Mux And Runtime E2E
Coverage And Reporting
Current Test Command Inventory
Harness And Plugin E2E
Mock And Fixture Contracts
Pipeline Integration
Primary Flow Data Paths
Quality Gates

Page record

Open node ledger

wiki/docs/testing/implementation-roadmap.md

Documents

No documented graph nodes on this page.