Agentic AI Atlas · Testing Strategy
Testing Strategy guide

This directory defines the replacement testing strategy after the legacy Docker and Docker-E2E workflows were removed. The current CI implementation lives primarily in .github/workflows/publish.yml, with GitHub Actions owning the live-stack scenario and OS matrix. The new plan starts from repository-native package boundaries, Babysitter harness setup commands, the babysitter-agent runtime surface, and explicit model/no-model lanes instead of reusing the retired Docker image and e2e-tests/docker suite.

Page node: wiki/docs/testing/index.md · Section pages: 12 · Documents: 0

Pages in this section

Start with the section hub, then move sideways into adjacent pages when you need more detail.

Agent Mux And Runtime E2E

This strategy covers runtime paths after setup is already satisfied. It separates agent-mux sessions, transport carriers, agent-core programmatic sessions, and @a5c-ai/babysitter-agent orchestration. Harness/plugin install coverage lives in Harness And Plugin E2E (./harness-e2e.md), not in babysitter-agent runtime E2E.

wiki/docs/testing/agent-mux-and-runtime-e2e.md

Coverage And Reporting

Coverage reporting should make the repository-wide test story visible without turning every test into one slow monolithic gate.

wiki/docs/testing/coverage-and-reporting.md

Current Test Command Inventory

Status: Current. This inventory implements roadmap slice 0, "Inventory and naming". It maps existing CI-relevant test-like package scripts to package or surface, lane, scope, owner, artifact name, and pipeline placement. Proposed future bundles remain in Pipeline Integration (./pipeline-integration.md#proposed-command-bundles) and are not treated as current commands here.

wiki/docs/testing/current-test-command-inventory.md

Harness And Plugin E2E

This document covers harness setup and plugin-enabled sessions. It intentionally separates two different integration types: SDK harness/plugin setup and agent-mux plugin/session E2E.

wiki/docs/testing/harness-e2e.md

Implementation Roadmap

This roadmap turns the strategy into implementation slices. Each slice must land with docs, package scripts or workflow wiring, and proof artifacts before the next slice depends on it. Status reflects the current unified Publish workflow, where live-stack scenario selection is owned by GitHub Actions rather than test code.

wiki/docs/testing/implementation-roadmap.md

Mock And Fixture Contracts

No-model tests are only valuable if their mocks describe the same contracts live providers must satisfy. This document defines fixture expectations for Codex, Claude Code, agent-core, agent-mux, transport-mux, hooks muxes, and babysitter-agent.

wiki/docs/testing/mock-and-fixture-contracts.md

Pipeline Integration

The pipeline should add new testing lanes in stages. No-model tests protect every pull request. Model-backed tests protect promotion and release confidence without making ordinary PRs depend on provider availability.

wiki/docs/testing/pipeline-integration.md

Primary Flow Data Paths

This document maps the main flows that the rebuilt E2E strategy should prove. It is intentionally data-path oriented: every flow names the caller, command/API boundary, state that must be created, hook/session artifacts that should exist, and the identifiers that let a test join evidence across packages.

wiki/docs/testing/primary-flow-data-paths.md

Quality Gates

These gates define what must be true before a new test lane, workflow, or model-backed scenario is treated as release evidence.

wiki/docs/testing/quality-gates.md

Stack Permutations

The test strategy must treat the stack as modular. A valid E2E does not need every layer, and some layer combinations are invalid even if the names sound related.

wiki/docs/testing/stack-permutations.md

Test Lanes

The replacement strategy has two top-level lanes. Every new test must declare which lane it belongs to before it is added to CI.

wiki/docs/testing/test-lanes.md

Trace Identifiers And Evidence

Use this document as the evidence checklist for tests described in Primary Flow Data Paths (./primary-flow-data-paths.md). A scenario should not be marked E2E unless it records the identifiers needed to join the agent session, hook events, Babysitter run state, and transport trace.

wiki/docs/testing/trace-identifiers-and-evidence.md
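
The evidence rule in the Trace Identifiers And Evidence summary can be sketched as a small completeness check. This is a minimal illustration only: the `ScenarioEvidence` shape and its field names are hypothetical and are not identifiers defined by the strategy documents.

```typescript
// Hypothetical evidence record. Field names are illustrative assumptions,
// one per evidence source the summary says must be joinable.
interface ScenarioEvidence {
  agentSessionId?: string;
  hookEventIds?: string[];
  babysitterRunId?: string;
  transportTraceId?: string;
}

// Returns the names of the identifiers that are absent or empty.
function missingEvidence(e: ScenarioEvidence): string[] {
  const missing: string[] = [];
  if (!e.agentSessionId) missing.push("agentSessionId");
  if (!e.hookEventIds || e.hookEventIds.length === 0) missing.push("hookEventIds");
  if (!e.babysitterRunId) missing.push("babysitterRunId");
  if (!e.transportTraceId) missing.push("transportTraceId");
  return missing;
}

// A scenario qualifies as E2E only when every identifier was recorded.
function canBeMarkedE2E(e: ScenarioEvidence): boolean {
  return missingEvidence(e).length === 0;
}
```

Returning the list of missing identifiers, rather than only a boolean, lets a report say why a scenario was not counted as E2E.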

Documents

  • Test Lanes defines the two top-level lanes: no-model deterministic tests and model-backed tests that require real provider credentials.
  • Harness And Plugin E2E separates SDK harness/plugin setup from agent-mux plugin/session E2E.
  • Agent Mux And Runtime E2E defines runtime coverage for agent-mux, transport-mux, agent-core, and @a5c-ai/babysitter-agent flows after setup preconditions are satisfied.
  • Pipeline Integration defines where each lane belongs in CI, staging, release, scheduled, and manual workflows.
  • Coverage And Reporting defines repo-wide coverage reporting, artifacts, logs, and pass/fail evidence.
  • Implementation Roadmap defines rollout slices, exit criteria, and stop conditions.
  • Current Test Command Inventory maps existing package test-like commands to lane, scope, owner, artifact name, and pipeline placement for roadmap slice 0.
  • Mock And Fixture Contracts defines deterministic fixture families and live/mock compatibility rules.
  • Quality Gates defines release-evidence gates and adversarial review criteria.
  • Stack Permutations defines valid and invalid layer combinations across the modular stack.
  • Primary Flow Data Paths maps the full data path for the main agent-mux, babysitter-agent, SDK run, hooks-mux, and transport-mux flows.
  • Trace Identifiers And Evidence defines the IDs, logs, files, and artifact bundles required to correlate those flows.
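
As a rough illustration of how the two lanes from Test Lanes might map onto the workflow kinds listed for Pipeline Integration, the sketch below selects lanes per trigger. The trigger names and the exact mapping are assumptions made for this example; Pipeline Integration remains the authoritative description.

```typescript
type Lane = "no-model" | "model-backed";

// Hypothetical trigger names, loosely following the workflow kinds listed above.
type Trigger = "pull_request" | "staging" | "release" | "scheduled" | "manual";

// Assumed mapping: ordinary pull requests run only deterministic no-model
// tests, so they never depend on provider availability. Every other trigger
// may also run model-backed tests.
function lanesForTrigger(trigger: Trigger): Lane[] {
  if (trigger === "pull_request") return ["no-model"];
  return ["no-model", "model-backed"];
}

// Provider credentials are needed only when the model-backed lane is selected.
function requiresProviderCredentials(trigger: Trigger): boolean {
  return lanesForTrigger(trigger).includes("model-backed");
}
```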

Principles

  • Separate tests that need model credentials from tests that can run with mocks, fixtures, or local fakes.
  • Make setup explicit and repeatable, but do not conflate setup with runtime: SDK harness/plugin setup, agent-mux plugin/session E2E, and babysitter-agent runtime E2E are separate paths.
  • Test mux boundaries at multiple scopes: protocol contracts, adapter translation, transport behavior, gateway/session behavior, UI behavior, and full runtime orchestration.
  • Prefer package-local tests for fast feedback, then compose them into broader lanes only when the integration surface matters.
  • Treat live model runs as release evidence, not as the first line of feedback for every pull request.
  • Promote tests through explicit gates: manual, scheduled, staging preflight, then release preflight.
  • Require each model-backed claim to have a no-model fixture or contract counterpart unless the behavior is inherently provider-only.
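
The last principle can be sketched as a check over declared tests. The `TestDeclaration` shape is a hypothetical registry format invented for this example, not a format the repository is known to use.

```typescript
type Lane = "no-model" | "model-backed";

// Hypothetical declaration a test would register before being added to CI.
interface TestDeclaration {
  name: string;
  lane: Lane;
  claim: string;          // the behavior this test provides evidence for
  providerOnly?: boolean; // true when the behavior is inherently provider-only
}

// Returns the names of model-backed tests whose claim has no no-model
// fixture or contract counterpart and that are not marked provider-only.
function modelBackedWithoutCounterpart(tests: TestDeclaration[]): string[] {
  const coveredClaims = new Set(
    tests.filter((t) => t.lane === "no-model").map((t) => t.claim),
  );
  return tests
    .filter((t) => t.lane === "model-backed" && !t.providerOnly && !coveredClaims.has(t.claim))
    .map((t) => t.name);
}
```

A check like this could run in the no-model lane itself, since it inspects declarations only and never calls a provider.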

Status Legend

| Status | Meaning |
| --- | --- |
| Current | Command, workflow, or package test exists today and can be validated now. |
| Proposed | Contract name or workflow shape this strategy recommends for a future implementation slice; not the current source of truth unless a current workflow or package script is named. |
| Promotion target | A test exists or is planned in a lower lane and should move only after meeting quality gates. |

Unless a document explicitly says Current, command bundles and workflow names are proposed implementation targets.
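
The promotion order named in the principles (manual, scheduled, staging preflight, release preflight), combined with the "Promotion target" status, can be sketched as an ordered list with a one-step promotion check. The gate identifiers and the one-step-at-a-time rule are assumptions for illustration; Quality Gates defines the actual criteria.

```typescript
// Gate names follow the order given in the principles; the string
// identifiers themselves are hypothetical.
const PROMOTION_GATES = ["manual", "scheduled", "staging-preflight", "release-preflight"] as const;
type Gate = (typeof PROMOTION_GATES)[number];

// Returns the gate that follows `current`, or null at the end of the chain.
function nextGate(current: Gate): Gate | null {
  const i = PROMOTION_GATES.indexOf(current);
  return i < PROMOTION_GATES.length - 1 ? PROMOTION_GATES[i + 1] : null;
}

// Assumed rule: a test moves exactly one gate at a time, and only after
// it meets the quality gates for the move.
function canPromote(from: Gate, to: Gate, meetsQualityGates: boolean): boolean {
  return meetsQualityGates && nextGate(from) === to;
}
```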

Current State

The repository already has Vitest, Playwright, package-local test scripts, release verification scripts, docs QA, metadata checks, architecture gates, and staging/release workflows. This strategy names how to organize the next E2E generation around those surfaces rather than around the removed Docker workflows.

Requested Scope Traceability

| Requested scope | Primary docs | Lane | First implementation surface |
| --- | --- | --- | --- |
| Codex E2E | Harness And Plugin E2E, Stack Permutations | No-model setup/session first, then capability-gated model-backed | Harness setup smoke, Codex adapter protocol fixture, plugin E2E only after capability proof; babysitter-agent runtime is separate |
| Claude Code E2E | Harness And Plugin E2E, Stack Permutations | No-model setup/session first, then model-backed | Harness setup smoke, agent-mux session, plugin-manager where supported, /babysitter:call plugin smoke, Claude hook/tool-call fixture |
| harness:install and plugin setup | Harness And Plugin E2E, Stack Permutations | Setup only | Dry-run install JSON, plugin discovery JSON, idempotency checks; no babysitter-agent runtime claim |
| Agent-mux functionality requiring credentials | Agent Mux And Runtime E2E, Pipeline Integration | Model-backed | Live adapter matrix for Codex and Claude Code |
| Babysitter-agent whole-system flow | Agent Mux And Runtime E2E, Stack Permutations | Both | Mock planner/executor first, bounded live process after staging promotion, no installer commands inside runtime E2E |
| Muxes and transport-mux | Agent Mux And Runtime E2E, Mock And Fixture Contracts, Primary Flow Data Paths | Both | Shared event fixtures, transport roundtrip, live transport smoke with trace identifiers |
| Hooks muxes | Agent Mux And Runtime E2E, Mock And Fixture Contracts, Trace Identifiers And Evidence | Both | Normalized hook fixtures, live hook replay after redaction with session/run correlation |
| Pipeline integration | Pipeline Integration, Implementation Roadmap | Both | New workflow contracts and staged required checks |
| Coverage reporting | Coverage And Reporting | Both | Package coverage baselines plus scenario coverage summaries |
