Agentic AI Atlas · Testing Strategy
Page: wiki/docs/testing/index.md

Testing Strategy

This directory defines the testing strategy that replaces the legacy Docker and Docker-E2E workflows, which have been removed. The current CI implementation lives primarily in .github/workflows/publish.yml, with GitHub Actions owning the live-stack scenario and OS matrix. Rather than reusing the retired Docker image and e2e-tests/docker suite, the new plan starts from repository-native package boundaries, Babysitter harness setup commands, the babysitter-agent runtime surface, and explicit model/no-model lanes.

Documents

  • Test Lanes defines the two top-level lanes: no-model deterministic tests and model-backed tests that require real provider credentials.
  • Harness And Plugin E2E separates SDK harness/plugin setup from agent-mux plugin/session E2E.
  • Agent Mux And Runtime E2E defines runtime coverage for agent-mux, transport-mux, agent-core, and @a5c-ai/babysitter-agent flows after setup preconditions are satisfied.
  • Pipeline Integration defines where each lane belongs in CI, staging, release, scheduled, and manual workflows.
  • Coverage And Reporting defines repo-wide coverage reporting, artifacts, logs, and pass/fail evidence.
  • Implementation Roadmap defines rollout slices, exit criteria, and stop conditions.
  • Current Test Command Inventory maps existing package test-like commands to lane, scope, owner, artifact name, and pipeline placement for roadmap slice 0.
  • Mock And Fixture Contracts defines deterministic fixture families and live/mock compatibility rules.
  • Quality Gates defines release-evidence gates and adversarial review criteria.
  • Stack Permutations defines valid and invalid layer combinations across the modular stack.
  • Primary Flow Data Paths maps the full data path for the main agent-mux, babysitter-agent, SDK run, hooks-mux, and transport-mux flows.
  • Trace Identifiers And Evidence defines the IDs, logs, files, and artifact bundles required to correlate those flows.

Principles

  • Separate tests that need model credentials from tests that can run with mocks, fixtures, or local fakes.
  • Make setup explicit and repeatable, but do not conflate setup with runtime: SDK harness/plugin setup, agent-mux plugin/session E2E, and babysitter-agent runtime E2E are separate paths.
  • Test mux boundaries at multiple scopes: protocol contracts, adapter translation, transport behavior, gateway/session behavior, UI behavior, and full runtime orchestration.
  • Prefer package-local tests for fast feedback, then compose them into broader lanes only when the integration surface matters.
  • Treat live model runs as release evidence, not as the first line of feedback for every pull request.
  • Promote tests through explicit gates: manual, scheduled, staging preflight, then release preflight.
  • Require each model-backed claim to have a no-model fixture or contract counterpart unless the behavior is inherently provider-only.
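
The promotion order and the counterpart rule above can be checked mechanically. The following is a minimal TypeScript sketch under stated assumptions: the `TestClaim` shape, the gate identifiers, and both function names are hypothetical illustrations and do not come from the repository.

```typescript
// Hypothetical model of the two top-level lanes named in this strategy.
type Lane = "no-model" | "model-backed";

// Hypothetical record describing one tested claim.
interface TestClaim {
  id: string;
  lane: Lane;
  // id of the no-model fixture or contract test that mirrors a model-backed claim
  counterpartId?: string;
  // true when the behavior is inherently provider-only
  providerOnly?: boolean;
}

// Promotion order from the principles:
// manual, scheduled, staging preflight, then release preflight.
const GATES = ["manual", "scheduled", "staging-preflight", "release-preflight"] as const;
type Gate = (typeof GATES)[number];

// Returns the gate a test would be promoted to next, or null at the last gate.
function nextGate(current: Gate): Gate | null {
  return GATES[GATES.indexOf(current) + 1] ?? null;
}

// Returns ids of model-backed claims that lack a no-model counterpart
// and are not marked as provider-only.
function claimsMissingCounterpart(claims: TestClaim[]): string[] {
  const noModelIds = new Set(
    claims.filter((c) => c.lane === "no-model").map((c) => c.id),
  );
  return claims
    .filter((c) => c.lane === "model-backed" && !c.providerOnly)
    .filter((c) => c.counterpartId === undefined || !noModelIds.has(c.counterpartId))
    .map((c) => c.id);
}
```

A check like `claimsMissingCounterpart` could run in the no-model lane itself, since it needs no provider credentials; where such a check would actually live is not specified by this strategy.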

Status Legend

| Status | Meaning |
| --- | --- |
| Current | Command, workflow, or package test exists today and can be validated now. |
| Proposed | Contract name or workflow shape this strategy recommends for a future implementation slice; not the current source of truth unless a current workflow or package script is named. |
| Promotion target | A test exists or is planned in a lower lane and should move only after meeting quality gates. |

Unless a document explicitly says Current, command bundles and workflow names are proposed implementation targets.

Current State

The repository already has Vitest, Playwright, package-local test scripts, release verification scripts, docs QA, metadata checks, architecture gates, and staging/release workflows. This strategy names how to organize the next E2E generation around those surfaces rather than around the removed Docker workflows.

Requested Scope Traceability

| Requested scope | Primary docs | Lane | First implementation surface |
| --- | --- | --- | --- |
| Codex E2E | Harness And Plugin E2E, Stack Permutations | No-model setup/session first, then capability-gated model-backed | Harness setup smoke, Codex adapter protocol fixture, plugin E2E only after capability proof; babysitter-agent runtime is separate |
| Claude Code E2E | Harness And Plugin E2E, Stack Permutations | No-model setup/session first, then model-backed | Harness setup smoke, agent-mux session, plugin-manager where supported, /babysitter:call plugin smoke, Claude hook/tool-call fixture |
| harness:install and plugin setup | Harness And Plugin E2E, Stack Permutations | Setup only | Dry-run install JSON, plugin discovery JSON, idempotency checks; no babysitter-agent runtime claim |
| Agent-mux functionality requiring credentials | Agent Mux And Runtime E2E, Pipeline Integration | Model-backed | Live adapter matrix for Codex and Claude Code |
| Babysitter-agent whole-system flow | Agent Mux And Runtime E2E, Stack Permutations | Both | Mock planner/executor first, bounded live process after staging promotion, no installer commands inside runtime E2E |
| Muxes and transport-mux | Agent Mux And Runtime E2E, Mock And Fixture Contracts, Primary Flow Data Paths | Both | Shared event fixtures, transport roundtrip, live transport smoke with trace identifiers |
| Hooks muxes | Agent Mux And Runtime E2E, Mock And Fixture Contracts, Trace Identifiers And Evidence | Both | Normalized hook fixtures, live hook replay after redaction with session/run correlation |
| Pipeline integration | Pipeline Integration, Implementation Roadmap | Both | New workflow contracts and staged required checks |
| Coverage reporting | Coverage And Reporting | Both | Package coverage baselines plus scenario coverage summaries |
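
The traceability rows above can also be kept as typed data so that questions such as "which scopes does this document cover?" become simple queries. This is a sketch only: the `ScopeRow` shape, the `LaneLabel` values, and the function names are assumptions made for illustration, and the two "no-model first" rows are collapsed into one label, which drops the "capability-gated" qualifier on the Codex row.

```typescript
// Hypothetical lane labels summarizing the Lane column of the table.
type LaneLabel = "no-model-first" | "setup-only" | "model-backed" | "both";

// Hypothetical row shape; the "First implementation surface" column is omitted for brevity.
interface ScopeRow {
  scope: string;
  docs: string[];
  lane: LaneLabel;
}

const HARNESS = "Harness And Plugin E2E";
const RUNTIME = "Agent Mux And Runtime E2E";
const STACK = "Stack Permutations";

// Rows transcribed from the traceability table.
const TRACEABILITY: ScopeRow[] = [
  { scope: "Codex E2E", docs: [HARNESS, STACK], lane: "no-model-first" },
  { scope: "Claude Code E2E", docs: [HARNESS, STACK], lane: "no-model-first" },
  { scope: "harness:install and plugin setup", docs: [HARNESS, STACK], lane: "setup-only" },
  {
    scope: "Agent-mux functionality requiring credentials",
    docs: [RUNTIME, "Pipeline Integration"],
    lane: "model-backed",
  },
  { scope: "Babysitter-agent whole-system flow", docs: [RUNTIME, STACK], lane: "both" },
  {
    scope: "Muxes and transport-mux",
    docs: [RUNTIME, "Mock And Fixture Contracts", "Primary Flow Data Paths"],
    lane: "both",
  },
  {
    scope: "Hooks muxes",
    docs: [RUNTIME, "Mock And Fixture Contracts", "Trace Identifiers And Evidence"],
    lane: "both",
  },
  {
    scope: "Pipeline integration",
    docs: ["Pipeline Integration", "Implementation Roadmap"],
    lane: "both",
  },
  { scope: "Coverage reporting", docs: ["Coverage And Reporting"], lane: "both" },
];

// Scopes that list the given document among their primary docs.
function scopesForDoc(doc: string): string[] {
  return TRACEABILITY.filter((row) => row.docs.indexOf(doc) !== -1).map((row) => row.scope);
}

// Scopes whose lane is exactly the given label.
function scopesInLane(lane: LaneLabel): string[] {
  return TRACEABILITY.filter((row) => row.lane === lane).map((row) => row.scope);
}
```

For example, `scopesInLane("model-backed")` returns only the agent-mux credentials row, the single scope in the table with no no-model or setup-only entry point.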
