Agentic AI Atlas by a5c.ai
Test Lanes

The replacement strategy has two top-level lanes. Every new test must declare which lane it belongs to before it is added to CI.

Lane 1: No-Model Tests

No-model tests must run without provider secrets, paid model calls, or installed external agent CLIs beyond normal package dependencies.

Scope | Primary tools | What it covers | CI timing
Package unit tests | Vitest | Pure functions, schema parsing, protocol serialization, command helpers, state machines | Every PR and push
Contract tests | Vitest + fixtures | Stable boundaries between SDK, hooks-mux, agent-mux, transport-mux, agent-core, and babysitter-agent, including the transport-mux route matrix and runtime env injection contracts | Every PR and push
Mock harness tests | Vitest + existing mock adapters | Session lifecycle, adapter dispatch, tool-call translation, stop-hook semantics, plugin discovery, fallback metadata | Every PR and push
Browser/UI E2E | Playwright + mock gateway | Agent-mux WebUI session flows, transcript rendering, model picker behavior, approvals, reconnect behavior | PRs touching WebUI/gateway/session code; staging before publish
CLI smoke tests | Node subprocess tests | babysitter, amux, hooks-mux CLI, package entrypoints, help output, dry-run paths | Every PR for touched packages; staging before publish
Docs and generated assets | Existing docs QA and generator checks | Documentation links, snippets, generated plugin bundles, command templates | Every PR and push

No-model tests should prefer deterministic fixture transcripts and mock harness implementations. They should never skip because an API key is missing; if a test cannot run without a provider key, it belongs in the model-backed lane.
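The lane rule in the two paragraphs above reduces to a single check. If a test needs any live dependency, it goes to the model-backed lane. Otherwise it must run, not skip, in the no-model lane. The sketch below encodes that rule. The `TestRequirements` flags and the `assignLane` helper are hypothetical names for illustration, not an existing a5c.ai API.

```typescript
// Hypothetical requirement flags a test might declare; these names are
// illustrative only and do not come from an existing a5c.ai package.
type TestRequirements = {
  providerSecrets: boolean;  // needs an API key or other provider credential
  paidModelCalls: boolean;   // makes billable model requests
  externalAgentCli: boolean; // needs an agent CLI beyond normal package deps
};

type Lane = "lane:no-model" | "lane:model-backed";

// Any live dependency moves the test to the model-backed lane; everything
// else belongs in the no-model lane and must never skip for a missing key.
function assignLane(req: TestRequirements): Lane {
  const needsLiveDeps =
    req.providerSecrets || req.paidModelCalls || req.externalAgentCli;
  return needsLiveDeps ? "lane:model-backed" : "lane:no-model";
}

// Usage: a fixture-transcript parser test versus a live adapter test.
console.log(
  assignLane({ providerSecrets: false, paidModelCalls: false, externalAgentCli: false }),
); // lane:no-model
console.log(
  assignLane({ providerSecrets: true, paidModelCalls: true, externalAgentCli: false }),
); // lane:model-backed
```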

Implemented No-Model Matrices

The publish workflow owns the no_model_mock_matrix job as a stack E2E matrix, not as a package-suite aggregator. The matrix dimensions are declared in .github/workflows/publish.yml, and each test run consumes exactly one selected lane:

Dimension | Values | Required proof
Agent runtime | agent-mux-mocks, real-agent | The lane installs/verifies the target through amux install --dry-run, then launches the agent path selected by the runtime dimension
Agent | claude, codex, pi, gemini | The target CLI path is selected by agent-mux and produces per-agent evidence
Hook mode | none, hooks-mux | hooks-mux lanes register an amux hooks command bridge and assert the normalized hooks-mux phase evidence
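The matrix above expands to 2 × 4 × 2 = 16 stack lanes, and each CI job should resolve to exactly one of them. The sketch below is a minimal illustration of that expansion and selection, using the dimension values from the table. It is not the contents of publish.yml. The environment variable names (`STACK_RUNTIME`, `STACK_AGENT`, `STACK_HOOK_MODE`) are hypothetical.

```typescript
// Dimension values copied from the matrix table; the code shape is a sketch.
const dimensions = {
  runtime: ["agent-mux-mocks", "real-agent"],
  agent: ["claude", "codex", "pi", "gemini"],
  hookMode: ["none", "hooks-mux"],
} as const;

type StackLane = {
  runtime: (typeof dimensions.runtime)[number];
  agent: (typeof dimensions.agent)[number];
  hookMode: (typeof dimensions.hookMode)[number];
};

// Cartesian product of the three dimensions, as a CI matrix expands it.
function expandMatrix(): StackLane[] {
  const lanes: StackLane[] = [];
  for (const runtime of dimensions.runtime)
    for (const agent of dimensions.agent)
      for (const hookMode of dimensions.hookMode)
        lanes.push({ runtime, agent, hookMode });
  return lanes;
}

// Resolve exactly one lane from values passed in by the CI job.
// The STACK_* variable names are hypothetical.
function selectLane(env: Record<string, string | undefined>): StackLane {
  const matches = expandMatrix().filter(
    (l) =>
      l.runtime === env.STACK_RUNTIME &&
      l.agent === env.STACK_AGENT &&
      l.hookMode === env.STACK_HOOK_MODE,
  );
  const [lane] = matches;
  if (matches.length !== 1 || lane === undefined) {
    throw new Error(`expected exactly one matrix lane, got ${matches.length}`);
  }
  return lane;
}
```

Failing loudly on zero or multiple matches keeps a misconfigured job from silently testing a different lane than the one it reports.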

Every no-model stack lane starts a local transport-mux runtime with a fixture completion engine. The launched agent, including the local CI shim for real-agent lanes and the mock-harness path for agent-mux-mocks, must send a request through that transport-mux runtime and attach the request count plus redacted evidence under publish-no-model-stack-*.
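The evidence requirement can be pictured as a small function. It takes the requests the fixture completion engine recorded and produces a request count plus redacted copies. The sketch below assumes a hypothetical `RecordedRequest` shape and an assumed list of secret header names. The artifact suffix after `publish-no-model-stack-` is illustrative only.

```typescript
// Hypothetical shape of a request recorded by the fixture completion engine.
type RecordedRequest = { path: string; headers: Record<string, string> };

// Header names treated as secrets; this list is an assumption.
const SECRET_HEADER = /^(authorization|x-api-key|proxy-authorization)$/i;

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SECRET_HEADER.test(name) ? "[REDACTED]" : value;
  }
  return out;
}

// Build the evidence payload for one stack lane. A lane whose agent never
// reached transport-mux has proven nothing, so it fails instead of passing.
function buildEvidence(lane: string, requests: RecordedRequest[]) {
  if (requests.length === 0) {
    throw new Error(`lane ${lane}: no request reached transport-mux`);
  }
  return {
    artifactName: `publish-no-model-stack-${lane}`, // suffix format is hypothetical
    requestCount: requests.length,
    requests: requests.map((r) => ({ path: r.path, headers: redactHeaders(r.headers) })),
  };
}
```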

The publish workflow also has an agent_mux_hooks_mux_e2e no-model/no-SDK job. It is intentionally separate from the live Babysitter plugin matrix. The GitHub matrix spans claude-code, codex, and pi, and each test run consumes only its one selected lane: it registers an amux hooks command hook, bridges the native payload into a5c-hooks-mux invoke, and asserts the normalized hooks-mux phase evidence.
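Asserting normalized phase evidence amounts to checking that the expected phases appear in the bridged output in the expected order. The sketch below assumes a hypothetical phase vocabulary (`session-start`, `pre-tool`, `post-tool`, `stop`). The real hooks-mux phase names may differ.

```typescript
// Hypothetical normalized phase names; the real hooks-mux vocabulary may differ.
type Phase = "session-start" | "pre-tool" | "post-tool" | "stop";

// Reports each expected phase that does not appear after the previously
// matched one, i.e. phases that are missing or out of order.
function missingPhases(observed: Phase[], expected: Phase[]): Phase[] {
  const missing: Phase[] = [];
  let cursor = 0;
  for (const phase of expected) {
    const idx = observed.indexOf(phase, cursor);
    if (idx === -1) {
      missing.push(phase);
    } else {
      cursor = idx + 1;
    }
  }
  return missing;
}

// Fails the selected lane (claude-code, codex, or pi) with a readable message.
function assertPhaseEvidence(lane: string, observed: Phase[], expected: Phase[]): void {
  const missing = missingPhases(observed, expected);
  if (missing.length > 0) {
    throw new Error(`${lane}: missing or out-of-order phases: ${missing.join(", ")}`);
  }
}
```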

Lane 2: Model-Backed Tests

Model-backed tests exercise real provider integrations, real installed harnesses, and real credentials.

Scope | Required setup | What it covers | CI timing
SDK harness/plugin setup smoke | babysitter harness:install <name> and babysitter harness:install-plugin <name> | Installer delegation, plugin target resolution, idempotent manifests; does not cover the babysitter-agent runtime | Scheduled, manual, staging gate
Agent-mux plugin/session E2E | Provider secrets, installed external CLI, and plugin precondition where supported | amux run or createClient().run starts a session, a plugin command creates a Babysitter run, and hooks/process lifecycle are asserted | Scheduled, manual, staging gate
Babysitter-agent live orchestration | Preinstalled/mocked backend plus OPENAI_API_KEY, configured Foundry/OpenAI credentials, or configured cloud equivalents where needed | @a5c-ai/babysitter-agent can plan, execute, post task results, and close a run without executing harness installer commands | publish.yml staging/main preflight, manual
Agent-mux live adapters | Provider-specific credentials | Claude Code and Codex adapters produce protocol events that match the mux contracts | Scheduled/manual first, then publish.yml release preflight after quarantine
Transport-mux live transport | Local process ports plus provider/harness credentials | Transport-mux carries real agent-core streams and agent-mux-launched external harness traffic through proxy routes with redacted launch/env/metrics artifacts | Scheduled/manual first, then publish.yml after quarantine

Model-backed tests must be opt-in by environment detection. A missing credential should mark the lane skipped or not scheduled, not silently pass a test that claims provider coverage.
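A credential gate that follows this rule returns either "run" or "skipped" together with the missing names. It never returns a pass. The sketch below is a minimal version under that assumption. `credentialGate` is a hypothetical helper, and the environment is passed in explicitly. A real runner would pass something like the process environment.

```typescript
type GateResult =
  | { status: "run" }
  | { status: "skipped"; missing: string[] };

// Checks that every credential a model-backed lane declares is present and
// non-empty. A "skipped" result is reported as skipped, never as a pass
// that claims provider coverage.
function credentialGate(
  required: string[],
  env: Record<string, string | undefined>,
): GateResult {
  const missing = required.filter((name) => !env[name]?.trim());
  return missing.length === 0 ? { status: "run" } : { status: "skipped", missing };
}

// Usage: the babysitter-agent live lane gated on OPENAI_API_KEY.
const gate = credentialGate(["OPENAI_API_KEY"], {});
console.log(gate.status); // skipped
```

Treating blank values as missing guards against CI secrets that resolve to an empty string when they are not configured for a fork or branch.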

Transport-Mux Lane Split

Transport-mux scenarios must be split across both lanes instead of hidden inside broad mux smoke tests.

Scenario | Lane | Why it belongs there
Route/codec matrix for supported transports | No-model | A fixture completion engine can prove request parsing, response envelopes, streaming shape, auth errors, invalid JSON, and token-count behavior deterministically
Runtime lifecycle and env injection | No-model | Local ports and redacted env diffs do not require provider credentials
Agent-mux launch proxy decision | No-model | Tests of resolveLaunchPlan can prove forced, if-needed, native, and forbidden proxy behavior with fixture provider configs
Agent-core through transport-mux | Both | The fixture stream belongs in no-model; the live provider stream belongs in model-backed when credentials exist
External harness through agent-mux proxy | Model-backed | Only a real harness plus a provider credential can prove that the harness actually consumes the proxy env and completes a sentinel stream
Passthrough upstream bridge | No-model first, model-backed optional | Path/query/auth/error mapping is deterministic with a fixture upstream; live passthrough only adds value for catching provider-specific drift
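Splitting scenarios across lanes is only useful if a missing half is caught. The sketch below encodes the required lanes from the table above and lists every scenario/lane pair that no registered test covers. It treats "No-model first, model-backed optional" as requiring only the no-model lane. The `Registration` shape and scenario keys are hypothetical.

```typescript
type SplitLane = "no-model" | "model-backed";

// Required lanes per scenario, taken from the lane split table.
const requiredLanes: Record<string, SplitLane[]> = {
  "route/codec matrix": ["no-model"],
  "runtime lifecycle and env injection": ["no-model"],
  "agent-mux launch proxy decision": ["no-model"],
  "agent-core through transport-mux": ["no-model", "model-backed"],
  "external harness through agent-mux proxy": ["model-backed"],
  "passthrough upstream bridge": ["no-model"], // model-backed is optional
};

// Hypothetical registration: one entry per test that targets a scenario.
type Registration = { scenario: string; lane: SplitLane };

// Lists every scenario/lane pair the table requires but no test covers.
function uncoveredScenarios(tests: Registration[]): string[] {
  const covered = new Set(tests.map((t) => `${t.scenario}|${t.lane}`));
  const gaps: string[] = [];
  for (const [scenario, lanes] of Object.entries(requiredLanes)) {
    for (const lane of lanes) {
      if (!covered.has(`${scenario}|${lane}`)) gaps.push(`${scenario} (${lane})`);
    }
  }
  return gaps;
}
```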

Required Labels

Every test file or workflow job should map to one of these labels:

  • lane:no-model
  • lane:model-backed
  • scope:unit
  • scope:contract
  • scope:integration
  • scope:e2e
  • scope:release-gate

These labels can start as workflow/job names and test descriptions. They only need to become machine-readable once the first implementation slice adds the new runners.
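Once labels become machine-readable, a check could parse them from test descriptions or job names. The sketch below assumes a hypothetical `[lane:...]` / `[scope:...]` bracket convention. It also assumes each test carries exactly one lane label and at least one scope label. The list above names the labels but does not fix this cardinality.

```typescript
const LANE_LABELS = ["lane:no-model", "lane:model-backed"] as const;
const SCOPE_LABELS = [
  "scope:unit",
  "scope:contract",
  "scope:integration",
  "scope:e2e",
  "scope:release-gate",
] as const;
const KNOWN_LABELS: readonly string[] = [...LANE_LABELS, ...SCOPE_LABELS];

type LabelCheck = { lane: string; scopes: string[] };

// Parses "[lane:no-model][scope:contract] ..." style names (hypothetical
// convention) and enforces one lane label plus at least one scope label.
function checkLabels(name: string): LabelCheck {
  const labels = (name.match(/\[[a-z0-9-]+:[a-z0-9-]+\]/g) ?? []).map((s) => s.slice(1, -1));
  const unknown = labels.filter((l) => !KNOWN_LABELS.includes(l));
  if (unknown.length > 0) throw new Error(`"${name}": unknown labels ${unknown.join(", ")}`);
  const lanes = labels.filter((l) => (LANE_LABELS as readonly string[]).includes(l));
  const scopes = labels.filter((l) => (SCOPE_LABELS as readonly string[]).includes(l));
  const [lane] = lanes;
  if (lanes.length !== 1 || lane === undefined) {
    throw new Error(`"${name}": expected one lane label, found ${lanes.length}`);
  }
  if (scopes.length === 0) throw new Error(`"${name}": missing scope label`);
  return { lane, scopes };
}
```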

Lane Ownership

Lane | Primary owner | Required reviewer | Failure triage clock
No-model package and contract tests | Owning package maintainer | Adjacent package maintainer when a boundary contract changes | Same business day for PR failures
No-model UI and CLI smoke | Surface owner | Runtime maintainer when session behavior changes | Same business day for PR failures
Model-backed harness smoke | Harness maintainer | CI maintainer for secret and runner changes | Next business day for scheduled failures; immediate for staging/release failures
Model-backed runtime smoke | Runtime maintainer | Harness and mux maintainers | Immediate for staging/release failures
Coverage/reporting | CI maintainer | Package owner when thresholds change | Same business day for blocking report failures

Admission Criteria

A test may enter the no-model lane when it has deterministic fixtures, no provider credentials, bounded runtime, and a package owner.

A test may enter the model-backed lane when it has explicit credential gates, redacted artifacts, a live behavior that mocks cannot prove, a retry policy, and an owner for provider-specific failures.
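The two admission paragraphs read naturally as checklists. The sketch below returns the unmet criteria for a requested lane, using one hypothetical boolean flag per criterion named in the text. An empty result means the test is admissible.

```typescript
// Hypothetical candidate description; each flag mirrors one admission criterion.
type Candidate = {
  deterministicFixtures: boolean;
  needsProviderCredentials: boolean;
  boundedRuntime: boolean;
  owner?: string;
  credentialGated: boolean;
  redactedArtifacts: boolean;
  provesLiveOnlyBehavior: boolean;
  retryPolicy: boolean;
};

// Returns the unmet criteria for the requested lane (empty means admissible).
function admissionGaps(lane: "no-model" | "model-backed", c: Candidate): string[] {
  const checks: [boolean, string][] =
    lane === "no-model"
      ? [
          [c.deterministicFixtures, "deterministic fixtures"],
          [!c.needsProviderCredentials, "no provider credentials"],
          [c.boundedRuntime, "bounded runtime"],
          [Boolean(c.owner), "package owner"],
        ]
      : [
          [c.credentialGated, "explicit credential gates"],
          [c.redactedArtifacts, "redacted artifacts"],
          [c.provesLiveOnlyBehavior, "live behavior mocks cannot prove"],
          [c.retryPolicy, "retry policy"],
          [Boolean(c.owner), "owner for provider-specific failures"],
        ];
  return checks.filter(([ok]) => !ok).map(([, name]) => name);
}
```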

Promotion Path

1. Local/package command.
2. PR/push no-model lane.
3. Scheduled model-backed lane, if provider behavior matters.
4. Staging preflight only if it protects publish correctness.
5. Release preflight only if missing the test can publish a broken production artifact.
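The promotion path can be sketched as a function that computes how far a test should climb. The `PromotionFacts` flags are hypothetical. The sketch reads the list literally, so every test passes through the first two stages and the scheduled model-backed stage is optional. It also assumes that release preflight is reached only after staging preflight, which the list implies but does not state.

```typescript
// The five promotion stages, in order.
const STAGES = [
  "local/package command",
  "PR/push no-model lane",
  "scheduled model-backed lane",
  "staging preflight",
  "release preflight",
] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical facts that the promotion conditions depend on.
type PromotionFacts = {
  providerBehaviorMatters: boolean;
  protectsPublishCorrectness: boolean;
  absenceCanPublishBrokenArtifact: boolean;
};

function promotionStages(f: PromotionFacts): Stage[] {
  const stages: Stage[] = ["local/package command", "PR/push no-model lane"];
  if (f.providerBehaviorMatters) stages.push("scheduled model-backed lane");
  if (f.protectsPublishCorrectness) {
    stages.push("staging preflight");
    // Assumption: release preflight only follows staging preflight.
    if (f.absenceCanPublishBrokenArtifact) stages.push("release preflight");
  }
  return stages;
}
```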
