Agentic AI Atlas

Page JSON

page:docs-testing-test-lanes

Structured · live

Test Lanes json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/testing/test-lanes.md
Cluster · wiki

Record JSON

{
  "id": "page:docs-testing-test-lanes",
  "_kind": "Page",
  "_file": "wiki/docs/testing/test-lanes.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/testing/test-lanes.md",
    "sourceKind": "repo-docs",
    "title": "Test Lanes",
    "displayName": "Test Lanes",
    "slug": "docs/testing/test-lanes",
    "articlePath": "wiki/docs/testing/test-lanes.md",
    "article": "\n# Test Lanes\n\nThe replacement strategy has two top-level lanes. Every new test must declare which lane it belongs to before it is added to CI.\n\n## Lane 1: No-Model Tests\n\nNo-model tests must run without provider secrets, paid model calls, or installed external agent CLIs beyond normal package dependencies.\n\n| Scope | Primary tools | What it covers | CI timing |\n| --- | --- | --- | --- |\n| Package unit tests | Vitest | Pure functions, schema parsing, protocol serialization, command helpers, state machines | Every PR and push |\n| Contract tests | Vitest + fixtures | Stable boundaries between SDK, hooks-mux, agent-mux, transport-mux, agent-core, and babysitter-agent, including transport-mux route matrix and runtime env injection contracts | Every PR and push |\n| Mock harness tests | Vitest + existing mock adapters | Session lifecycle, adapter dispatch, tool-call translation, stop-hook semantics, plugin discovery, fallback metadata | Every PR and push |\n| Browser/UI E2E | Playwright + mock gateway | Agent-mux WebUI session flows, transcript rendering, model picker behavior, approvals, reconnect behavior | PRs touching WebUI/gateway/session code; staging before publish |\n| CLI smoke tests | Node subprocess tests | `babysitter`, `amux`, hooks-mux CLI, package entrypoints, help output, dry-run paths | Every PR for touched packages; staging before publish |\n| Docs and generated assets | Existing docs QA and generator checks | Documentation links, snippets, generated plugin bundles, command templates | Every PR and push |\n\nNo-model tests should prefer deterministic fixture transcripts and mock harness implementations. They should never skip because an API key is missing; if a test cannot run without a provider key, it belongs in the model-backed lane.\n\n### Implemented No-Model Matrices\n\n`Publish` owns the `no_model_mock_matrix` job as a stack E2E matrix, not as a package-suite aggregator. 
The matrix dimensions are declared in `.github/workflows/publish.yml` and the test consumes exactly one selected lane:\n\n| Dimension | Values | Required proof |\n| --- | --- | --- |\n| Agent runtime | `agent-mux-mocks`, `real-agent` | The lane installs/verifies the target through `amux install --dry-run`, then launches the agent path selected by the runtime dimension |\n| Agent | `claude`, `codex`, `pi`, `gemini` | The target CLI path is selected by agent-mux and produces per-agent evidence |\n| Hook mode | `none`, `hooks-mux` | `hooks-mux` lanes register an `amux hooks` command bridge and assert the normalized hooks-mux phase evidence |\n\nEvery no-model stack lane starts a local transport-mux runtime with a fixture completion engine. The launched agent, including the local CI shim for real-agent lanes and the mock-harness path for `agent-mux-mocks`, must send a request through that transport-mux runtime and attach the request count plus redacted evidence under `publish-no-model-stack-*`.\n\n`Publish` also has an `agent_mux_hooks_mux_e2e` no-model/no-SDK job. 
It is intentionally separate from the live Babysitter plugin matrix: the GitHub matrix chooses `claude-code`, `codex`, and `pi`; the test only consumes that one selected lane, registers an `amux hooks` command hook, bridges the native payload into `a5c-hooks-mux invoke`, and asserts the hooks-mux normalized phase evidence.\n\n## Lane 2: Model-Backed Tests\n\nModel-backed tests exercise real provider integrations, real installed harnesses, and real credentials.\n\n| Scope | Required setup | What it covers | CI timing |\n| --- | --- | --- | --- |\n| SDK harness/plugin setup smoke | `babysitter harness:install <name>` and `babysitter harness:install-plugin <name>` | Installer delegation, plugin target resolution, idempotent manifests; not babysitter-agent runtime | Scheduled, manual, staging gate |\n| Agent-mux plugin/session E2E | Provider secrets, installed external CLI, and plugin precondition where supported | `amux run` or `createClient().run` starts a session, plugin command creates a Babysitter run, and hooks/process lifecycle are asserted | Scheduled, manual, staging gate |\n| Babysitter-agent live orchestration | Preinstalled/mocked backend plus `OPENAI_API_KEY`, configured Foundry/OpenAI credentials, or configured cloud equivalents where needed | `@a5c-ai/babysitter-agent` can plan, execute, post task results, and close a run without executing harness installer commands | `publish.yml` staging/main preflight, manual |\n| Agent-mux live adapters | Provider-specific credentials | Claude Code and Codex adapters produce protocol events that match the mux contracts | Scheduled/manual first, then `publish.yml` release preflight after quarantine |\n| Transport-mux live transport | Local process ports plus provider/harness credentials | Transport-mux carries real agent-core streams and agent-mux-launched external harness traffic through proxy routes with redacted launch/env/metrics artifacts | Scheduled/manual first, then `publish.yml` after quarantine 
|\n\nModel-backed tests must be opt-in by environment detection. A missing credential should mark the lane skipped or not scheduled, not silently pass a test that claims provider coverage.\n\n## Transport-Mux Lane Split\n\nTransport-mux scenarios must be split across both lanes instead of hidden inside broad mux smoke tests.\n\n| Scenario | Lane | Why it belongs there |\n| --- | --- | --- |\n| Route/codec matrix for supported transports | No-model | A fixture completion engine can prove request parsing, response envelopes, streaming shape, auth errors, invalid JSON, and token-count behavior deterministically |\n| Runtime lifecycle and env injection | No-model | Local ports and redacted env diffs do not require provider credentials |\n| Agent-mux launch proxy decision | No-model | `resolveLaunchPlan` can prove proxy forced/if-needed/native/forbidden behavior with fixture provider configs |\n| Agent-core through transport-mux | Both | Fixture stream belongs in no-model; live provider stream belongs in model-backed when credentials exist |\n| External harness through agent-mux proxy | Model-backed | Only a real harness plus provider credential can prove the harness actually consumes the proxy env and completes a sentinel stream |\n| Passthrough upstream bridge | No-model first, model-backed optional | Path/query/auth/error mapping is deterministic with a fixture upstream; live passthrough only adds value for provider-specific drift |\n\n## Required Labels\n\nEvery test file or workflow job should map to one of these labels:\n\n- `lane:no-model`\n- `lane:model-backed`\n- `scope:unit`\n- `scope:contract`\n- `scope:integration`\n- `scope:e2e`\n- `scope:release-gate`\n\nThese labels can start as workflow/job names and test descriptions. 
They only need to become machine-readable once the first implementation slice adds the new runners.\n\n## Lane Ownership\n\n| Lane | Primary owner | Required reviewer | Failure triage clock |\n| --- | --- | --- | --- |\n| No-model package and contract tests | Owning package maintainer | Adjacent package maintainer when a boundary contract changes | Same business day for PR failures |\n| No-model UI and CLI smoke | Surface owner | Runtime maintainer when session behavior changes | Same business day for PR failures |\n| Model-backed harness smoke | Harness maintainer | CI maintainer for secret and runner changes | Next business day for scheduled failures; immediate for staging/release failures |\n| Model-backed runtime smoke | Runtime maintainer | Harness and mux maintainers | Immediate for staging/release failures |\n| Coverage/reporting | CI maintainer | Package owner when thresholds change | Same business day for blocking report failures |\n\n## Admission Criteria\n\nA test may enter the no-model lane when it has deterministic fixtures, no provider credentials, bounded runtime, and a package owner.\n\nA test may enter the model-backed lane when it has explicit credential gates, redacted artifacts, a live behavior that mocks cannot prove, a retry policy, and an owner for provider-specific failures.\n\n## Promotion Path\n\n1. Local/package command.\n2. PR/push no-model lane.\n3. Scheduled model-backed lane, if provider behavior matters.\n4. Staging preflight only if it protects publish correctness.\n5. Release preflight only if missing the test can publish a broken production artifact.\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-test-lanes",
      "kind": "contains_page"
    }
  ]
}
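
The credential rule stated in the article (model-backed tests are opt-in by environment detection, and no-model tests never skip because a key is missing) could be sketched roughly as below. This is an illustration only: `decideLaneStatus`, `LaneDecision`, and the status strings are hypothetical names that do not come from the repository. Only the `lane:no-model` and `lane:model-backed` labels and the `OPENAI_API_KEY` variable appear in the record.

```typescript
// Hypothetical sketch; none of these names are taken from the repository.
type Lane = "lane:no-model" | "lane:model-backed";
type LaneStatus = "run" | "skipped";

interface LaneDecision {
  status: LaneStatus;
  reason: string;
}

// Decide whether a test should run, given its lane label, the environment
// variables it declares as required, and the current environment.
function decideLaneStatus(
  lane: Lane,
  requiredEnv: string[],
  env: Record<string, string | undefined>,
): LaneDecision {
  if (lane === "lane:no-model") {
    // A no-model test must not depend on credentials at all, so declaring
    // one is treated as a misclassification rather than a reason to skip.
    if (requiredEnv.length > 0) {
      throw new Error(
        "no-model test declares credentials; move it to lane:model-backed",
      );
    }
    return { status: "run", reason: "no credentials required" };
  }

  // Model-backed: report an explicit skip instead of silently passing.
  const missing = requiredEnv.filter((name) => !env[name]);
  if (missing.length > 0) {
    return {
      status: "skipped",
      reason: `missing credentials: ${missing.join(", ")}`,
    };
  }
  return { status: "run", reason: "all credentials present" };
}
```

A runner might call `decideLaneStatus("lane:model-backed", ["OPENAI_API_KEY"], {})` and surface the returned `reason` in the job summary, so that a skipped lane is visible rather than counted as provider coverage.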
