Agentic AI Atlas

Page JSON

page:docs-testing

Structured · live

Testing Strategy json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/testing.md
Cluster · wiki

Record JSON

{
  "id": "page:docs-testing",
  "_kind": "Page",
  "_file": "wiki/docs/testing.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/testing/index.md",
    "sourceKind": "repo-docs",
    "title": "Testing Strategy",
    "displayName": "Testing Strategy",
    "slug": "docs/testing",
    "articlePath": "wiki/docs/testing/index.md",
    "article": "\n# Testing Strategy\n\nThis directory defines the replacement testing strategy after the legacy Docker and Docker-E2E workflows were removed. The current CI implementation lives primarily in `.github/workflows/publish.yml`, with GitHub Actions owning the live-stack scenario and OS matrix. The new plan starts from repository-native package boundaries, Babysitter harness setup commands, the `babysitter-agent` runtime surface, and explicit model/no-model lanes instead of reusing the retired Docker image and `e2e-tests/docker` suite.\n\n## Documents\n\n- [Test Lanes](./test-lanes.md) defines the two top-level lanes: no-model deterministic tests and model-backed tests that require real provider credentials.\n- [Harness And Plugin E2E](./harness-e2e.md) separates SDK harness/plugin setup from agent-mux plugin/session E2E.\n- [Agent Mux And Runtime E2E](./agent-mux-and-runtime-e2e.md) defines runtime coverage for `agent-mux`, `transport-mux`, `agent-core`, and `@a5c-ai/babysitter-agent` flows after setup preconditions are satisfied.\n- [Pipeline Integration](./pipeline-integration.md) defines where each lane belongs in CI, staging, release, scheduled, and manual workflows.\n- [Coverage And Reporting](./coverage-and-reporting.md) defines repo-wide coverage reporting, artifacts, logs, and pass/fail evidence.\n- [Implementation Roadmap](./implementation-roadmap.md) defines rollout slices, exit criteria, and stop conditions.\n- [Current Test Command Inventory](./current-test-command-inventory.md) maps existing package test-like commands to lane, scope, owner, artifact name, and pipeline placement for roadmap slice 0.\n- [Mock And Fixture Contracts](./mock-and-fixture-contracts.md) defines deterministic fixture families and live/mock compatibility rules.\n- [Quality Gates](./quality-gates.md) defines release-evidence gates and adversarial review criteria.\n- [Stack Permutations](./stack-permutations.md) defines valid and invalid layer combinations across the 
modular stack.\n- [Primary Flow Data Paths](./primary-flow-data-paths.md) maps the full data path for the main agent-mux, babysitter-agent, SDK run, hooks-mux, and transport-mux flows.\n- [Trace Identifiers And Evidence](./trace-identifiers-and-evidence.md) defines the IDs, logs, files, and artifact bundles required to correlate those flows.\n\n## Principles\n\n- Separate tests that need model credentials from tests that can run with mocks, fixtures, or local fakes.\n- Make setup explicit and repeatable, but do not conflate setup with runtime: SDK harness/plugin setup, agent-mux plugin/session E2E, and babysitter-agent runtime E2E are separate paths.\n- Test mux boundaries at multiple scopes: protocol contracts, adapter translation, transport behavior, gateway/session behavior, UI behavior, and full runtime orchestration.\n- Prefer package-local tests for fast feedback, then compose them into broader lanes only when the integration surface matters.\n- Treat live model runs as release evidence, not as the first line of feedback for every pull request.\n- Promote tests through explicit gates: manual, scheduled, staging preflight, then release preflight.\n- Require each model-backed claim to have a no-model fixture or contract counterpart unless the behavior is inherently provider-only.\n\n## Status Legend\n\n| Status | Meaning |\n| --- | --- |\n| Current | Command, workflow, or package test exists today and can be validated now. |\n| Proposed | Contract name or workflow shape this strategy recommends for a future implementation slice; not the current source of truth unless a current workflow or package script is named. |\n| Promotion target | A test exists or is planned in a lower lane and should move only after meeting quality gates. 
|\n\nUnless a document explicitly says Current, command bundles and workflow names are proposed implementation targets.\n\n## Current State\n\nThe repository already has Vitest, Playwright, package-local test scripts, release verification scripts, docs QA, metadata checks, architecture gates, and staging/release workflows. This strategy names how to organize the next E2E generation around those surfaces rather than around the removed Docker workflows.\n\n## Requested Scope Traceability\n\n| Requested scope | Primary docs | Lane | First implementation surface |\n| --- | --- | --- | --- |\n| Codex E2E | [Harness And Plugin E2E](./harness-e2e.md), [Stack Permutations](./stack-permutations.md) | No-model setup/session first, then capability-gated model-backed | Harness setup smoke, Codex adapter protocol fixture, plugin E2E only after capability proof; babysitter-agent runtime is separate |\n| Claude Code E2E | [Harness And Plugin E2E](./harness-e2e.md), [Stack Permutations](./stack-permutations.md) | No-model setup/session first, then model-backed | Harness setup smoke, agent-mux session, plugin-manager where supported, `/babysitter:call` plugin smoke, Claude hook/tool-call fixture |\n| `harness:install` and plugin setup | [Harness And Plugin E2E](./harness-e2e.md), [Stack Permutations](./stack-permutations.md) | Setup only | Dry-run install JSON, plugin discovery JSON, idempotency checks; no babysitter-agent runtime claim |\n| Agent-mux functionality requiring credentials | [Agent Mux And Runtime E2E](./agent-mux-and-runtime-e2e.md), [Pipeline Integration](./pipeline-integration.md) | Model-backed | Live adapter matrix for Codex and Claude Code |\n| Babysitter-agent whole-system flow | [Agent Mux And Runtime E2E](./agent-mux-and-runtime-e2e.md), [Stack Permutations](./stack-permutations.md) | Both | Mock planner/executor first, bounded live process after staging promotion, no installer commands inside runtime E2E |\n| Muxes and transport-mux | [Agent Mux And Runtime 
E2E](./agent-mux-and-runtime-e2e.md), [Mock And Fixture Contracts](./mock-and-fixture-contracts.md), [Primary Flow Data Paths](./primary-flow-data-paths.md) | Both | Shared event fixtures, transport roundtrip, live transport smoke with trace identifiers |\n| Hooks muxes | [Agent Mux And Runtime E2E](./agent-mux-and-runtime-e2e.md), [Mock And Fixture Contracts](./mock-and-fixture-contracts.md), [Trace Identifiers And Evidence](./trace-identifiers-and-evidence.md) | Both | Normalized hook fixtures, live hook replay after redaction with session/run correlation |\n| Pipeline integration | [Pipeline Integration](./pipeline-integration.md), [Implementation Roadmap](./implementation-roadmap.md) | Both | New workflow contracts and staged required checks |\n| Coverage reporting | [Coverage And Reporting](./coverage-and-reporting.md) | Both | Package coverage baselines plus scenario coverage summaries |\n",
    "documents": []
  },
  "outgoingEdges": [
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-agent-mux-and-runtime-e2e",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-coverage-and-reporting",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-current-test-command-inventory",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-harness-e2e",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-implementation-roadmap",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-mock-and-fixture-contracts",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-pipeline-integration",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-primary-flow-data-paths",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-quality-gates",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-stack-permutations",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-test-lanes",
      "kind": "contains_page"
    },
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-trace-identifiers-and-evidence",
      "kind": "contains_page"
    }
  ],
  "incomingEdges": [
    {
      "from": "page:docs",
      "to": "page:docs-testing",
      "kind": "contains_page"
    }
  ]
}
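
The record shape above can be walked with a few lines of code. What follows is a minimal Python sketch, not the atlas UI's actual reader: the helper names `contained_pages` and `parent_pages` are hypothetical, and `SAMPLE_RECORD` is a trimmed, illustrative copy of the payload that keeps only the fields the helpers touch. It assumes that `outgoingEdges` and `incomingEdges` always carry `from`, `to`, and `kind` keys, as they do in the record shown.

```python
import json

# Trimmed, illustrative copy of the record above (assumption: only the
# fields used by the helpers below are kept; the article body is omitted).
SAMPLE_RECORD = """
{
  "id": "page:docs-testing",
  "_kind": "Page",
  "attributes": {"title": "Testing Strategy", "slug": "docs/testing"},
  "outgoingEdges": [
    {"from": "page:docs-testing", "to": "page:docs-testing-test-lanes", "kind": "contains_page"},
    {"from": "page:docs-testing", "to": "page:docs-testing-quality-gates", "kind": "contains_page"}
  ],
  "incomingEdges": [
    {"from": "page:docs", "to": "page:docs-testing", "kind": "contains_page"}
  ]
}
"""


def contained_pages(record: dict) -> list[str]:
    """Return IDs of child pages linked from this record (hypothetical helper)."""
    return [
        edge["to"]
        for edge in record.get("outgoingEdges", [])
        if edge.get("kind") == "contains_page" and edge.get("from") == record["id"]
    ]


def parent_pages(record: dict) -> list[str]:
    """Return IDs of pages that contain this record (hypothetical helper)."""
    return [
        edge["from"]
        for edge in record.get("incomingEdges", [])
        if edge.get("kind") == "contains_page" and edge.get("to") == record["id"]
    ]


if __name__ == "__main__":
    record = json.loads(SAMPLE_RECORD)
    print(record["attributes"]["title"])
    print("children:", contained_pages(record))
    print("parents:", parent_pages(record))
```

Filtering on both `kind` and the record's own `id` keeps the helpers honest if a payload ever includes edges of other kinds or edges that do not belong to the record being inspected.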
