Agentic AI Atlas

Pipeline Integration json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/testing/pipeline-integration.md · Cluster · wiki

Record JSON

{
  "id": "page:docs-testing-pipeline-integration",
  "_kind": "Page",
  "_file": "wiki/docs/testing/pipeline-integration.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/testing/pipeline-integration.md",
    "sourceKind": "repo-docs",
    "title": "Pipeline Integration",
    "displayName": "Pipeline Integration",
    "slug": "docs/testing/pipeline-integration",
    "articlePath": "wiki/docs/testing/pipeline-integration.md",
    "article": "\n# Pipeline Integration\n\nThe pipeline should add new testing lanes in stages. No-model tests protect every pull request. Model-backed tests protect promotion and release confidence without making ordinary PRs depend on provider availability.\n\n## Workflow Placement\n\n| Workflow phase | Lanes | Trigger | Required behavior |\n| --- | --- | --- | --- |\n| Pull request / push CI | No-model unit, contract, mock integration, docs QA | Every PR and branch push | Fast, deterministic, no secrets, no live providers |\n| Publish preflight | Full no-model suite plus selected model-backed smoke | Push to `develop`, `staging`, or `main` before publish/deploy jobs | Blocks publish/deploy if runtime or harness smoke fails |\n| Release preflight | Full no-model suite plus model-backed release smoke | Push to `main` before publish/release jobs | Blocks production publish if live Codex/Claude/runtime smoke fails |\n| Scheduled nightly | Full model-backed suite | Nightly or twice daily | Detects provider, harness, CLI, and auth drift outside code changes |\n| Manual diagnostics | Any single lane or provider | `workflow_dispatch` | Lets maintainers rerun one harness/provider without rerunning the full matrix |\n\n## Current Implementation\n\nThe current implementation is consolidated in `.github/workflows/publish.yml`. That workflow owns the live-stack scenario and OS matrix directly under `live_stack_e2e`, exports each selected scenario through `LIVE_STACK_*` environment variables, runs `npm run test:e2e:live-stack:pipeline`, and writes the per-scenario coverage artifact with `npm run coverage:e2e:live-stack`. When `LIVE_STACK_REQUIRE_EVIDENCE=1`, test code executes exactly the one scenario the pipeline selected; it must not enumerate the scenario matrix or run a code-side matrix runner.\n\n`Publish` also runs two deterministic matrices before the live-stack publish preflight. `agent_mux_hooks_mux_e2e` covers the no-Babysitter-SDK path `agent-mux hooks -> hooks-mux invoke` for `claude-code`, `codex`, and `pi`; the matrix supplies the agent, adapter, hook event, payload, and expected canonical phase. `no_model_mock_matrix` is a stack E2E matrix: GitHub chooses the runtime (`agent-mux-mocks` or a local real-agent CLI shim), the agent (`claude`, `codex`, `pi`, `gemini`), and the hook mode (`none` or `hooks-mux`); the test installs or verifies the agent path, launches it with an agent-mux profile, routes the model call through a local transport-mux mock model, and optionally proves hooks-mux normalization. 
Both jobs are dependencies of `Prepare Publish`, so package publish/deploy cannot start until both matrices pass.\n\n`Publish` also owns the branch-aware publish/deploy topology for `develop`, `staging`, and `main`: validation and live-stack jobs precede `Prepare Publish`, and package publishes, docs deploy, Atlas WebUI deploy, cloud deploy, release tagging, and external plugin sync all depend on the publish ref/version that `Prepare Publish` produces.\n\n## Recommended New Workflows\n\nDo not resurrect the retired Docker workflow names. 
Use new workflow names that describe the new strategy:\n\n- `publish.yml` currently runs deterministic validation and model-backed live-stack coverage inline.\n- Optional future `testing-no-model.yml` can extract deterministic PR/push coverage if another workflow needs the same contract.\n- Optional future `testing-model-backed.yml` can extract scheduled/manual model-backed coverage if it should run independently from publish.\n- Optional future `testing-coverage-report.yml` can extract repository-wide coverage aggregation if coverage becomes too expensive for the default CI workflow.\n\nReusable workflows are optional extraction targets, not the current source of truth. The existing `.github/workflows/ci.yml` can keep fast PR checks, while `.github/workflows/publish.yml` owns publish-time validation, live-stack preflight, deploy, tagging, and plugin sync.\n\n## Secret Gating\n\nModel-backed jobs must use explicit `if:` guards that check the following signals before setup:\n\n| Provider or harness | Required signals |\n| --- | --- |\n| Codex | OpenAI credential configured for CI and Codex runtime install available |\n| Claude Code | Foundry/OpenAI credential configured for CI, Claude Code runtime install available, and transport-mux proxy path enabled |\n| Agent-core provider | Backend-specific credential and selected backend metadata |\n| Cloud/provider variants | Environment-specific credentials, region/project metadata, and rate-limit budget |\n\nA skipped model-backed job should report which credential or capability was missing. A required staging/release model-backed job should fail, not skip, if it was selected but setup cannot satisfy its declared dependencies.\n\n## Suggested Dependency Shape\n\nStaging and release pipelines should run jobs in this order:\n\n1. Build and no-model tests.\n2. Package and generated artifact checks.\n3. Model-backed runtime smoke, transport-mux bridge smoke, and capability-gated plugin/session smoke.\n4. Publish or deploy jobs.\n5. 
Post-publish verification or external sync jobs.\n\nThis keeps publish jobs behind live runtime proof without forcing every PR to spend model budget.\n\n## Artifact Policy\n\nEvery E2E job should upload:\n\n- command transcript,\n- redacted harness discovery JSON,\n- redacted event logs,\n- transport-mux launch-plan JSON when proxy launch is under test,\n- redacted proxy config and env injection diff,\n- route transcripts, streaming event transcripts, metrics snapshots, and cache stats for transport-mux lanes,\n- run IDs and session IDs,\n- coverage output when collected,\n- provider/harness version metadata,\n- skip reason if the job did not run.\n\nArtifacts must never include raw API keys, token files, home-directory credentials, or full provider request payloads when those payloads may contain secrets.\n\n## Reusable Workflow Contracts\n\n| Workflow | Inputs | Outputs | Required artifacts | Downstream consumers |\n| --- | --- | --- | --- | --- |\n| Optional `testing-no-model.yml` | `scope`, `changed_packages`, `coverage_mode` | `no_model_status`, `coverage_artifact`, `junit_artifact` | Vitest logs, Playwright traces on failure, package coverage summaries | Future extraction for `ci.yml` and `publish.yml` |\n| Optional `testing-model-backed.yml` | `provider`, `agent`, `backend`, `path`, `prompt_fixture`, `required` | `model_backed_status`, `skip_reason`, `run_artifact` | Separate artifacts per path: setup JSON, agent-mux session events, transport-mux launch/env/metrics evidence, babysitter-agent run proof, stop-hook evidence | Future extraction from `publish.yml` live-stack jobs or scheduled workflow |\n| Optional `testing-coverage-report.yml` | `coverage_artifacts`, `playwright_artifacts`, `model_backed_artifacts` | `coverage_summary`, `scenario_summary` | Merged markdown summary, raw coverage JSON, trace index | Future PR summaries and release candidate notes |\n\nRequired workflows should expose explicit failure/skip outputs. 
A publish workflow must depend on `*_status == success`; a scheduled workflow may record `skip_reason` without failing when credentials are intentionally absent.\n\n## Required Check Names\n\nStable required-check names prevent branch protection churn:\n\n- `testing / no-model contracts`\n- `testing / no-model runtime`\n- `testing / no-model transport-mux`\n- `testing / no-model ui`\n- `testing / model-backed codex`\n- `testing / model-backed claude-code`\n- `testing / model-backed babysitter-agent`\n- `testing / model-backed transport-mux bridge`\n- `testing / coverage summary`\n\nOnly no-model checks should be required for ordinary PRs at first. Model-backed checks should become required only on `staging` and release branches after their quarantine period ends.\n\n## Current Inventory Naming\n\nRoadmap slice 0 keeps current workflow behavior intact and uses [Current Test Command Inventory](./current-test-command-inventory.md) as the source of truth for existing package scripts. Workflow comments and future reusable jobs should use the inventory artifact names before they introduce new command bundles.\n\n## Proposed Command Bundles\n\nStatus: Mixed. 
`test:e2e:live-stack:*` and `coverage:e2e:live-stack` are current scripts; the broader no-model/model-backed bundle names remain proposed until a follow-up slice adds them.\n\nPackage owners may initially wire these bundles as workflow steps that call existing package-local scripts, then promote them into root `package.json` scripts once at least two packages share a lane.\n\n| Proposed command | Lane | Contents |\n| --- | --- | --- |\n| `npm run test:no-model` | No-model | Package unit, contract, mock harness, CLI smoke, docs/generator checks |\n| `npm run test:no-model:mux` | No-model | Agent-mux, transport-mux route/runtime/env/launch-plan, hooks-mux, gateway, and fixture compatibility checks |\n| `npm run test:no-model:harness-setup` | No-model | `harness:list`, install dry-runs, plugin install dry-runs, discovery fixtures |\n| `npm run test:model-backed` | Model-backed | All selected live provider/harness tests with credential gates |\n| `npm run test:model-backed:agent-mux-plugin` | Model-backed | Capability-gated `amux run` plugin/session tests with Babysitter plugin preconditions |\n| `npm run test:model-backed:runtime` | Model-backed | Agent-core, transport-mux bridge, agent-mux session smoke, and babysitter-agent runtime smoke; babysitter-agent jobs do not run installers |\n| `npm run test:model-backed:transport-mux` | Model-backed | Agent-core stream through transport-mux plus agent-mux-launched external harness proxy smoke with credential gates |\n| `npm run coverage:repo` | No-model plus reports | Merge package coverage and scenario summaries into one artifact |\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-pipeline-integration",
      "kind": "contains_page"
    }
  ]
}
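
The Current Implementation rule can be illustrated with a minimal TypeScript sketch. The rule says test code runs exactly one pipeline-selected scenario when `LIVE_STACK_REQUIRE_EVIDENCE=1`. Only `LIVE_STACK_REQUIRE_EVIDENCE` is named on this page. `LIVE_STACK_SCENARIO`, `LIVE_STACK_OS`, and `resolveScenario` are hypothetical stand-ins for whatever `LIVE_STACK_*` variables `publish.yml` actually exports.

```typescript
// Hypothetical sketch: resolve the one scenario the pipeline selected.
// LIVE_STACK_REQUIRE_EVIDENCE is named in the docs; LIVE_STACK_SCENARIO and
// LIVE_STACK_OS are assumed names for the exported LIVE_STACK_* variables.
type Env = Record<string, string | undefined>;

interface SelectedScenario {
  scenario: string;
  os: string;
}

function resolveScenario(env: Env): SelectedScenario | null {
  const requireEvidence = env["LIVE_STACK_REQUIRE_EVIDENCE"] === "1";
  const scenario = env["LIVE_STACK_SCENARIO"];
  const os = env["LIVE_STACK_OS"];

  if (scenario && os) {
    // The pipeline picked exactly one matrix cell: run that cell and nothing else.
    return { scenario, os };
  }
  if (requireEvidence) {
    // Evidence is required but nothing was selected: fail instead of
    // falling back to enumerating the matrix in test code.
    throw new Error("LIVE_STACK_REQUIRE_EVIDENCE=1 but no LIVE_STACK_* scenario was selected");
  }
  // Assumption: with no selection and no evidence requirement, the live-stack test is skipped.
  return null;
}
```

The resolver returns a single scenario or nothing, never a list. That keeps matrix expansion in the workflow's job matrix, and test code has no loop to run. In Node it would be called with `process.env`; here it takes a plain record so the sketch has no runtime dependency.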
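
The Workflow Placement table maps triggers to phases. A hedged sketch of that mapping as a pure function follows. It assumes GitHub Actions event names and takes the branch list from the table. The phase identifiers and `phasesFor` are illustrative, not names the workflows actually use.

```typescript
// Hypothetical sketch of the Workflow Placement table as a lookup.
// Event names are GitHub Actions events; phase IDs and phasesFor are illustrative.
type GitHubEvent = "pull_request" | "push" | "schedule" | "workflow_dispatch";

type Phase =
  | "pr-push-ci"
  | "publish-preflight"
  | "release-preflight"
  | "scheduled-nightly"
  | "manual-diagnostics";

const PUBLISH_BRANCHES = ["develop", "staging", "main"];

function phasesFor(event: GitHubEvent, branch: string): Phase[] {
  switch (event) {
    case "pull_request":
      return ["pr-push-ci"];
    case "push": {
      // Every branch push gets the fast, secret-free no-model lanes.
      const phases: Phase[] = ["pr-push-ci"];
      if (PUBLISH_BRANCHES.indexOf(branch) !== -1) phases.push("publish-preflight");
      // Release preflight additionally guards production publish from main.
      if (branch === "main") phases.push("release-preflight");
      return phases;
    }
    case "schedule":
      return ["scheduled-nightly"];
    case "workflow_dispatch":
      return ["manual-diagnostics"];
    default: {
      // Exhaustiveness guard: a new event type without a branch fails to compile.
      const unhandled: never = event;
      throw new Error(`unhandled event: ${String(unhandled)}`);
    }
  }
}
```

As a pure function, the table becomes testable in the no-model lane. In the workflow itself, the same decisions live in `on:` triggers and job-level `if:` conditions.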
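
The Secret Gating rules separate two outcomes. An optional job skips with a reason that names what was missing, while a selected required job fails. Below is a minimal sketch of that decision. It assumes each signal has already been resolved to a boolean; in a workflow these would come from `if:` expressions. `gate` and the signal names are hypothetical.

```typescript
// Hypothetical sketch of the Secret Gating decision for one model-backed job.
// Signals are assumed to be pre-resolved booleans; names are illustrative.
interface GateInput {
  job: string;
  required: boolean; // true when a staging/release run selected this job
  signals: Record<string, boolean>;
}

type GateResult =
  | { status: "run" }
  | { status: "skipped"; skipReason: string }
  | { status: "failed"; reason: string };

function gate(input: GateInput): GateResult {
  const missing = Object.keys(input.signals).filter((name) => !input.signals[name]);
  if (missing.length === 0) return { status: "run" };

  // Name every missing credential or capability so the skip or failure is actionable.
  const detail = `${input.job}: missing ${missing.join(", ")}`;
  // A selected required job must fail rather than silently skip.
  return input.required
    ? { status: "failed", reason: detail }
    : { status: "skipped", skipReason: detail };
}
```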
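
The Suggested Dependency Shape, together with the rule that publish/deploy jobs wait for `Prepare Publish`, can be checked mechanically by walking each job's `needs` chain. The sketch assumes the `needs` graph has already been extracted from `publish.yml`; YAML parsing is left out. Job IDs other than `agent_mux_hooks_mux_e2e`, `no_model_mock_matrix`, and `live_stack_e2e` are hypothetical, for example `prepare_publish` as the ID behind `Prepare Publish`.

```typescript
// Hypothetical sketch: check that gated (publish/deploy) jobs transitively `needs`
// every preflight job. Assumes the needs graph was already parsed from publish.yml.
type Needs = Record<string, string[]>;

function ancestors(graph: Needs, job: string): Record<string, boolean> {
  // Prototype-free object used as a set of visited job IDs.
  const seen: Record<string, boolean> = Object.create(null);
  const stack = [...(graph[job] ?? [])];
  while (stack.length > 0) {
    const next = stack.pop() as string;
    if (seen[next]) continue;
    seen[next] = true;
    stack.push(...(graph[next] ?? []));
  }
  return seen;
}

function unguardedJobs(graph: Needs, gated: string[], preflight: string[]): string[] {
  // Report each "gated job -> missing preflight job" pair that lets a job start early.
  const problems: string[] = [];
  for (const job of gated) {
    const deps = ancestors(graph, job);
    for (const p of preflight) {
      if (!deps[p]) problems.push(`${job} -> ${p}`);
    }
  }
  return problems;
}
```

An empty result means no gated job can start before every preflight job has finished.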
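
The Artifact Policy requires redacted proxy config and env injection diffs. One way to honor that for key/value data is to mask values by key name before the artifact is written. This is a sketch under assumptions: the key pattern is illustrative, not the project's list. Key-based masking alone also misses secrets embedded in values, which is one reason the policy excludes full provider request payloads.

```typescript
// Hypothetical redaction helper for env-style artifacts (e.g. an env injection diff).
// The key pattern below is an illustrative assumption, not the project's real list.
const SECRET_KEY = /(API_KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)/i;

function redactEnv(env: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    // Keep the key so reviewers can see that a secret was injected, but never its value.
    out[key] = SECRET_KEY.test(key) ? "[REDACTED]" : env[key];
  }
  return out;
}
```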
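
The Reusable Workflow Contracts give two rules. A publish workflow must depend on `*_status == success`, while a scheduled workflow may record `skip_reason` instead of failing. Below is a hedged sketch of both evaluations over a flat map of workflow outputs. The output names come from the contract table; the `"skipped"` status value and both function names are assumptions.

```typescript
// Hypothetical evaluation of reusable-workflow outputs. Output names follow the
// contract table; the "skipped" status value is an assumption.
type Outputs = Record<string, string>;

const STATUS_KEY = /_status$/;

function statusKeys(outputs: Outputs): string[] {
  return Object.keys(outputs).filter((k) => STATUS_KEY.test(k));
}

// Publish gate: every `*_status` output must equal "success".
// Assumption: no status outputs at all also blocks publish.
function publishAllowed(outputs: Outputs): boolean {
  const keys = statusKeys(outputs);
  return keys.length > 0 && keys.every((k) => outputs[k] === "success");
}

// Scheduled runs tolerate an intentional skip, but only when a skip_reason is recorded.
function scheduledRunFails(outputs: Outputs): boolean {
  return statusKeys(outputs).some((k) => {
    const status = outputs[k];
    if (status === "success") return false;
    if (status === "skipped") return !outputs["skip_reason"];
    return true;
  });
}
```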
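
The Required Check Names section phases in model-backed checks over time. The sketch below encodes that rollout under a few assumptions:

- `main` is the release branch, since it triggers release preflight.
- The end of quarantine is a simple flag.
- `requiredChecks` is a hypothetical helper.

`testing / coverage summary` is left out because the page does not say when it becomes required.

```typescript
// Hypothetical sketch of which stable check names branch protection would require.
// Check names come from the docs; treating `main` as the release branch and the
// quarantineEnded flag are assumptions.
const NO_MODEL_CHECKS = [
  "testing / no-model contracts",
  "testing / no-model runtime",
  "testing / no-model transport-mux",
  "testing / no-model ui",
];

const MODEL_BACKED_CHECKS = [
  "testing / model-backed codex",
  "testing / model-backed claude-code",
  "testing / model-backed babysitter-agent",
  "testing / model-backed transport-mux bridge",
];

function requiredChecks(targetBranch: string, quarantineEnded: boolean): string[] {
  const promotionBranch = targetBranch === "staging" || targetBranch === "main";
  // Ordinary PRs (and promotion branches still in quarantine) require only no-model checks.
  return promotionBranch && quarantineEnded
    ? [...NO_MODEL_CHECKS, ...MODEL_BACKED_CHECKS]
    : [...NO_MODEL_CHECKS];
}
```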
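
The Proposed Command Bundles section promotes a lane into a root `package.json` script once at least two packages share it. Below is a minimal sketch of that rule. The input shape, the package names used in testing, and `lanesToPromote` are hypothetical; the lane names come from the table.

```typescript
// Hypothetical sketch of the bundle promotion rule: a lane earns a root
// `package.json` script once at least `minPackages` packages provide it.
function lanesToPromote(packageLanes: Record<string, string[]>, minPackages = 2): string[] {
  // Prototype-free objects so lane names can never collide with Object.prototype keys.
  const counts: Record<string, number> = Object.create(null);
  for (const pkg of Object.keys(packageLanes)) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const lane of packageLanes[pkg]) {
      if (seen[lane]) continue; // count each lane at most once per package
      seen[lane] = true;
      counts[lane] = (counts[lane] ?? 0) + 1;
    }
  }
  return Object.keys(counts)
    .filter((lane) => counts[lane] >= minPackages)
    .sort();
}
```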
