II.
Page overview
Reference · livepage:docs-testing-stack-permutations
Stack Permutations overview
Inspect the raw attributes, linked wiki pages, and inbound or outbound graph edges for page:docs-testing-stack-permutations.
Attributes
nodeKind
Page
sourcePath
docs/testing/stack-permutations.md
sourceKind
repo-docs
title
Stack Permutations
displayName
Stack Permutations
slug
docs/testing/stack-permutations
articlePath
wiki/docs/testing/stack-permutations.md
article
# Stack Permutations
The test strategy must treat the stack as modular. A valid E2E does not need every layer, and some layer combinations are invalid even if the names sound related.
## Layer Map
| Layer | Package or surface | Owns | Does not own |
| --- | --- | --- | --- |
| Core Babysitter SDK | `packages/sdk`, `babysitter run:*`, `task:*`, `hook:*`, `plugin:*` | Event-sourced runs, task effects, process state, generic plugin registry, SDK harness install commands | Model session UI, agent-mux adapter registry, provider transport implementation |
| SDK harness setup | `babysitter harness:install`, `babysitter harness:install-plugin` | Installing external harness CLIs where supported and installing Babysitter harness plugins | `babysitter-agent` runtime behavior |
| Babysitter-agent runtime | `packages/babysitter-agent` runtime CLI | Runtime orchestration UX, model-backed planning/execution, agent-core path, agent-mux bridge for external harness invocation | Harness plugin installation and setup commands |
| Agent-mux core | `packages/agent-mux/core`, `@a5c-ai/agent-mux` | Adapter registry, `createClient().run`, sessions, workspaces, plugin manager, runtime hooks, provider/model config | Babysitter run journal ownership |
| Agent-mux adapters | `packages/agent-mux/adapters` | Per-agent spawn/programmatic adapters, capabilities, session parsing, adapter plugin APIs when supported | Generic Babysitter process orchestration |
| Transport-mux | `packages/transport-mux` | Harness-facing provider protocol routes, local proxy runtime lifecycle, proxy auth, runtime env injection, passthrough forwarding, streaming/non-streaming response shape, cancellation, timeout, and metrics/cache visibility | Installing harnesses/plugins, normalizing hooks, owning Babysitter journals, or proving agent-mux adapter/session semantics without a consumer |
| Hooks-mux | `packages/hooks-mux/*` | Normalizing raw hook payloads and merge/policy behavior across harnesses | Agent-mux runtime hook dispatch and SDK stop-hook iteration policy |
| Agent-core | `packages/agent-core` | Programmatic model session backend and tool-call loop used by internal/runtime paths | External harness plugin installation |
| Agent-plugins-mux | `packages/agent-plugins-mux` | Plugin target discovery and plugin target contracts | Runtime session execution |
## Primary E2E Paths
| Path | Entry point | Required setup | What it proves | What it must not claim |
| --- | --- | --- | --- | --- |
| SDK run-loop E2E | `babysitter run:create`, `run:iterate`, `task:post` | Fixture process and optional mocked hooks | Process state, effects, journal replay, stop-hook continuation | Provider or external harness behavior |
| SDK harness/plugin setup E2E | `babysitter harness:install`, `babysitter harness:install-plugin` | Temporary workspace and installer fixtures or real installer runner | Harness install delegation, plugin installer package behavior, idempotent manifests | `babysitter-agent` runtime correctness |
| Agent-mux adapter/session E2E | `amux run <agent>` or `createClient().run({ agent })` | Adapter fixtures or real agent CLI and credentials | Adapter events, session lifecycle, model/provider config, runtime hooks | Babysitter process journal correctness unless a plugin invokes Babysitter |
| Agent-mux plugin E2E | `amux plugin ...` or `client.plugins.*` where adapter supports plugins | Adapter with `supportsPlugins`, plugin manifest/marketplace fixture or real plugin target | Agent-native plugin install/list/uninstall and plugin event behavior | Universal plugin support across all agents |
| Babysitter plugin through agent-mux E2E | Agent-mux starts an external harness session after the Babysitter harness plugin is installed | Harness-specific Babysitter plugin installed by SDK installer or native plugin path, then `amux run <agent>` | The plugin command such as `/babysitter:call` creates a Babysitter run, completes it, and hook/stop behavior is visible from the harness session | `babysitter-agent` install/setup behavior |
| Babysitter-agent runtime E2E | `babysitter-agent` runtime commands | Preinstalled or mocked model backend; no setup command inside the test | Runtime planning/orchestration, selected backend, run lifecycle, task posting, agent-core or agent-mux bridge behavior | Harness plugin installation |
| Transport-mux E2E | `amux-proxy`, `startTransportMuxRuntime`, `applyTransportMuxToHarnessEnv`, or `amux launch --with-proxy*` | Local route fixture, agent-core stream, or agent-mux external-harness launch that needs a proxy bridge | Route/codec contract, proxy auth, env injection, launch proxy decision, streaming/non-streaming response shape, cancellation, timeout, passthrough, metrics/cache artifacts | Plugin install, harness install, hook normalization, or Babysitter run lifecycle by itself |
| Hooks-mux E2E | Hook adapter CLI/core normalizer | Raw hook payload fixtures or redacted live payloads | Hook normalization, merge policy, fail-open/fail-closed behavior | Agent-mux session lifecycle by itself |
## Transport-Mux Valid Permutations
Transport-mux is the carrier/proxy seam between a harness-facing protocol and a target provider/runtime. It can be tested alone with local fixtures, or as a bridge started by agent-mux launch, but it is not a plugin manager, harness installer, hook adapter, or Babysitter run owner.
| Permutation | Lane | Entry point | Required assertions |
| --- | --- | --- | --- |
| Package route/codec fixture | No-model | `createTransportMuxApp` or `amux-proxy` with fixture engine | `/health`, `/v1/models`, `/metrics`, `/cache/stats`, `/v1/count_tokens`, `/v1/messages`, `/v1/chat/completions`, `/v1/responses`, `/v1beta/models/*`, `/v1/projects/*`, `/converse`, `/models/chat/completions`, and `/passthrough/*` return the expected protocol shapes and errors |
| Runtime env bridge | No-model | `startTransportMuxRuntime` and `applyTransportMuxToHarnessEnv` | `AMUX_PROXY_BASE_URL`, `AMUX_PROXY_AUTH_TOKEN`, and provider-specific base URL/API key variables are injected only for the exposed transport, with token values redacted in artifacts |
| Agent-mux launch decision | No-model | `resolveLaunchPlan` and launch dry-run fixtures | Native provider, proxy forced, proxy if-needed, and proxy forbidden cases produce the expected `proxyNeeded`, `proxyReason`, and exposed transport |
| Agent-core stream through transport | No-model and model-backed | Agent-core event stream consumed through transport-mux | Fixture or live deltas, final event, cancellation, timeout, and usage metadata survive transport framing |
| External harness through agent-mux proxy | Model-backed | `amux launch <harness> <provider> --with-proxy` or `--with-proxy-if-needed` | Real Codex/Claude-compatible harness traffic uses the local proxy URL, emits a redacted launch plan, and completes a sentinel stream |
| Passthrough provider bridge | No-model first, model-backed only when justified | `/passthrough/*` with configured `apiBase` | Path/query/body forwarding, auth propagation, upstream failure mapping, and timeout behavior are visible without leaking provider secrets |
## Capability-Gated Adapter Matrix
| Agent or harness | Agent-mux adapter mapping | Current plugin-manager expectation | Runtime-hook expectation | Valid live permutations |
| --- | --- | --- | --- | --- |
| `claude-code` / `claude` | `claude-code` maps to `claude` | Valid where the Claude adapter exposes plugin APIs | Native/runtime hook coverage including stop hook is valid | Agent-mux session, agent-mux plugin manager, Babysitter plugin through agent-mux, babysitter-agent external-harness bridge |
| `codex` | `codex` maps to `codex` | Capability-gated; current Codex adapter reports `supportsPlugins: false`, so do not require agent-mux `client.plugins.*` for Codex | Runtime hook fixtures are valid; live plugin manager install is not assumed | Agent-mux session, SDK harness plugin installer, Babysitter plugin through Codex only after installer/native plugin support is proven, babysitter-agent external-harness bridge |
| `gemini-cli` / `gemini` | `gemini-cli` maps to `gemini` | Capability-gated by adapter | Runtime hook fixture first, live after adapter support is proven | Agent-mux session and SDK installer smoke; plugin E2E only after capability proof |
| `agent-core` | Not an agent-mux external harness mapping | No harness plugin install | Programmatic event hooks through owning layer only | Babysitter-agent internal/programmatic runtime, transport-mux with agent-core stream |
| `pi` | Intentionally not agent-mux in babysitter-agent mapping | SDK plugin installer may exist, but runtime path is direct/agent-core-like | Do not route through agent-mux bridge | Direct SDK/babysitter-agent path only |
| `babysitter` adapter in agent-mux | Agent-mux can target Babysitter as an adapter | Babysitter plugin manager is generic SDK plugin registry, not external harness plugin install | Adapter parses Babysitter event output | Agent-mux consuming Babysitter output; separate from babysitter-agent runtime setup |
## Invalid Combinations
| Invalid combination | Why it is invalid |
| --- | --- |
| Babysitter-agent E2E that starts with `babysitter harness:install` or `harness:install-plugin` | That tests SDK harness setup, not babysitter-agent runtime behavior |
| Agent-mux plugin-manager test that requires Codex plugin install without checking `supportsPlugins` | Current Codex adapter reports plugin manager support as false |
| Transport-mux test that asserts plugin installation | Transport-mux carries harness-facing provider traffic; SDK harness setup or agent-mux plugin APIs own plugin installation |
| Transport-mux test that runs `babysitter harness:install` | Harness install belongs to SDK harness setup, not the proxy runtime |
| Transport-mux test that asserts hook normalization | Hooks-mux owns hook payload normalization; transport-mux may only carry traffic adjacent to a hook-emitting harness |
| Transport-mux test that claims Babysitter run completion by itself | Babysitter SDK or babysitter-agent owns run creation, task posting, and terminal journal state |
| Hooks-mux fixture that claims full agent-mux session coverage | Hooks-mux normalizes hook payloads; agent-mux owns session lifecycle |
| Agent-core path routed through agent-mux external-harness mapping | The babysitter-agent map explicitly excludes `agent-core` and `pi` from agent-mux external harness mapping |
| `/babysitter:call` plugin smoke that only checks final assistant text | It must assert Babysitter run ID, run events, terminal state, and hook evidence |
## Minimum Permutation Set
The rebuilt strategy should implement these before claiming broad E2E coverage:
| ID | Lane | Stack | Required evidence |
| --- | --- | --- | --- |
| P1 | No-model | SDK run loop + mocked stop hook | `run:create`, pending task, `task:post`, `run:iterate`, completed proof, hook log |
| P2 | No-model | SDK harness installer + plugin installer dry-runs | JSON install plan, plugin target, idempotency fixture |
| P3 | No-model | Agent-mux core + mock adapter + runtime hooks | `session_start`, prompt/input, `session_end`, stop-hook decision fixture |
| P4 | No-model | Agent-mux PluginManager + plugin-capable adapter fixture | list/install/uninstall/update behavior and capability errors for non-plugin agents |
| P5 | No-model | Transport-mux route/codec fixture | supported route matrix, auth failure, invalid JSON, count_tokens supported/unsupported, streaming and non-streaming response artifacts |
| P5a | No-model | Transport-mux runtime env bridge + agent-mux launch decision | redacted env diff, proxy config, `proxyNeeded`/`proxyReason`, forced/if-needed/native/forbidden cases |
| P6 | No-model | Hooks-mux raw payload fixtures | normalized stop/session/tool events and merge-policy artifact |
| P7 | Model-backed | Babysitter-agent + agent-core backend | created run, planned task, posted result, terminal state, redacted model trace |
| P8 | Model-backed | Babysitter-agent + external harness bridge | `babysitter-agent call/invoke`, agent-mux mapped session events, terminal result, no install steps |
| P9 | Model-backed | Agent-mux + Claude + Babysitter plugin | harness/plugin precondition evidence, `amux run claude`, `/babysitter:call`, Babysitter run completion, stop-hook evidence |
| P10 | Model-backed/capability-gated | Agent-mux + Codex + Babysitter plugin | Only enabled after plugin install support is proven; otherwise assert skip reason from capability gate |
| P11 | Model-backed | Transport-mux + agent-core stream | live or credential-gated agent-core deltas carried over transport-mux, cancellation/timeout behavior, redacted provider metadata |
| P12 | Model-backed | Agent-mux external harness + transport-mux proxy | `amux launch` starts transport-mux, harness uses proxy env, sentinel stream completes, metrics snapshot and redacted launch plan are uploaded |
| P13 | No-model | Agent-mux hooks + hooks-mux bridge for `claude-code`, `codex`, `pi` | `amux hooks add/handle`, `a5c-hooks-mux invoke`, normalized phase evidence, no Babysitter SDK calls, no provider credentials |
| P14 | No-model | Pipeline-owned stack matrix across `agent-mux-mocks` and real-agent CLI shims for `claude`, `codex`, `pi`, and `gemini` | `amux install --dry-run`, profile-backed launch/run, transport-mux mock-model request evidence, and optional hooks-mux normalized phase artifact from `no_model_mock_matrix` |
Each implementation slice should name which permutation IDs it covers. If a job covers only setup, it should not be labeled as runtime E2E.
## Agent-Mux Live Install Modes
The live external-harness matrix has two valid agent-mux paths:
| Mode | Valid targets | Installer responsibility | Prompt responsibility | Lifecycle responsibility |
| --- | --- | --- | --- | --- |
| `babysitter-plugin` | `claude-code` via `claude`, `codex`, `gemini-cli` via `gemini`, `pi` | `amux install <target>` installs or verifies the harness CLI; the local Babysitter SDK and generated Babysitter plugin package are installed before launch | The launch prompt is a Babysitter command, for example `/babysitter:call ...` | Must prove Babysitter run creation, effects, journals/task artifacts, native stop hook execution, hooks-mux normalization, agent-mux session, transport trace, and provider trace |
| `vanilla` | `claude`, `codex`, `gemini`, `pi`, `babysitter` | `amux install <target>` only | The launch prompt is a normal non-Babysitter sentinel prompt | Must prove agent-mux session/launch, transport trace, and provider trace; it must not claim plugin-driven external-harness hook coverage; babysitter-agent rows may additionally assert agent-core-backed Babysitter runtime evidence when required |
These are different integration paths. `babysitter-plugin` validates plugin-mediated Babysitter lifecycle behavior through an external harness; `vanilla` validates the same agent-mux install/launch/provider path without Babysitter plugin setup.
documents
[]
Outgoing edges
None.
Incoming edges
None.