Stack Permutations
The test strategy must treat the stack as modular. A valid E2E test does not need to exercise every layer, and some layer combinations are invalid even when their names sound related.
Layer Map
| Layer | Package or surface | Owns | Does not own |
|---|---|---|---|
| Core Babysitter SDK | packages/sdk, babysitter run:*, task:*, hook:*, plugin:* | Event-sourced runs, task effects, process state, generic plugin registry, SDK harness install commands | Model session UI, agent-mux adapter registry, provider transport implementation |
| SDK harness setup | babysitter harness:install, babysitter harness:install-plugin | Installing external harness CLIs where supported and installing Babysitter harness plugins | babysitter-agent runtime behavior |
| Babysitter-agent runtime | packages/babysitter-agent runtime CLI | Runtime orchestration UX, model-backed planning/execution, agent-core path, agent-mux bridge for external harness invocation | Harness plugin installation and setup commands |
| Agent-mux core | packages/agent-mux/core, @a5c-ai/agent-mux | Adapter registry, createClient().run, sessions, workspaces, plugin manager, runtime hooks, provider/model config | Babysitter run journal ownership |
| Agent-mux adapters | packages/agent-mux/adapters | Per-agent spawn/programmatic adapters, capabilities, session parsing, adapter plugin APIs when supported | Generic Babysitter process orchestration |
| Transport-mux | packages/transport-mux | Harness-facing provider protocol routes, local proxy runtime lifecycle, proxy auth, runtime env injection, passthrough forwarding, streaming/non-streaming response shape, cancellation, timeout, and metrics/cache visibility | Installing harnesses/plugins, normalizing hooks, owning Babysitter journals, or proving agent-mux adapter/session semantics without a consumer |
| Hooks-mux | packages/hooks-mux/* | Normalizing raw hook payloads and merge/policy behavior across harnesses | Agent-mux runtime hook dispatch and SDK stop-hook iteration policy |
| Agent-core | packages/agent-core | Programmatic model session backend and tool-call loop used by internal/runtime paths | External harness plugin installation |
| Agent-plugins-mux | packages/agent-plugins-mux | Plugin target discovery and plugin target contracts | Runtime session execution |
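
The ownership boundaries in the Layer Map can be checked mechanically when a test targets a single layer. The following is a minimal sketch, not an API of any package listed above: the `Layer` and `Concern` names, the `OWNER` table, and `misattributedConcerns` are all hypothetical, and the table is a simplified subset of the map.

```typescript
// Hypothetical, simplified model of the Layer Map. The identifiers below are
// illustrative labels only; none of them is exported by a real package.
type Layer =
  | "sdk"
  | "sdk-harness-setup"
  | "babysitter-agent"
  | "agent-mux-core"
  | "transport-mux"
  | "hooks-mux";

type Concern =
  | "run-journal"
  | "harness-install"
  | "runtime-orchestration"
  | "session-lifecycle"
  | "proxy-auth"
  | "hook-normalization";

// One owner per concern, following the "Owns" column (simplified).
const OWNER: Record<Concern, Layer> = {
  "run-journal": "sdk",
  "harness-install": "sdk-harness-setup",
  "runtime-orchestration": "babysitter-agent",
  "session-lifecycle": "agent-mux-core",
  "proxy-auth": "transport-mux",
  "hook-normalization": "hooks-mux",
};

// Returns the asserted concerns that the layer under test does not own.
function misattributedConcerns(layerUnderTest: Layer, asserted: Concern[]): Concern[] {
  return asserted.filter((concern) => OWNER[concern] !== layerUnderTest);
}

// A transport-mux-only test that also asserts hook normalization is flagged.
const flagged = misattributedConcerns("transport-mux", ["proxy-auth", "hook-normalization"]);
// flagged is ["hook-normalization"]
```

A check like this only applies to single-layer tests. A permutation that deliberately spans several layers would need to be checked against the union of the layers it declares.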
Primary E2E Paths
| Path | Entry point | Required setup | What it proves | What it must not claim |
|---|---|---|---|---|
| SDK run-loop E2E | babysitter run:create, run:iterate, task:post | Fixture process and optional mocked hooks | Process state, effects, journal replay, stop-hook continuation | Provider or external harness behavior |
| SDK harness/plugin setup E2E | babysitter harness:install, babysitter harness:install-plugin | Temporary workspace and installer fixtures or real installer runner | Harness install delegation, plugin installer package behavior, idempotent manifests | babysitter-agent runtime correctness |
| Agent-mux adapter/session E2E | amux run <agent> or createClient().run({ agent }) | Adapter fixtures or real agent CLI and credentials | Adapter events, session lifecycle, model/provider config, runtime hooks | Babysitter process journal correctness unless a plugin invokes Babysitter |
| Agent-mux plugin E2E | amux plugin ... or client.plugins.* where adapter supports plugins | Adapter with supportsPlugins, plugin manifest/marketplace fixture or real plugin target | Agent-native plugin install/list/uninstall and plugin event behavior | Universal plugin support across all agents |
| Babysitter plugin through agent-mux E2E | Agent-mux starts an external harness session after the Babysitter harness plugin is installed | Harness-specific Babysitter plugin installed by SDK installer or native plugin path, then amux run <agent> | The plugin command such as /babysitter:call creates a Babysitter run, completes it, and hook/stop behavior is visible from the harness session | babysitter-agent install/setup behavior |
| Babysitter-agent runtime E2E | babysitter-agent runtime commands | Preinstalled or mocked model backend; no setup command inside the test | Runtime planning/orchestration, selected backend, run lifecycle, task posting, agent-core or agent-mux bridge behavior | Harness plugin installation |
| Transport-mux E2E | amux-proxy, startTransportMuxRuntime, applyTransportMuxToHarnessEnv, or amux launch --with-proxy* | Local route fixture, agent-core stream, or agent-mux external-harness launch that needs a proxy bridge | Route/codec contract, proxy auth, env injection, launch proxy decision, streaming/non-streaming response shape, cancellation, timeout, passthrough, metrics/cache artifacts | Plugin install, harness install, hook normalization, or Babysitter run lifecycle by itself |
| Hooks-mux E2E | Hook adapter CLI/core normalizer | Raw hook payload fixtures or redacted live payloads | Hook normalization, merge policy, fail-open/fail-closed behavior | Agent-mux session lifecycle by itself |
Transport-Mux Valid Permutations
Transport-mux is the carrier/proxy seam between a harness-facing protocol and a target provider/runtime. It can be tested alone with local fixtures, or as a bridge started by agent-mux launch, but it is not a plugin manager, harness installer, hook adapter, or Babysitter run owner.
| Permutation | Lane | Entry point | Required assertions |
|---|---|---|---|
| Package route/codec fixture | No-model | createTransportMuxApp or amux-proxy with fixture engine | /health, /v1/models, /metrics, /cache/stats, /v1/count_tokens, /v1/messages, /v1/chat/completions, /v1/responses, /v1beta/models/*, /v1/projects/*, /converse, /models/chat/completions, and /passthrough/* return the expected protocol shapes and errors |
| Runtime env bridge | No-model | startTransportMuxRuntime and applyTransportMuxToHarnessEnv | AMUX_PROXY_BASE_URL, AMUX_PROXY_AUTH_TOKEN, and provider-specific base URL/API key variables are injected only for the exposed transport, with token values redacted in artifacts |
| Agent-mux launch decision | No-model | resolveLaunchPlan and launch dry-run fixtures | Native provider, proxy forced, proxy if-needed, and proxy forbidden cases produce the expected proxyNeeded, proxyReason, and exposed transport |
| Agent-core stream through transport | No-model and model-backed | Agent-core event stream consumed through transport-mux | Fixture or live deltas, final event, cancellation, timeout, and usage metadata survive transport framing |
| External harness through agent-mux proxy | Model-backed | amux launch <harness> <provider> --with-proxy or --with-proxy-if-needed | Real Codex/Claude-compatible harness traffic uses the local proxy URL, emits a redacted launch plan, and completes a sentinel stream |
| Passthrough provider bridge | No-model first, model-backed only when justified | /passthrough/* with configured apiBase | Path/query/body forwarding, auth propagation, upstream failure mapping, and timeout behavior are visible without leaking provider secrets |
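
The runtime env bridge row requires token values to be redacted in artifacts. A minimal sketch of such a redaction step follows. `redactEnv` and `SECRET_KEY_PATTERN` are hypothetical helpers, not part of transport-mux, and `EXAMPLE_PROVIDER_API_KEY` is a placeholder for whichever provider-specific variable the exposed transport injects.

```typescript
// Hypothetical redaction helper for test artifacts. The key-name pattern is an
// assumption; a real suite would take the list of secret keys from the runtime.
const SECRET_KEY_PATTERN = /(TOKEN|API_KEY|SECRET)$/;

function redactEnv(env: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    // Keep non-secret values such as base URLs so the artifact stays useful.
    redacted[key] = SECRET_KEY_PATTERN.test(key) ? "[REDACTED]" : env[key];
  }
  return redacted;
}

// Example env diff as it might be captured after injection (values invented).
const artifact = redactEnv({
  AMUX_PROXY_BASE_URL: "http://127.0.0.1:8787",
  AMUX_PROXY_AUTH_TOKEN: "local-test-token",
  EXAMPLE_PROVIDER_API_KEY: "provider-test-key",
});
```

An assertion on the serialized artifact (for example, that it does not contain the raw token string) is a cheap way to prove the redaction requirement rather than assume it.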
Capability-Gated Adapter Matrix
| Agent or harness | Agent-mux adapter mapping | Current plugin-manager expectation | Runtime-hook expectation | Valid live permutations |
|---|---|---|---|---|
| claude-code / claude | claude-code maps to claude | Valid where the Claude adapter exposes plugin APIs | Native/runtime hook coverage including stop hook is valid | Agent-mux session, agent-mux plugin manager, Babysitter plugin through agent-mux, babysitter-agent external-harness bridge |
| codex | codex maps to codex | Capability-gated; current Codex adapter reports supportsPlugins: false, so do not require agent-mux client.plugins.* for Codex | Runtime hook fixtures are valid; live plugin manager install is not assumed | Agent-mux session, SDK harness plugin installer, Babysitter plugin through Codex only after installer/native plugin support is proven, babysitter-agent external-harness bridge |
| gemini-cli / gemini | gemini-cli maps to gemini | Capability-gated by adapter | Runtime hook fixture first, live after adapter support is proven | Agent-mux session and SDK installer smoke; plugin E2E only after capability proof |
| agent-core | Not an agent-mux external harness mapping | No harness plugin install | Programmatic event hooks through owning layer only | Babysitter-agent internal/programmatic runtime, transport-mux with agent-core stream |
| pi | Intentionally not agent-mux in babysitter-agent mapping | SDK plugin installer may exist, but runtime path is direct/agent-core-like | Do not route through agent-mux bridge | Direct SDK/babysitter-agent path only |
| babysitter adapter in agent-mux | Agent-mux can target Babysitter as an adapter | Babysitter plugin manager is generic SDK plugin registry, not external harness plugin install | Adapter parses Babysitter event output | Agent-mux consuming Babysitter output; separate from babysitter-agent runtime setup |
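
The capability gate described for Codex and Gemini can be expressed as a small decision function that either runs the plugin test or records why it was skipped. This is a sketch under assumptions: only the `supportsPlugins` flag comes from the matrix above, while `AdapterCapabilities`, `GateResult`, and `gatePluginTest` are hypothetical names, and how a real suite obtains the capability object is not shown.

```typescript
// Hypothetical capability shape; only `supportsPlugins` is named in the matrix.
interface AdapterCapabilities {
  supportsPlugins: boolean;
}

// Either the test runs, or it is skipped with an explicit, assertable reason.
type GateResult =
  | { run: true }
  | { run: false; skipReason: string };

function gatePluginTest(agent: string, caps: AdapterCapabilities): GateResult {
  if (caps.supportsPlugins) {
    return { run: true };
  }
  return { run: false, skipReason: `${agent}: adapter reports supportsPlugins: false` };
}

// With the capability currently reported for Codex, the plugin E2E is skipped.
const codexGate = gatePluginTest("codex", { supportsPlugins: false });
```

Returning the skip reason as data, rather than silently skipping, lets the suite assert on it, which keeps a skipped permutation from being mistaken for a passing one.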
Invalid Combinations
| Invalid combination | Why it is invalid |
|---|---|
| Babysitter-agent E2E that starts with babysitter harness:install or harness:install-plugin | That tests SDK harness setup, not babysitter-agent runtime behavior |
| Agent-mux plugin-manager test that requires Codex plugin install without checking supportsPlugins | Current Codex adapter reports plugin manager support as false |
| Transport-mux test that asserts plugin installation | Transport-mux carries harness-facing provider traffic; SDK harness setup or agent-mux plugin APIs own plugin installation |
| Transport-mux test that runs babysitter harness:install | Harness install belongs to SDK harness setup, not the proxy runtime |
| Transport-mux test that asserts hook normalization | Hooks-mux owns hook payload normalization; transport-mux may only carry traffic adjacent to a hook-emitting harness |
| Transport-mux test that claims Babysitter run completion by itself | Babysitter SDK or babysitter-agent owns run creation, task posting, and terminal journal state |
| Hooks-mux fixture that claims full agent-mux session coverage | Hooks-mux normalizes hook payloads; agent-mux owns session lifecycle |
| Agent-core path routed through agent-mux external-harness mapping | The babysitter-agent map explicitly excludes agent-core and pi from agent-mux external harness mapping |
| /babysitter:call plugin smoke that only checks final assistant text | It must assert Babysitter run ID, run events, terminal state, and hook evidence |
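
Several of the invalid combinations above can be caught by a lint-style check over a test's declared plan before it runs. The sketch below assumes a hypothetical `PlannedTest` record in which each test declares its layer, the commands it runs, and the claims it makes. The claim tags and `findViolations` are illustrative and cover only a subset of the rows.

```typescript
// Hypothetical test-plan record; none of these names come from a real package.
type TestedLayer = "babysitter-agent" | "transport-mux" | "hooks-mux" | "agent-mux-plugin";

interface PlannedTest {
  layer: TestedLayer;
  commands: string[]; // CLI commands the test runs
  claims: string[]; // illustrative tags for what the test says it proves
  checksSupportsPlugins: boolean;
}

// Matches both harness:install and harness:install-plugin.
const HARNESS_SETUP = /harness:install/;

function findViolations(test: PlannedTest): string[] {
  const violations: string[] = [];
  const runsHarnessSetup = test.commands.some((command) => HARNESS_SETUP.test(command));

  if (runsHarnessSetup && (test.layer === "babysitter-agent" || test.layer === "transport-mux")) {
    violations.push("harness install belongs to SDK harness setup");
  }
  if (test.layer === "transport-mux") {
    for (const claim of ["plugin-install", "hook-normalization", "run-completion"]) {
      if (test.claims.indexOf(claim) !== -1) {
        violations.push(`transport-mux must not claim ${claim}`);
      }
    }
  }
  if (test.layer === "hooks-mux" && test.claims.indexOf("session-lifecycle") !== -1) {
    violations.push("hooks-mux must not claim session-lifecycle");
  }
  if (test.layer === "agent-mux-plugin" && !test.checksSupportsPlugins) {
    violations.push("plugin-manager test must check supportsPlugins first");
  }
  return violations;
}
```

For example, a plan with `layer: "babysitter-agent"` whose commands include `babysitter harness:install-plugin` yields one violation, while a transport-mux plan that only runs `amux-proxy` and claims `proxy-auth` yields none.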
Minimum Permutation Set
The rebuilt strategy should implement these before claiming broad E2E coverage:
| ID | Lane | Stack | Required evidence |
|---|---|---|---|
| P1 | No-model | SDK run loop + mocked stop hook | run:create, pending task, task:post, run:iterate, completed proof, hook log |
| P2 | No-model | SDK harness installer + plugin installer dry-runs | JSON install plan, plugin target, idempotency fixture |
| P3 | No-model | Agent-mux core + mock adapter + runtime hooks | session_start, prompt/input, session_end, stop-hook decision fixture |
| P4 | No-model | Agent-mux PluginManager + plugin-capable adapter fixture | list/install/uninstall/update behavior and capability errors for non-plugin agents |
| P5 | No-model | Transport-mux route/codec fixture | supported route matrix, auth failure, invalid JSON, count_tokens supported/unsupported, streaming and non-streaming response artifacts |
| P5a | No-model | Transport-mux runtime env bridge + agent-mux launch decision | redacted env diff, proxy config, proxyNeeded/proxyReason, forced/if-needed/native/forbidden cases |
| P6 | No-model | Hooks-mux raw payload fixtures | normalized stop/session/tool events and merge-policy artifact |
| P7 | Model-backed | Babysitter-agent + agent-core backend | created run, planned task, posted result, terminal state, redacted model trace |
| P8 | Model-backed | Babysitter-agent + external harness bridge | babysitter-agent call/invoke, agent-mux mapped session events, terminal result, no install steps |
| P9 | Model-backed | Agent-mux + Claude + Babysitter plugin | harness/plugin precondition evidence, amux run claude, /babysitter:call, Babysitter run completion, stop-hook evidence |
| P10 | Model-backed/capability-gated | Agent-mux + Codex + Babysitter plugin | Only enabled after plugin install support is proven; otherwise assert skip reason from capability gate |
| P11 | Model-backed | Transport-mux + agent-core stream | live or credential-gated agent-core deltas carried over transport-mux, cancellation/timeout behavior, redacted provider metadata |
| P12 | Model-backed | Agent-mux external harness + transport-mux proxy | amux launch starts transport-mux, harness uses proxy env, sentinel stream completes, metrics snapshot and redacted launch plan are uploaded |
| P13 | No-model | Agent-mux hooks + hooks-mux bridge for claude-code, codex, pi | amux hooks add/handle, a5c-hooks-mux invoke, normalized phase evidence, no Babysitter SDK calls, no provider credentials |
| P14 | No-model | Pipeline-owned stack matrix across agent-mux-mocks and real-agent CLI shims for claude, codex, pi, and gemini | amux install --dry-run, profile-backed launch/run, transport-mux mock-model request evidence, and optional hooks-mux normalized phase artifact from no_model_mock_matrix |
Each implementation slice should name which permutation IDs it covers. If a job covers only setup, it should not be labeled as runtime E2E.
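
The labeling rule can be enforced with a small check over slice metadata. This is a sketch only: the `Slice` shape and `sliceProblems` are hypothetical, and treating P2 as the only setup-only permutation is an assumption drawn from its "installer dry-runs" stack description.

```typescript
// Permutation IDs from the table above.
const KNOWN_IDS = [
  "P1", "P2", "P3", "P4", "P5", "P5a", "P6", "P7",
  "P8", "P9", "P10", "P11", "P12", "P13", "P14",
];

// Assumption: P2 (installer dry-runs) is the only setup-only permutation.
const SETUP_ONLY_IDS = ["P2"];

// Hypothetical metadata that each implementation slice or CI job declares.
interface Slice {
  name: string;
  label: "setup" | "runtime-e2e";
  covers: string[];
}

function sliceProblems(slice: Slice): string[] {
  const problems: string[] = [];
  if (slice.covers.length === 0) {
    problems.push("names no permutation IDs");
  }
  const unknown = slice.covers.filter((id) => KNOWN_IDS.indexOf(id) === -1);
  if (unknown.length > 0) {
    problems.push(`unknown permutation IDs: ${unknown.join(", ")}`);
  }
  const setupOnly =
    slice.covers.length > 0 &&
    slice.covers.every((id) => SETUP_ONLY_IDS.indexOf(id) !== -1);
  if (setupOnly && slice.label === "runtime-e2e") {
    problems.push("covers only setup but is labeled runtime E2E");
  }
  return problems;
}
```

A slice that covers only P2 but is labeled `runtime-e2e` is reported, while the same slice labeled `setup` passes.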
Agent-Mux Live Install Modes
The live external-harness matrix has two valid agent-mux paths:
| Mode | Valid targets | Installer responsibility | Prompt responsibility | Lifecycle responsibility |
|---|---|---|---|---|
| babysitter-plugin | claude-code via claude, codex, gemini-cli via gemini, pi | amux install <target> installs or verifies the harness CLI; the local Babysitter SDK and generated Babysitter plugin package are installed before launch | The launch prompt is a Babysitter command, for example /babysitter:call ... | Must prove Babysitter run creation, effects, journals/task artifacts, native stop hook execution, hooks-mux normalization, agent-mux session, transport trace, and provider trace |
| vanilla | claude, codex, gemini, pi, babysitter | amux install <target> only | The launch prompt is a normal non-Babysitter sentinel prompt | Must prove agent-mux session/launch, transport trace, and provider trace; it must not claim plugin-driven external-harness hook coverage; babysitter-agent rows may additionally assert agent-core-backed Babysitter runtime evidence when required |
These are different integration paths. babysitter-plugin validates plugin-mediated Babysitter lifecycle behavior through an external harness; vanilla validates the same agent-mux install/launch/provider path without Babysitter plugin setup.
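
The difference between the two modes comes down to which evidence a live job must collect. A minimal sketch of that comparison follows. The evidence tags are shorthand for the "Lifecycle responsibility" column, and `REQUIRED_EVIDENCE` and `missingEvidence` are hypothetical names rather than part of agent-mux.

```typescript
type InstallMode = "babysitter-plugin" | "vanilla";

// Shorthand tags for the lifecycle evidence each mode must prove.
const REQUIRED_EVIDENCE: Record<InstallMode, string[]> = {
  "babysitter-plugin": [
    "babysitter-run-creation",
    "effects",
    "journals-or-task-artifacts",
    "native-stop-hook",
    "hooks-mux-normalization",
    "agent-mux-session",
    "transport-trace",
    "provider-trace",
  ],
  vanilla: ["agent-mux-session", "transport-trace", "provider-trace"],
};

// Returns the required evidence tags that a job did not collect.
function missingEvidence(mode: InstallMode, collected: string[]): string[] {
  return REQUIRED_EVIDENCE[mode].filter((tag) => collected.indexOf(tag) === -1);
}

// A vanilla job that never captured a provider trace is incomplete.
const gaps = missingEvidence("vanilla", ["agent-mux-session", "transport-trace"]);
// gaps is ["provider-trace"]
```

The vanilla list is a strict subset of the babysitter-plugin list, which matches the statement that both modes share the same agent-mux install, launch, and provider path.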