Agentic AI Atlas by a5c.ai
Agent Mux And Runtime E2E

This strategy covers runtime paths after setup is already satisfied. It separates agent-mux sessions, transport carriers, agent-core programmatic sessions, and @a5c-ai/babysitter-agent orchestration. Harness/plugin install coverage lives in Harness And Plugin E2E (./harness-e2e.md), not in babysitter-agent runtime E2E.

Stack Scopes

| Scope | Packages | No-model coverage | Model-backed coverage |
| --- | --- | --- | --- |
| Protocol and event contracts | `packages/agent-mux/core`, `packages/agent-mux/gateway`, `packages/transport-mux` | Schema, event ordering, session lifecycle, error envelopes, reconnect behavior | Real event streams from Codex and Claude Code sessions match protocol contracts |
| Adapter translation | `packages/agent-mux/adapters` | Prompt normalization, tool-call mapping, stop reasons, model selection, fallback behavior with mock adapters | Live Codex and Claude Code adapters translate real provider output into mux events |
| Transport runtime | `packages/transport-mux` | Route matrix, proxy auth, protocol codecs, runtime env injection, passthrough forwarding, subprocess lifecycle, stream cancellation, timeout/error paths, metrics/cache snapshots | Transport-mux carries traffic for a real external harness through agent-mux launch and for an agent-core-backed stream |
| Agent-core bridge | `packages/agent-core` | Programmatic session creation, mock provider responses, cancellation, usage accounting | Agent-core invokes a real provider and returns events compatible with agent-mux and babysitter-agent |
| Hooks muxes | `packages/hooks-mux/*` | Adapter normalization, hook payload fixtures, CLI execution, approval/deny/error events | Real harness hook payloads from Codex and Claude Code normalize to the same hook contract |
| Babysitter-agent runtime | `packages/babysitter-agent` | Seam contract, phase orchestration, planner/executor mocks, run journal state, task posting, selected backend | `babysitter-agent call/create-run/invoke` uses preinstalled or mocked backends; no harness install or plugin install steps are part of this E2E |
| User surfaces | `packages/agent-mux/webui`, `packages/agent-mux/ui`, `packages/agent-mux/tui` | Playwright/Vitest against mock gateway and fixture sessions | Optional manual/live smoke against a model-backed gateway session |

No-Model Runtime Suite

The no-model runtime suite should be built first. It should include:

  • transport-mux unit and E2E tests using local HTTP/subprocess fixtures for every supported exposed transport route.
  • transport-mux runtime tests for startTransportMuxRuntime, applyTransportMuxToHarnessEnv, proxy auth, redacted env diffs, metrics, cache stats, passthrough path/query preservation, invalid JSON, and upstream failure mapping.
  • agent-mux launch-plan tests for proxy forced, proxy if-needed, native/no-proxy, and proxy-forbidden cases so launch coverage proves the transport-mux decision seam.
  • agent-mux gateway/session tests using existing mock harness scenarios.
  • Adapter translation tests for Codex, Claude Code, and agent-core-style event streams.
  • babysitter-agent seam and orchestration tests with mocked planner/executor calls.
  • WebUI and TUI session tests using fixture transcripts and mock gateway responses.
  • Agent-mux plugin/session fixtures are owned by Harness And Plugin E2E; this suite consumes their event fixtures only as runtime compatibility inputs.
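The launch-plan cases in the list above reduce to one decision per proxy policy. The sketch below is a hypothetical model, not agent-mux's `resolveLaunchPlan`. The names `ProxyPolicy`, `LaunchInput`, and `decideLaunchPlan`, the transport-matching rule, and the reason strings are all assumptions. They were chosen so that the four cases (forced, if-needed with a mismatch, native/no-proxy, forbidden) can be table-driven.

```typescript
// Hypothetical model of the launch-plan decision; not agent-mux's resolveLaunchPlan.
// Policy names, the matching rule, and reason strings are assumptions.
type ProxyPolicy = "forced" | "if-needed" | "forbidden";

interface LaunchInput {
  policy: ProxyPolicy;
  harnessTransports: string[]; // wire formats the harness speaks natively (non-empty)
  providerTransport: string; // wire format the selected provider serves
}

interface LaunchPlan {
  proxyNeeded: boolean;
  proxyReason: "forced" | "transport-mismatch" | "native-match";
  exposedTransport: string; // transport the harness is pointed at
}

function decideLaunchPlan(input: LaunchInput): LaunchPlan {
  const nativeMatch = input.harnessTransports.indexOf(input.providerTransport) !== -1;
  if (input.policy === "forced") {
    // Forced: always proxy, exposing the harness's own format.
    return { proxyNeeded: true, proxyReason: "forced", exposedTransport: input.harnessTransports[0] };
  }
  if (nativeMatch) {
    // Native/no-proxy: if-needed and forbidden both launch directly when formats match.
    return { proxyNeeded: false, proxyReason: "native-match", exposedTransport: input.providerTransport };
  }
  if (input.policy === "forbidden") {
    // A proxy would be required but is not allowed, so refuse before launching.
    throw new Error(`proxy forbidden: harness cannot speak ${input.providerTransport}`);
  }
  // If-needed with a format mismatch: proxy and translate.
  return { proxyNeeded: true, proxyReason: "transport-mismatch", exposedTransport: input.harnessTransports[0] };
}
```

A launch-plan test would call this once per case and snapshot the returned object as the launch-plan JSON artifact.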

Candidate command grouping:

```bash
# No-model package suites
npm run test --workspace=@a5c-ai/transport-mux
npm run test --workspace=@a5c-ai/agent-mux-core
npm run test --workspace=@a5c-ai/agent-mux-adapters
npm run test --workspace=@a5c-ai/agent-mux-gateway
npm run test --workspace=@a5c-ai/agent-core
npm run test --workspace=@a5c-ai/babysitter-agent
# WebUI E2E against the mock gateway and fixture sessions
npm run test:e2e --workspace=@a5c-ai/agent-mux-webui
```

Transport-Mux Coverage Matrix

Transport-mux coverage has to prove the proxy/runtime seam directly before it is used as evidence for agent-mux or babysitter-agent paths.

No-Model Transport Coverage

| Surface | Tests to add or keep | Required artifacts |
| --- | --- | --- |
| Supported transport routes | Exercise `anthropic`, `openai-chat`, `openai-responses`, `google`, `bedrock-converse`, `azure-foundry`, `vertex-native`, and `passthrough` against fixture engines | Route transcript, response shape snapshot, invalid JSON/auth failure transcript |
| Streaming and non-streaming codecs | Verify text deltas, final events, finish reasons, usage totals, and provider-specific envelopes | Streaming event transcript and non-streaming body snapshot |
| Token and observability routes | Cover `count_tokens` success/unsupported behavior plus `/metrics` and `/cache/stats` | Token count transcript, metrics snapshot, cache stats snapshot |
| Runtime env injection | Call `applyTransportMuxToHarnessEnv` for every exposed transport | Redacted env diff proving only expected vars changed |
| Agent-mux launch seam | Cover `resolveLaunchPlan` for proxy forced, proxy if-needed, native/no-proxy, and forbidden proxy cases | Launch-plan JSON with `proxyNeeded`, `proxyReason`, and exposed transport |
| Passthrough forwarding | Preserve path/query/body, inject upstream auth safely, and map upstream failures | Redacted upstream transcript and failure envelope |
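The redacted env diff artifact can be produced by comparing the environment before and after injection. This is a minimal sketch that assumes a plain key/value env and a key-name redaction rule (`KEY`, `TOKEN`, `SECRET`). `redactedEnvDiff` and `assertOnlyExpectedChanged` are hypothetical helpers. This document does not specify which variables `applyTransportMuxToHarnessEnv` actually sets, so the test uses example names.

```typescript
// Hypothetical helpers for the "redacted env diff" artifact.
// The redaction rule (key names containing KEY, TOKEN, or SECRET) is an assumption.
type Env = Record<string, string | undefined>;

interface EnvChange {
  key: string;
  before: string | undefined;
  after: string | undefined;
}

const SECRET_KEY = /(KEY|TOKEN|SECRET)/i;

function redact(key: string, value: string | undefined): string | undefined {
  if (value === undefined) return undefined;
  return SECRET_KEY.test(key) ? "[REDACTED]" : value;
}

// Lists every variable whose value differs, sorted so the diff can be snapshotted.
function redactedEnvDiff(before: Env, after: Env): EnvChange[] {
  const added = Object.keys(after).filter((k) => !Object.prototype.hasOwnProperty.call(before, k));
  const keys = Object.keys(before).concat(added).sort();
  const changes: EnvChange[] = [];
  for (const key of keys) {
    if (before[key] !== after[key]) {
      changes.push({ key, before: redact(key, before[key]), after: redact(key, after[key]) });
    }
  }
  return changes;
}

// Fails when injection touched any variable outside the expected set.
function assertOnlyExpectedChanged(diff: EnvChange[], expected: string[]): void {
  const unexpected = diff.map((c) => c.key).filter((k) => expected.indexOf(k) === -1);
  if (unexpected.length > 0) {
    throw new Error(`unexpected env changes: ${unexpected.join(", ")}`);
  }
}
```

Redaction is keyed on the variable name rather than its value, so a rotated credential still produces the same snapshot.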

Model-Backed Transport Coverage

| Path | Valid stack | Assertion focus |
| --- | --- | --- |
| Agent-core stream bridge | agent-core provider backend -> transport-mux -> fixture consumer | Real or credential-gated stream deltas, final event, cancellation/timeout, and usage metadata survive the proxy layer |
| External harness bridge | `amux launch <harness> <provider> --with-proxy*` -> transport-mux -> target provider | Harness receives proxy env, provider endpoint is not called directly by the harness, sentinel prompt completes, metrics show traffic |
| Babysitter-agent precondition bridge | babysitter-agent external-harness path, only when it delegates through agent-mux and the selected harness requires proxy translation | Transport-mux artifacts are supporting evidence for the bridge, while Babysitter run creation/task posting remains asserted by babysitter-agent tests |
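The external harness bridge assertions can be checked mechanically once the job has captured the harness's outbound hosts and two metrics snapshots. This is a sketch under stated assumptions: the `MetricsSnapshot` shape, the outbound-host capture, and `checkExternalHarnessBridge` are hypothetical. This document does not describe the real `/metrics` payload format.

```typescript
// Hypothetical evidence shapes; the real /metrics payload and any outbound
// request capture used by the live job are not specified in this document.
interface MetricsSnapshot {
  requestsByRoute: Record<string, number>;
}

interface BridgeEvidence {
  harnessOutboundHosts: string[]; // hosts the harness process connected to
  proxyHost: string; // where transport-mux was listening
  providerHost: string; // real upstream provider endpoint
  metricsBefore: MetricsSnapshot;
  metricsAfter: MetricsSnapshot;
  sentinelSeen: boolean; // sentinel reply found in the final message
}

function totalRequests(m: MetricsSnapshot): number {
  return Object.keys(m.requestsByRoute).reduce((sum, r) => sum + m.requestsByRoute[r], 0);
}

// Returns the failed checks; an empty list means the bridge evidence holds.
function checkExternalHarnessBridge(e: BridgeEvidence): string[] {
  const failures: string[] = [];
  if (e.harnessOutboundHosts.indexOf(e.providerHost) !== -1) {
    failures.push("harness called the provider directly");
  }
  if (e.harnessOutboundHosts.indexOf(e.proxyHost) === -1) {
    failures.push("harness never reached transport-mux");
  }
  if (totalRequests(e.metricsAfter) <= totalRequests(e.metricsBefore)) {
    failures.push("metrics show no proxied traffic");
  }
  if (!e.sentinelSeen) failures.push("sentinel prompt did not complete");
  return failures;
}
```

Returning every failure at once, rather than stopping at the first, gives the uploaded report a complete picture of a broken bridge.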

Invalid Transport Claims

  • Do not use transport-mux tests to prove babysitter harness:install or harness:install-plugin.
  • Do not use transport-mux tests to prove agent-native plugin manager support.
  • Do not use transport-mux tests to prove hooks-mux normalization.
  • Do not use transport-mux tests to prove Babysitter journal terminal state unless a higher-level runtime test also asserts that state.
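A report generator could enforce these rules by mapping each claim to the evidence lanes allowed to prove it. The lane names and claim keys are illustrative. Assigning the plugin-manager claim to a harness-setup lane is also an assumption. The sketch encodes only what the list states: transport-mux evidence alone never satisfies these claims, and journal terminal state needs a higher-level babysitter-agent runtime lane.

```typescript
// Hypothetical report guard: lane and claim names are illustrative.
type EvidenceLane = "transport-mux" | "harness-setup" | "hooks-mux" | "babysitter-agent-runtime";

type Claim =
  | "harness:install"
  | "harness:install-plugin"
  | "agent-native-plugin-manager"
  | "hooks-mux-normalization"
  | "journal-terminal-state"
  | "transport-route-contract";

// Lanes allowed to prove each claim; transport-mux appears only for its own contract.
const PROVING_LANES: Record<Claim, EvidenceLane[]> = {
  "harness:install": ["harness-setup"],
  "harness:install-plugin": ["harness-setup"],
  "agent-native-plugin-manager": ["harness-setup"],
  "hooks-mux-normalization": ["hooks-mux"],
  "journal-terminal-state": ["babysitter-agent-runtime"],
  "transport-route-contract": ["transport-mux"],
};

// A claim is supported only if at least one evidence lane is allowed to prove it.
function claimIsSupported(claim: Claim, lanes: EvidenceLane[]): boolean {
  return lanes.some((lane) => PROVING_LANES[claim].indexOf(lane) !== -1);
}
```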

Live Install Modes

The Publish workflow runs external-harness live E2E through a workflow-owned install-mode axis:

  • babysitter-plugin generates plugin artifacts, installs the target with amux install, installs the local Babysitter SDK, installs the Babysitter plugin for the harness, then launches through amux launch with a /babysitter:call prompt.
  • vanilla installs the target with amux install, launches through amux launch, and uses a non-plugin prompt so it proves agent-mux/transport/provider behavior; the vanilla babysitter-agent rows use the babysitter adapter with BABYSITTER_HARNESS=agent-core.
  • Both modes use the same target mapping: claude-code -> claude, codex -> codex, gemini-cli -> gemini, pi -> pi, and vanilla-only babysitter-agent -> babysitter.
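The shared target mapping and the vanilla-only restriction can be encoded as data, so both install modes read from one table. The mapping is copied from the list above. The names `planLiveRow` and `LiveRow` are illustrative assumptions, not the Publish workflow's code, and so are the step labels and the choice to reject babysitter-agent in babysitter-plugin mode with an error.

```typescript
// Target mapping copied from the list above. InstallMode, LiveRow, planLiveRow,
// and the step labels are illustrative, not the Publish workflow's code.
type InstallMode = "babysitter-plugin" | "vanilla";

const AMUX_TARGET: Record<string, string> = {
  "claude-code": "claude",
  codex: "codex",
  "gemini-cli": "gemini",
  pi: "pi",
  "babysitter-agent": "babysitter",
};

const VANILLA_ONLY: string[] = ["babysitter-agent"];

interface LiveRow {
  mode: InstallMode;
  target: string;
  steps: string[]; // human-readable step labels in workflow order
  env: Record<string, string>;
}

function planLiveRow(harness: string, mode: InstallMode): LiveRow {
  if (!Object.prototype.hasOwnProperty.call(AMUX_TARGET, harness)) {
    throw new Error(`no amux target for ${harness}`);
  }
  const target = AMUX_TARGET[harness];
  if (mode === "babysitter-plugin" && VANILLA_ONLY.indexOf(harness) !== -1) {
    throw new Error(`${harness} rows exist only in vanilla mode`);
  }
  const steps =
    mode === "babysitter-plugin"
      ? [
          "generate plugin artifacts",
          `amux install: ${target}`,
          "install local Babysitter SDK",
          `install Babysitter plugin for ${target}`,
          `amux launch: ${target} with a /babysitter:call prompt`,
        ]
      : [`amux install: ${target}`, `amux launch: ${target} with a non-plugin prompt`];
  // The vanilla babysitter-agent rows run the babysitter adapter on agent-core.
  const env: Record<string, string> =
    harness === "babysitter-agent" ? { BABYSITTER_HARNESS: "agent-core" } : {};
  return { mode, target, steps, env };
}
```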

Model-Backed Runtime Suite

The model-backed suite should prove that real providers and real harnesses behave like the no-model contracts expect.

| Test | Required real dependency | Assertion focus |
| --- | --- | --- |
| Transport-mux + external harness through agent-mux | Claude Code or Codex-compatible harness, provider credential, and `amux launch --with-proxy` or `--with-proxy-if-needed` | Launch starts transport-mux, harness receives proxy env, sentinel traffic uses proxy routes, stream completes, metrics snapshot increments |
| Transport-mux + agent-core | Provider credential for agent-core backend | Agent-core deltas/final events travel through transport-mux without adapter-only assumptions, including cancellation or timeout evidence |
| Agent-mux + Codex adapter | Codex CLI or configured Codex runtime and OpenAI credential | Codex output maps to mux protocol events, including final message and usage metadata when available |
| Agent-mux + Claude Code adapter | Claude Code CLI plus Foundry/OpenAI credential through transport-mux | Claude Code output maps to mux protocol events while model traffic is proxied to GPT-5.5, including tool-call and stop metadata when available |
| Babysitter-agent full run | Provider credentials or mocked backend already available | `babysitter-agent call/create-run` creates a bounded process, plans, emits a task, posts a result, completes, and records selected backend evidence without running installer commands |

Model-backed runtime tests must upload redacted event logs, provider/harness version metadata, run IDs, and command durations.
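One way to keep these uploads consistent is to assemble a manifest per job. This sketch uses its own manifest shape; `RuntimeArtifactManifest`, `timed`, and `buildManifest` are hypothetical names. It redacts secret-looking values with a simple regex, and a real job should reuse whatever shared redaction rules the suite defines.

```typescript
// Hypothetical per-job manifest; field names and the secret regex are assumptions.
interface CommandTiming {
  command: string;
  durationMs: number;
}

interface RuntimeArtifactManifest {
  runId: string;
  provider: { name: string; version: string };
  harness: { name: string; version: string };
  commands: CommandTiming[];
  eventLog: string[]; // one redacted JSON line per event
}

// Matches "sk-"-prefixed keys and bearer tokens; extend for other credential shapes.
const SECRET_VALUE = /(sk-[A-Za-z0-9_-]{8,}|Bearer\s+[A-Za-z0-9._-]+)/g;

function redactLine(line: string): string {
  return line.replace(SECRET_VALUE, "[REDACTED]");
}

// Runs a step and records its wall-clock duration even if it throws.
function timed<T>(command: string, run: () => T, log: CommandTiming[]): T {
  const start = Date.now();
  try {
    return run();
  } finally {
    log.push({ command, durationMs: Date.now() - start });
  }
}

function buildManifest(
  runId: string,
  provider: { name: string; version: string },
  harness: { name: string; version: string },
  commands: CommandTiming[],
  events: unknown[],
): RuntimeArtifactManifest {
  if (runId === "") throw new Error("a run ID is required before upload");
  return { runId, provider, harness, commands, eventLog: events.map((e) => redactLine(JSON.stringify(e))) };
}
```

Redacting each serialized event line, rather than individual fields, also catches secrets that appear in free-text fields such as error messages.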

Runtime Path Assertions

Runtime tests must declare which entry path they exercise:

| Path | Entry point | Valid backend combinations | Assertions |
| --- | --- | --- | --- |
| Agent-mux session | `amux run <agent>` or `createClient().run` | Mock adapter, Claude, Codex, Gemini, Cursor, OpenCode, agent-mux babysitter adapter where registered | Session start/end, event ordering, provider/model config, runtime hooks, capability-gated plugin events |
| Babysitter-agent internal runtime | `babysitter-agent call/create-run --harness agent-core` | Agent-core backend with mocked or live model provider | Run creation, planning, task posting, terminal state, redacted model trace |
| Babysitter-agent external-harness bridge | `babysitter-agent call/invoke --harness <external>` | Harness names mapped in `amuxHarnessMap`; excludes pi and agent-core | Agent-mux mapped events, session ID, result, selected harness, no install commands |
| Transport runtime | transport-mux around agent-core or agent-mux-launched external harness traffic | Local route fixture, agent-core stream, external harness stream | Route/codec contract, proxy auth, env injection, launch proxy decision, framing, reconnect, cancellation, timeout, backpressure, metrics/cache artifact |

Do not fold plugin setup into the babysitter-agent runtime assertions. If a runtime job needs an installed external harness or plugin, that is a precondition supplied by a setup job and recorded separately.
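Entry-path declarations can be validated before a runtime test runs. In this sketch, `EntryPath`, `RuntimeTestDeclaration`, and `validateDeclaration` are hypothetical names. `EXAMPLE_AMUX_HARNESS_MAP` is a stand-in, because the real `amuxHarnessMap` contents are not listed here. The sketch applies the "setup is a separate precondition" rule to every entry path, not only the babysitter-agent ones.

```typescript
// Hypothetical declaration format for runtime tests.
type EntryPath =
  | "agent-mux-session"
  | "babysitter-internal"
  | "babysitter-external-bridge"
  | "transport-runtime";

interface RuntimeTestDeclaration {
  name: string;
  entryPath: EntryPath;
  harness?: string;
  runsInstallCommands: boolean;
}

// Stand-in for amuxHarnessMap; its real contents are not listed in this document.
const EXAMPLE_AMUX_HARNESS_MAP: Record<string, string> = {
  "claude-code": "claude",
  codex: "codex",
};

function validateDeclaration(d: RuntimeTestDeclaration): string[] {
  const errors: string[] = [];
  if (d.runsInstallCommands) {
    errors.push(`${d.name}: install steps belong to a setup job, not a runtime test`);
  }
  if (d.entryPath === "babysitter-internal" && d.harness !== "agent-core") {
    errors.push(`${d.name}: internal runtime requires --harness agent-core`);
  }
  if (d.entryPath === "babysitter-external-bridge") {
    const h = d.harness;
    const mapped = h !== undefined && Object.prototype.hasOwnProperty.call(EXAMPLE_AMUX_HARNESS_MAP, h);
    if (!mapped || h === "pi" || h === "agent-core") {
      errors.push(`${d.name}: external bridge needs an amuxHarnessMap harness other than pi or agent-core`);
    }
  }
  return errors;
}
```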

Mux-Specific Assertions

Mux tests should assert behavior that package-local unit tests cannot prove alone:

  • A session can be started, observed, cancelled, and resumed through the mux boundary.
  • Tool-call, text-delta, final-message, usage, and error events preserve ordering and session IDs.
  • Adapter-specific errors are normalized before they cross gateway or transport boundaries.
  • Model selection is explicit and recorded in the session state.
  • Credential absence is detected before provider calls are attempted.
  • Mock and live event streams conform to the same protocol fixtures.
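The ordering and session-ID bullets can be expressed as one checker applied to both mock and live transcripts. That makes "conform to the same protocol fixtures" testable. The `MuxEvent` union is a simplified, hypothetical stand-in for the agent-mux protocol. The specific rules are assumptions: the stream opens and closes with session markers, carries exactly one terminal event, and allows no text or tool calls after it (usage is still allowed).

```typescript
// Hypothetical, simplified event union; the real agent-mux protocol events carry more fields.
type MuxEvent =
  | { type: "session-start"; sessionId: string }
  | { type: "text-delta"; sessionId: string; text: string }
  | { type: "tool-call"; sessionId: string; name: string }
  | { type: "usage"; sessionId: string; inputTokens: number; outputTokens: number }
  | { type: "final-message"; sessionId: string; text: string }
  | { type: "error"; sessionId: string; code: string }
  | { type: "session-end"; sessionId: string };

// Returns protocol violations; apply it to mock and live transcripts alike.
function checkStream(events: MuxEvent[]): string[] {
  if (events.length === 0) return ["empty stream"];
  const problems: string[] = [];
  const first = events[0];
  const last = events[events.length - 1];
  if (first.type !== "session-start") problems.push("stream must open with session-start");
  if (last.type !== "session-end") problems.push("stream must close with session-end");
  const sessionId = first.sessionId;
  let terminalSeen = false; // set once a final-message or error arrives
  for (let i = 0; i < events.length; i++) {
    const e = events[i];
    if (e.sessionId !== sessionId) problems.push(`event ${i} has foreign session ID ${e.sessionId}`);
    if (e.type === "final-message" || e.type === "error") {
      if (terminalSeen) problems.push(`event ${i}: second terminal event`);
      terminalSeen = true;
    } else if (terminalSeen && (e.type === "text-delta" || e.type === "tool-call")) {
      problems.push(`event ${i}: ${e.type} after terminal event`);
    }
  }
  if (!terminalSeen) problems.push("no final-message or error event");
  return problems;
}
```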

Babysitter-Agent Whole-System Assertions

Whole-system tests for @a5c-ai/babysitter-agent should cover:

  • process loading and validation,
  • run creation,
  • session binding,
  • planning phase output shape,
  • task effect emission,
  • task result posting,
  • journal rebuild/repair compatibility,
  • terminal run state,
  • artifact and log redaction.

The no-model version should use mocks for planner and executor behavior. The model-backed version should use the smallest possible bounded process and real model credentials or a preconfigured external harness. It must not execute harness:install or harness:install-plugin as part of the babysitter-agent runtime test.
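The no-model whole-system test can drive a run through mocked planner and executor seams, then rebuild state from the journal alone. One pass then covers task emission, result posting, terminal state, and rebuild compatibility. Everything here is hypothetical: `Planner`, `Executor`, `JournalEntry`, `runBoundedProcess`, and `rebuildState` do not reflect @a5c-ai/babysitter-agent's actual API or journal format.

```typescript
// Hypothetical journal and mock seams; not @a5c-ai/babysitter-agent's real API.
type JournalEntry =
  | { kind: "run-created"; runId: string; processId: string }
  | { kind: "planned"; steps: string[] }
  | { kind: "task-emitted"; taskId: string }
  | { kind: "task-result"; taskId: string; ok: boolean }
  | { kind: "run-completed" }
  | { kind: "run-failed"; reason: string };

type RunState = "created" | "planned" | "waiting-on-task" | "completed" | "failed";

interface Planner {
  plan(processId: string): string[];
}

interface Executor {
  execute(taskId: string): boolean;
}

// Drives one bounded run through the mocked seams, journaling every step.
function runBoundedProcess(processId: string, planner: Planner, executor: Executor): JournalEntry[] {
  const journal: JournalEntry[] = [{ kind: "run-created", runId: `run-${processId}`, processId }];
  const steps = planner.plan(processId);
  journal.push({ kind: "planned", steps });
  for (const step of steps) {
    journal.push({ kind: "task-emitted", taskId: step });
    const ok = executor.execute(step);
    journal.push({ kind: "task-result", taskId: step, ok });
    if (!ok) {
      journal.push({ kind: "run-failed", reason: `task ${step} failed` });
      return journal;
    }
  }
  journal.push({ kind: "run-completed" });
  return journal;
}

// Rebuilds run state from the journal alone, as a repair or resume path would.
function rebuildState(journal: JournalEntry[]): RunState {
  let state: RunState = "created";
  for (const entry of journal) {
    switch (entry.kind) {
      case "planned":
        state = "planned";
        break;
      case "task-emitted":
        state = "waiting-on-task";
        break;
      case "task-result":
        state = "planned";
        break;
      case "run-completed":
        state = "completed";
        break;
      case "run-failed":
        state = "failed";
        break;
    }
  }
  return state;
}
```

Replaying a truncated journal is a cheap way to check rebuild compatibility: a run cut off after a task was emitted should rebuild to `waiting-on-task`.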

Hooks-Mux Assertions

Hooks-mux tests should cover both adapter-local behavior and end-to-end event compatibility:

  • each adapter normalizes raw harness hook payloads into the shared hook contract,
  • CLI execution preserves stdin/stdout/stderr boundaries and exit codes,
  • approval, denial, timeout, and malformed-payload cases are fixture-backed,
  • Codex and Claude Code live hook payloads can be redacted and replayed as no-model fixtures,
  • agent-mux UI and TUI approval surfaces consume the same normalized hook events.

Hooks-mux live coverage should not be promoted until the no-model fixture suite covers the same event types.
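Normalization into a shared contract, with malformed payloads reported as results rather than thrown, can be sketched as follows. The `NormalizedHookEvent` contract and both raw payload layouts are assumptions. The Claude Code-style field names are illustrative, and the Codex-style ones are invented for this example.

```typescript
// Hypothetical shared hook contract and raw payload layouts; the real
// hooks-mux contract and each harness's payload fields may differ.
interface NormalizedHookEvent {
  harness: "claude-code" | "codex";
  sessionId: string;
  phase: "pre-tool" | "post-tool";
  toolName: string;
}

type NormalizeResult = { ok: true; event: NormalizedHookEvent } | { ok: false; error: string };

function normalizeHook(harness: "claude-code" | "codex", raw: string): NormalizeResult {
  let data: Record<string, unknown>;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      return { ok: false, error: "payload is not an object" };
    }
    data = parsed as Record<string, unknown>;
  } catch {
    return { ok: false, error: "malformed JSON payload" };
  }
  // Illustrative per-harness field names.
  const fields =
    harness === "claude-code"
      ? { session: "session_id", event: "hook_event_name", tool: "tool_name" }
      : { session: "sessionId", event: "event", tool: "tool" };
  const sessionId = data[fields.session];
  const eventName = data[fields.event];
  const toolName = data[fields.tool];
  if (typeof sessionId !== "string" || typeof eventName !== "string" || typeof toolName !== "string") {
    return { ok: false, error: "missing required fields" };
  }
  const phase = /^pre/i.test(eventName) ? "pre-tool" : /^post/i.test(eventName) ? "post-tool" : undefined;
  if (phase === undefined) return { ok: false, error: `unsupported hook event ${eventName}` };
  return { ok: true, event: { harness, sessionId, phase, toolName } };
}
```

Returning a result instead of throwing lets the malformed-payload fixtures assert on the error text, and it keeps the CLI's exit-code handling in one place.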
