Primary Flow Data Paths
This document maps the main flows that the rebuilt E2E strategy should prove. It is intentionally data-path oriented: every flow names the caller, command/API boundary, state that must be created, hook/session artifacts that should exist, and the identifiers that let a test join evidence across packages.
Primary Configuration
The primary configuration has two valid runtime paths and one shared hook/trace layer:
| Path | Primary target | What it proves | What it must not claim |
|---|---|---|---|
| Agent-mux plugin path | Claude Code first; Codex only when capability-gated plugin support is available | A real external harness session can be launched through agent-mux, the Babysitter plugin can run a /babysitter:call-style session command, and the resulting Babysitter run reaches a terminal state | It does not prove babysitter-agent runtime orchestration and does not use babysitter-agent create-run |
| Babysitter-agent runtime path | babysitter-agent call / babysitter-agent create-run with agent-core internal backend, plus external-harness bridge where selected | The runtime can understand intent, create or reuse a process, create and bind a Babysitter SDK run, iterate effects, resolve tasks, and complete | It does not install external harness plugins; babysitter harness:install belongs to SDK setup, not this path |
| Hooks and transport layer | hooks-mux and transport-mux alongside either runtime path | Native hook payloads normalize into UnifiedHookEvent, handlers receive traceable env/stdin, and provider traffic can be proxied/recorded where configured | Hooks-mux does not own agent-mux sessions; transport-mux does not own Babysitter run state |
Flow A: Agent-Mux Plugin Session To Babysitter Run
This is the primary plugin E2E for Claude Code. Codex uses the same shape only after an explicit capability gate proves plugin install/support for the Codex adapter.
operator / CI
-> babysitter harness:install claude
-> babysitter harness:install-plugin claude
-> agent-mux CLI (`amux run` or launch path)
-> agent-mux adapter/runtime session
-> external harness process (Claude Code primary)
-> Babysitter plugin command inside the harness session
-> Babysitter SDK run creation / iteration
-> hooks-mux native hook normalization and stop-hook evidence
-> terminal Babysitter run state and agent-mux event log evidence
Data Path
| Step | Boundary | Data passed | Required evidence |
|---|---|---|---|
| 1 | SDK setup CLI | Harness name and plugin target via babysitter harness:install and babysitter harness:install-plugin | Install JSON or log, installed plugin path, marketplace/registry entry, idempotency result |
| 2 | Agent-mux invocation | Agent name, prompt, --session, --run-id, cwd/env/model flags from packages/agent-mux/cli/src/commands/run.ts | agent-mux run ID, selected adapter, cwd, model, prompt digest, session mode |
| 3 | Agent-mux gateway/runtime | Session runtime and event log under packages/agent-mux/gateway/src/runs/session-runtime.ts and packages/agent-mux/gateway/src/runs/event-log.ts | Event-log file or API events with monotonic seq, source, ts, event type, runId |
| 4 | External harness | Native harness session ID, native hook payloads, tool calls, stop/session events | Harness transcript/session ID, native hook payload fixture or redacted live artifact |
| 5 | Babysitter plugin command | /babysitter:call or equivalent Babysitter-enabled session command posted in the harness | Assistant/tool transcript showing command, plugin dispatch evidence, created Babysitter runId |
| 6 | SDK run loop | run:create, run:iterate, pending effects, task:post, terminal completion | .a5c/runs/<runId>/, journal/events, tasks/<effectId>/result.json, terminal status |
| 7 | Hook bridge | hooks-mux normalizes session/tool/stop hooks and injects AGENT_* env | UnifiedHookEvent, handler stdin/stdout, AGENT_SESSION_ID, stop-hook result |
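The step 3 evidence (an event log with monotonic `seq` tied to one `runId`) can be checked mechanically. The following is a minimal sketch, not the gateway's own validator: `EventLogEntry` and `checkEventLog` are hypothetical names, the field names are taken from the table above, and "monotonic" is assumed here to mean strictly increasing. The real schema in `event-log.ts` may differ.

```typescript
// Hypothetical entry shape: field names come from the "Required evidence"
// column of step 3; the real schema in event-log.ts may differ.
interface EventLogEntry {
  seq: number;
  source: string;
  ts: string;
  type: string;
  runId: string;
}

// Returns a list of problems; an empty list means every entry belongs to
// `expectedRunId` and `seq` is strictly increasing (an assumption about
// what "monotonic" means for this log).
function checkEventLog(entries: EventLogEntry[], expectedRunId: string): string[] {
  const problems: string[] = [];
  let previousSeq: number | undefined;
  let index = 0;
  for (const entry of entries) {
    if (entry.runId !== expectedRunId) {
      problems.push(`entry ${index}: runId "${entry.runId}" does not match "${expectedRunId}"`);
    }
    if (previousSeq !== undefined && entry.seq <= previousSeq) {
      problems.push(`entry ${index}: seq ${entry.seq} does not exceed previous seq ${previousSeq}`);
    }
    previousSeq = entry.seq;
    index += 1;
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a test attach the full list to its failure artifact.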
Assertions
- The agent-mux `runId` and session ID are recorded before the Babysitter plugin command runs.
- The Babysitter plugin command creates or resumes exactly one Babysitter `runId` for the scenario.
- The Babysitter `runId` appears in final output and maps to an existing `.a5c/runs/<runId>/` directory.
- At least one hook artifact proves stop/session handling, not just assistant text.
- The final state is terminal: `RUN_COMPLETED` or equivalent completed status from the SDK run, not merely a successful model reply.
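These assertions can be expressed as one check over collected evidence. The sketch below assumes a hypothetical `FlowAEvidence` record that a test harness would fill in from the artifacts listed in the data path. None of these field names come from the packages themselves. Only the literal `RUN_COMPLETED` is accepted, so any "equivalent completed status" would need to be added explicitly.

```typescript
// Hypothetical evidence record assembled by the test; not a real package type.
interface FlowAEvidence {
  agentMuxIdsRecordedAtMs: number; // when the agent-mux runId and session ID were captured
  pluginCommandAtMs: number;       // when the Babysitter plugin command ran
  babysitterRunIds: string[];      // every Babysitter runId observed in the scenario
  existingRunDirs: string[];       // run directories found on disk, e.g. ".a5c/runs/<runId>"
  finalOutput: string;
  stopOrSessionHookArtifacts: number;
  sdkStatus: string;
}

function checkFlowA(evidence: FlowAEvidence): string[] {
  const failures: string[] = [];
  if (evidence.agentMuxIdsRecordedAtMs >= evidence.pluginCommandAtMs) {
    failures.push("agent-mux runId/session ID were not recorded before the plugin command");
  }
  // Collapse repeated sightings of the same runId before counting.
  const uniqueRunIds = evidence.babysitterRunIds.filter((id, i, all) => all.indexOf(id) === i);
  if (uniqueRunIds.length !== 1) {
    failures.push(`expected exactly one Babysitter runId, found ${uniqueRunIds.length}`);
  }
  for (const runId of uniqueRunIds) {
    if (evidence.finalOutput.indexOf(runId) === -1) {
      failures.push(`runId ${runId} is missing from final output`);
    }
    if (evidence.existingRunDirs.indexOf(`.a5c/runs/${runId}`) === -1) {
      failures.push(`no run directory found for ${runId}`);
    }
  }
  if (evidence.stopOrSessionHookArtifacts < 1) {
    failures.push("no hook artifact proves stop/session handling");
  }
  if (evidence.sdkStatus !== "RUN_COMPLETED") {
    failures.push(`SDK run is not terminal-completed: ${evidence.sdkStatus}`);
  }
  return failures;
}
```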
Flow B: Babysitter-Agent Runtime Create-Run
This path tests @a5c-ai/babysitter-agent as the runtime owner. It is separate from agent-mux plugin setup.
operator / CI
-> babysitter-agent call/create-run
-> PhaseUnderstandIntent / PhasePlanProcess
-> process definition in workspace `.a5c/processes` or provided `--process`
-> Babysitter SDK `createRun`
-> session binding for selected harness/backend
-> PhaseOrchestration loop
-> effect resolution through internal `agent-core` or external harness bridge
-> SDK `commitEffectResult` / task result files
-> terminal run completion
Data Path
| Step | Boundary | Data passed | Required evidence |
|---|---|---|---|
| 1 | babysitter-agent CLI | call, create-run, yolo, plan, resume-run; args parsed in packages/babysitter-agent/src/cli/dispatch.ts | Invocation command, selected harness, workspace, model, max iterations, output mode |
| 2 | Create-run coordinator | handleHarnessCreateRun in packages/babysitter-agent/src/harness/internal/createRun/index.ts | Progress events for planning, process path, run creation, session binding |
| 3 | Planning phase | Prompt, workspace context, selected harness, compression config | Process file path, process fingerprint or generated process report, optional planning conversation summary |
| 4 | SDK run creation | createRun through packages/sdk/src/cli/main/runCreate.ts or SDK API | runId, runDir, process ID, entrypoint, inputs path, non-interactive metadata |
| 5 | Session binding | Selected harness session ID from resolveHarnessSessionIdForBinding and SDK session state | Babysitter session ID, state file, run/session association, harness name |
| 6 | Orchestration loop | orchestrateIteration, pending EffectActions, resolveEffect, commitEffectResult | Iteration count, pending effect IDs, task IDs, task result refs, stdout/stderr refs |
| 7 | Effect execution | Internal agent-core for internal harnesses; external bridge for external harnesses | Model/provider trace redacted, backend name, task result JSON, errors/retries if any |
| 8 | Terminal state | SDK journal and completion proof | RUN_COMPLETED, final summary, completion proof only after terminal state |
Assertions
- Runtime tests invoke `babysitter-agent`, not `babysitter harness:install`.
- The selected harness/backend is recorded (`agent-core` for the internal primary path; external harness bridge only for explicit external-harness tests).
- The created or resumed `runId` is bound to a session and appears in SDK state and final output.
- Every pending effect has a posted result or a declared failure, keyed by `effectId`.
- A terminal Babysitter state is the pass condition.
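The effect-coverage assertion amounts to a join on `effectId`. A minimal sketch, assuming the test has already collected the pending effect IDs and the recorded outcomes. `EffectOutcome` and `findUnresolvedEffects` are hypothetical names for illustration, not SDK exports.

```typescript
// Hypothetical outcome record: either a posted result or a declared failure.
interface EffectOutcome {
  effectId: string;
  kind: "result" | "failure";
}

// Returns the pending effect IDs that have neither a posted result nor a
// declared failure. An empty array satisfies the assertion.
function findUnresolvedEffects(pendingEffectIds: string[], outcomes: EffectOutcome[]): string[] {
  return pendingEffectIds.filter(
    (effectId) => !outcomes.some((outcome) => outcome.effectId === effectId),
  );
}
```

A declared failure counts as resolved here on purpose: the assertion requires that every effect is accounted for, not that every effect succeeded.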
Flow C: SDK Run/Session Loop
This is the deterministic contract shared by both runtime paths.
babysitter run:create
-> .a5c/runs/<runId>/ metadata + journal
-> optional session binding through harness adapter
-> babysitter run:iterate
-> pending effects under tasks/<effectId>/task.json
-> babysitter task:post
-> result refs under tasks/<effectId>/
-> repeated run:iterate
-> RUN_COMPLETED / RUN_FAILED
Command Boundaries
| Command | Owner | State created or read | Evidence key |
|---|---|---|---|
| `babysitter run:create --process-id ... --entry ... --inputs ...` | SDK CLI | Run directory, run metadata, initial journal, optional session binding | runId, runDir, entry, processId, session.sessionId |
| `babysitter session:init --session-id ...` | SDK CLI | Session state file | stateFile, iteration, max iterations |
| `babysitter session:associate --session-id ... --run-id ...` | SDK CLI | Session file updated with run ID | stateFile, runId |
| `babysitter run:iterate <runDir>` | SDK CLI/runtime | Replayed state, emitted effects, terminal events | iteration, status, nextActions[].effectId |
| `babysitter task:list <runDir> --pending` | SDK CLI/runtime | Pending task index | effectId, taskId, stepId, kind, taskDefRef |
| `babysitter task:post <runDir> <effectId> --status ok --value <file>` | SDK CLI/runtime | Task result, stdout/stderr refs, effect resolution journal event | effectId, resultRef, status |
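The iterate/post cycle in this table can be sketched as a small driver. This is an illustration under several assumptions, not the SDK's own loop:

- `IterateResult` is a hypothetical shape inferred from the evidence keys (`iteration`, `status`, `nextActions[].effectId`).
- `status` is assumed to carry the terminal names `RUN_COMPLETED` / `RUN_FAILED`.
- The injected `runCli` function is assumed to invoke the `babysitter` binary with the given arguments and return JSON text.
- How JSON output is requested from the CLI is left to that runner.

```typescript
// Hypothetical shape of `run:iterate` output, inferred from the
// "Evidence key" column; the real output may differ.
interface IterateResult {
  iteration: number;
  status: string;
  nextActions: { effectId: string }[];
}

// Injected runner: receives the arguments after `babysitter` and returns stdout.
type RunCli = (args: string[]) => string;

function driveRun(
  runDir: string,
  runCli: RunCli,
  resultFileFor: (effectId: string) => string,
  maxIterations: number,
): string {
  for (let i = 0; i < maxIterations; i++) {
    const result = JSON.parse(runCli(["run:iterate", runDir])) as IterateResult;
    // Assumed terminal status names; adjust if the CLI reports them differently.
    if (result.status === "RUN_COMPLETED" || result.status === "RUN_FAILED") {
      return result.status;
    }
    // Post a result for every effect the iteration left pending.
    for (const action of result.nextActions) {
      runCli([
        "task:post", runDir, action.effectId,
        "--status", "ok",
        "--value", resultFileFor(action.effectId),
      ]);
    }
  }
  throw new Error(`run did not reach a terminal state within ${maxIterations} iterations`);
}
```

Injecting the runner keeps the sketch usable in the no-model lane: a test can pass a fake that replays recorded output, while a live test can pass a wrapper around a real process spawn.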
Flow D: Hooks-Mux Native Hook Path
Hooks-mux is the canonical hook-normalization and handler fan-out layer.
native harness hook payload on stdin
-> `a5c-hooks-mux bootstrap` or `a5c-hooks-mux invoke`
-> adapter loader (the matching hooks-mux adapter package for the selected harness)
-> adapter normalizer
-> `UnifiedHookEvent`
-> handler plan + child-process handlers
-> merged hook result
-> session env/context persistence
-> native renderer output back to harness
Data Path
| Step | Boundary | Data passed | Required evidence |
|---|---|---|---|
| 1 | Native hook | Claude/Codex/Gemini/etc. JSON stdin and native event name | Raw hook payload fixture or redacted live payload |
| 2 | CLI entry | bootstrap, invoke, or exec in packages/hooks-mux/cli/src/cli/commands | CLI args, adapter name, native event, explicit session override if any |
| 3 | Adapter load | loadAdapter resolves package and capabilities | Adapter name, capability JSON, phase mappings |
| 4 | Normalize | Adapter builds UnifiedHookEvent from packages/hooks-mux/core/src/types/event.ts | version, adapter, phase, rawEventName, supportLevel, execution.* |
| 5 | Handler execution | runPlan injects event on stdin and context env into child handlers | Handler command, HOOKS_PROXY_EVENT, AGENT_SESSION_ID, AGENT_ADAPTER, timeout/result |
| 6 | Merge and persist | Merge result updates session persisted env/context vars | persistEnv, contextVars, unsetEnv, session file diff |
| 7 | Render | Adapter renderer writes native hook output | Native decision/output JSON and dropped/degraded fields |
Assertions
- Tests assert both raw native event and canonical `phase`.
- `UnifiedHookEvent.execution.sessionId` matches the session used by agent-mux or Babysitter where the flow crosses that boundary.
- Stop-hook tests assert recursion guard/stop behavior explicitly.
- Handler env contains `AGENT_SESSION_ID` and `AGENT_ADAPTER`; sensitive provider keys are redacted from artifacts.
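A fixture test could apply these checks roughly as follows. `HookEventSubset` is an illustrative subset of the fields named in the data path, not the real `UnifiedHookEvent` definition. `checkHookEvidence` and `redactEnv` are hypothetical helpers, and the caller is assumed to supply the list of sensitive key names.

```typescript
// Illustrative subset only; the real type lives in
// packages/hooks-mux/core/src/types/event.ts and has more fields.
interface HookEventSubset {
  phase: string;
  rawEventName: string;
  execution: { sessionId: string };
}

type Env = Record<string, string | undefined>;

function checkHookEvidence(event: HookEventSubset, handlerEnv: Env, expectedSessionId: string): string[] {
  const failures: string[] = [];
  if (event.rawEventName === "") failures.push("raw native event name is missing");
  if (event.phase === "") failures.push("canonical phase is missing");
  if (event.execution.sessionId !== expectedSessionId) {
    failures.push(`execution.sessionId ${event.execution.sessionId} does not match ${expectedSessionId}`);
  }
  for (const key of ["AGENT_SESSION_ID", "AGENT_ADAPTER"]) {
    if (!handlerEnv[key]) failures.push(`handler env is missing ${key}`);
  }
  return failures;
}

// Copies the env for artifact storage, masking every key the caller marks
// as sensitive. Unset variables are dropped.
function redactEnv(env: Env, sensitiveKeys: string[]): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue;
    redacted[key] = sensitiveKeys.indexOf(key) !== -1 ? "[REDACTED]" : value;
  }
  return redacted;
}
```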
Flow E: Transport-Mux Assisted Agent-Mux Launch
Transport-mux belongs to the provider/proxy transport layer; it does not own Babysitter run state. Its primary E2E use is to prove that an agent-mux launch can route provider traffic through a configured transport proxy and still complete a model-backed session.
agent-mux launch/run
-> launch decision: native provider vs transport-mux proxy
-> transport-mux HTTP/SSE route
-> upstream provider or mock transport
-> streamed/non-streamed response
-> agent-mux session event log
-> optional hooks-mux events from harness runtime
Assertions
- Agent-mux launch evidence includes `proxyNeeded`/`proxyReason` or equivalent launch decision metadata.
- Transport evidence includes route, upstream target, status code, stream completion/cancellation, timeout behavior, and redacted auth metadata.
- The transport trace is correlated to an agent-mux `runId` or session ID.
- The transport test does not claim Babysitter completion unless a Babysitter run ID and terminal SDK state are also present.
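The correlation and "no overclaiming" rules can be sketched as a guard over collected evidence. `TransportEvidence` is a hypothetical record built by the test. Only `proxyNeeded` and `proxyReason` are names taken from the text above. A completion claim is assumed here to require the literal `RUN_COMPLETED` status.

```typescript
// Hypothetical evidence record; only proxyNeeded/proxyReason are names
// taken from the assertions above.
interface TransportEvidence {
  launchDecision?: { proxyNeeded: boolean; proxyReason?: string };
  traceRunId?: string;      // agent-mux runId found in the transport trace
  traceSessionId?: string;  // or the session ID, if that is the join key
  babysitterRunId?: string;
  sdkStatus?: string;
  claimsBabysitterCompletion: boolean;
}

function checkTransportEvidence(evidence: TransportEvidence): string[] {
  const failures: string[] = [];
  if (evidence.launchDecision === undefined) {
    failures.push("launch decision metadata is missing");
  }
  if (!evidence.traceRunId && !evidence.traceSessionId) {
    failures.push("transport trace is not correlated to an agent-mux runId or session ID");
  }
  // A transport test may only claim Babysitter completion when it also
  // holds a Babysitter run ID and a completed SDK state.
  const hasCompletionProof =
    Boolean(evidence.babysitterRunId) && evidence.sdkStatus === "RUN_COMPLETED";
  if (evidence.claimsBabysitterCompletion && !hasCompletionProof) {
    failures.push("Babysitter completion claimed without run ID and terminal SDK state");
  }
  return failures;
}
```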
Valid Primary Test Set
| ID | Flow | Lane | Minimum proof |
|---|---|---|---|
| PF-1 | SDK run/session loop | No-model | Create run, list pending task, post result, complete run, inspect journal |
| PF-2 | Hooks-mux Claude fixture | No-model | Session/tool/stop hook fixtures normalize and render; handler env contains trace IDs |
| PF-3 | Hooks-mux Codex fixture | No-model | Session/tool aliases normalize, lossy/native support levels match mapping, handler env is present |
| PF-4 | Agent-mux mock session | No-model | runId, session event log, ordered events, terminal session output |
| PF-5 | Transport-mux mock route | No-model | Proxy route roundtrip, stream and non-stream artifacts, timeout/cancel fixture |
| PF-6 | Babysitter-agent internal | Model-backed or controlled fake model | babysitter-agent call/create-run, agent-core backend, SDK run terminal state |
| PF-7 | Agent-mux + Claude + Babysitter plugin | Model-backed | Harness/plugin installed, /babysitter:call, agent-mux session log, SDK run terminal state, stop hook evidence |
| PF-8 | Agent-mux + Codex + Babysitter plugin | Capability-gated model-backed | Same as PF-7 only after plugin support is proven; otherwise skip evidence must cite capability gate |
| PF-9 | Agent-mux + transport-mux live stream | Model-backed | Launch decision, proxy trace, streamed response, agent-mux session completion |
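The PF-8 row implies that a skip is itself an artifact that must name its gate. A minimal sketch of that decision follows. `LaneDecision`, `decideCodexPluginTest`, and the gate name passed in are hypothetical, and how plugin support is actually proven is outside this sketch.

```typescript
// Hypothetical decision record for a capability-gated test.
type LaneDecision =
  | { id: string; action: "run" }
  | { id: string; action: "skip"; skipEvidence: string };

// Runs PF-8 only when plugin support has been proven; otherwise returns a
// skip whose evidence cites the capability gate by name.
function decideCodexPluginTest(pluginSupportProven: boolean, gateName: string): LaneDecision {
  if (pluginSupportProven) {
    return { id: "PF-8", action: "run" };
  }
  return {
    id: "PF-8",
    action: "skip",
    skipEvidence: `capability gate "${gateName}" did not prove Codex plugin support`,
  };
}
```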
Source Map
| Area | Source files to inspect first |
|---|---|
| Agent-mux CLI and sessions | packages/agent-mux/cli/src/commands/run.ts, packages/agent-mux/cli/src/commands/launch.ts, packages/agent-mux/gateway/src/runs/session-runtime.ts, packages/agent-mux/gateway/src/runs/event-log.ts |
| Babysitter-agent runtime | packages/babysitter-agent/src/cli/dispatch.ts, packages/babysitter-agent/src/cli/commands/harness/createRun.ts, packages/babysitter-agent/src/harness/internal/createRun/index.ts, packages/babysitter-agent/src/harness/internal/createRun/orchestration/effects.ts |
| SDK run/session loop | packages/sdk/src/cli/main/runCreate.ts, packages/sdk/src/cli/main/taskCommands.ts, packages/sdk/src/cli/commands/session/init.ts, packages/sdk/src/cli/commands/session/associate.ts |
| Hooks-mux | packages/hooks-mux/cli/src/cli/commands/invoke.ts, packages/hooks-mux/cli/src/cli/bootstrap-runtime.ts, packages/hooks-mux/core/src/types/event.ts, packages/hooks-mux/core/src/normalizer/runner.ts |
| Transport-mux | packages/transport-mux/src/index.ts, packages/transport-mux/tests/e2e/http-roundtrip.test.ts, packages/transport-mux/tests/runtime.test.ts |