Page JSON
Primary Flow Data Paths json
Inspect the normalized record payload exactly as the atlas UI reads it.
{
"id": "page:docs-testing-primary-flow-data-paths",
"_kind": "Page",
"_file": "wiki/docs/testing/primary-flow-data-paths.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/testing/primary-flow-data-paths.md",
"sourceKind": "repo-docs",
"title": "Primary Flow Data Paths",
"displayName": "Primary Flow Data Paths",
"slug": "docs/testing/primary-flow-data-paths",
"articlePath": "wiki/docs/testing/primary-flow-data-paths.md",
"article": "\n# Primary Flow Data Paths\n\nThis document maps the main flows that the rebuilt E2E strategy should prove. It is intentionally data-path oriented: every flow names the caller, command/API boundary, state that must be created, hook/session artifacts that should exist, and the identifiers that let a test join evidence across packages.\n\n## Primary Configuration\n\nThe primary configuration has two valid runtime paths and one shared hook/trace layer:\n\n| Path | Primary target | What it proves | What it must not claim |\n| --- | --- | --- | --- |\n| Agent-mux plugin path | Claude Code first; Codex only when capability-gated plugin support is available | A real external harness session can be launched through `agent-mux`, the Babysitter plugin can run a `/babysitter:call`-style session command, and the resulting Babysitter run reaches a terminal state | It does not prove `babysitter-agent` runtime orchestration and does not use `babysitter-agent create-run` |\n| Babysitter-agent runtime path | `babysitter-agent call` / `babysitter-agent create-run` with `agent-core` internal backend, plus external-harness bridge where selected | The runtime can understand intent, create or reuse a process, create and bind a Babysitter SDK run, iterate effects, resolve tasks, and complete | It does not install external harness plugins; `babysitter harness:install` belongs to SDK setup, not this path |\n| Hooks and transport layer | `hooks-mux` and `transport-mux` alongside either runtime path | Native hook payloads normalize into `UnifiedHookEvent`, handlers receive traceable env/stdin, and provider traffic can be proxied/recorded where configured | Hooks-mux does not own agent-mux sessions; transport-mux does not own Babysitter run state |\n\n## Flow A: Agent-Mux Plugin Session To Babysitter Run\n\nThis is the primary plugin E2E for Claude Code. 
Codex uses the same shape only after an explicit capability gate proves plugin install/support for the Codex adapter.\n\n```text\noperator / CI\n -> babysitter harness:install claude\n -> babysitter harness:install-plugin claude\n -> agent-mux CLI (`amux run` or launch path)\n -> agent-mux adapter/runtime session\n -> external harness process (Claude Code primary)\n -> Babysitter plugin command inside the harness session\n -> Babysitter SDK run creation / iteration\n -> hooks-mux native hook normalization and stop-hook evidence\n -> terminal Babysitter run state and agent-mux event log evidence\n```\n\n### Data Path\n\n| Step | Boundary | Data passed | Required evidence |\n| --- | --- | --- | --- |\n| 1 | SDK setup CLI | Harness name and plugin target via `babysitter harness:install` and `babysitter harness:install-plugin` | Install JSON or log, installed plugin path, marketplace/registry entry, idempotency result |\n| 2 | Agent-mux invocation | Agent name, prompt, `--session`, `--run-id`, cwd/env/model flags from `packages/agent-mux/cli/src/commands/run.ts` | `agent-mux` run ID, selected adapter, cwd, model, prompt digest, session mode |\n| 3 | Agent-mux gateway/runtime | Session runtime and event log under `packages/agent-mux/gateway/src/runs/session-runtime.ts` and `packages/agent-mux/gateway/src/runs/event-log.ts` | Event-log file or API events with monotonic `seq`, `source`, `ts`, event type, `runId` |\n| 4 | External harness | Native harness session ID, native hook payloads, tool calls, stop/session events | Harness transcript/session ID, native hook payload fixture or redacted live artifact |\n| 5 | Babysitter plugin command | `/babysitter:call` or equivalent Babysitter-enabled session command posted in the harness | Assistant/tool transcript showing command, plugin dispatch evidence, created Babysitter `runId` |\n| 6 | SDK run loop | `run:create`, `run:iterate`, pending effects, `task:post`, terminal completion | `.a5c/runs/<runId>/`, journal/events, 
`tasks/<effectId>/result.json`, terminal status |\n| 7 | Hook bridge | `hooks-mux` normalizes session/tool/stop hooks and injects `AGENT_*` env | `UnifiedHookEvent`, handler stdin/stdout, `AGENT_SESSION_ID`, stop-hook result |\n\n### Assertions\n\n- The agent-mux `runId` and session ID are recorded before the Babysitter plugin command runs.\n- The Babysitter plugin command creates or resumes exactly one Babysitter `runId` for the scenario.\n- The Babysitter `runId` appears in final output and maps to an existing `.a5c/runs/<runId>/` directory.\n- At least one hook artifact proves stop/session handling, not just assistant text.\n- The final state is terminal: `RUN_COMPLETED` or equivalent completed status from the SDK run, not merely a successful model reply.\n\n## Flow B: Babysitter-Agent Runtime Create-Run\n\nThis path tests `@a5c-ai/babysitter-agent` as the runtime owner. It is separate from agent-mux plugin setup.\n\n```text\noperator / CI\n -> babysitter-agent call/create-run\n -> PhaseUnderstandIntent / PhasePlanProcess\n -> process definition in workspace `.a5c/processes` or provided `--process`\n -> Babysitter SDK `createRun`\n -> session binding for selected harness/backend\n -> PhaseOrchestration loop\n -> effect resolution through internal `agent-core` or external harness bridge\n -> SDK `commitEffectResult` / task result files\n -> terminal run completion\n```\n\n### Data Path\n\n| Step | Boundary | Data passed | Required evidence |\n| --- | --- | --- | --- |\n| 1 | `babysitter-agent` CLI | `call`, `create-run`, `yolo`, `plan`, `resume-run`; args parsed in `packages/babysitter-agent/src/cli/dispatch.ts` | Invocation command, selected harness, workspace, model, max iterations, output mode |\n| 2 | Create-run coordinator | `handleHarnessCreateRun` in `packages/babysitter-agent/src/harness/internal/createRun/index.ts` | Progress events for planning, process path, run creation, session binding |\n| 3 | Planning phase | Prompt, workspace context, selected 
harness, compression config | Process file path, process fingerprint or generated process report, optional planning conversation summary |\n| 4 | SDK run creation | `createRun` through `packages/sdk/src/cli/main/runCreate.ts` or SDK API | `runId`, `runDir`, process ID, entrypoint, inputs path, non-interactive metadata |\n| 5 | Session binding | Selected harness session ID from `resolveHarnessSessionIdForBinding` and SDK session state | Babysitter session ID, state file, run/session association, harness name |\n| 6 | Orchestration loop | `orchestrateIteration`, pending `EffectAction`s, `resolveEffect`, `commitEffectResult` | Iteration count, pending effect IDs, task IDs, task result refs, stdout/stderr refs |\n| 7 | Effect execution | Internal `agent-core` for internal harnesses; external bridge for external harnesses | Model/provider trace redacted, backend name, task result JSON, errors/retries if any |\n| 8 | Terminal state | SDK journal and completion proof | `RUN_COMPLETED`, final summary, completion proof only after terminal state |\n\n### Assertions\n\n- Runtime tests invoke `babysitter-agent`, not `babysitter harness:install`.\n- The selected harness/backend is recorded (`agent-core` for the internal primary path; external harness bridge only for explicit external-harness tests).\n- The created or resumed `runId` is bound to a session and appears in SDK state and final output.\n- Every pending effect has a posted result or a declared failure, keyed by `effectId`.\n- A terminal Babysitter state is the pass condition.\n\n## Flow C: SDK Run/Session Loop\n\nThis is the deterministic contract shared by both runtime paths.\n\n```text\nbabysitter run:create\n -> .a5c/runs/<runId>/ metadata + journal\n -> optional session binding through harness adapter\n -> babysitter run:iterate\n -> pending effects under tasks/<effectId>/task.json\n -> babysitter task:post\n -> result refs under tasks/<effectId>/\n -> repeated run:iterate\n -> RUN_COMPLETED / 
RUN_FAILED\n```\n\n### Command Boundaries\n\n| Command | Owner | State created or read | Evidence key |\n| --- | --- | --- | --- |\n| `babysitter run:create --process-id ... --entry ... --inputs ...` | SDK CLI | Run directory, run metadata, initial journal, optional session binding | `runId`, `runDir`, `entry`, `processId`, `session.sessionId` |\n| `babysitter session:init --session-id ...` | SDK CLI | Session state file | `stateFile`, iteration, max iterations |\n| `babysitter session:associate --session-id ... --run-id ...` | SDK CLI | Session file updated with run ID | `stateFile`, `runId` |\n| `babysitter run:iterate <runDir>` | SDK CLI/runtime | Replayed state, emitted effects, terminal events | `iteration`, `status`, `nextActions[].effectId` |\n| `babysitter task:list <runDir> --pending` | SDK CLI/runtime | Pending task index | `effectId`, `taskId`, `stepId`, `kind`, `taskDefRef` |\n| `babysitter task:post <runDir> <effectId> --status ok --value <file>` | SDK CLI/runtime | Task result, stdout/stderr refs, effect resolution journal event | `effectId`, `resultRef`, status |\n\n## Flow D: Hooks-Mux Native Hook Path\n\nHooks-mux is the canonical hook-normalization and handler fan-out layer.\n\n```text\nnative harness hook payload on stdin\n -> `a5c-hooks-mux bootstrap` or `a5c-hooks-mux invoke`\n -> adapter loader (the matching hooks-mux adapter package for the selected harness)\n -> adapter normalizer\n -> `UnifiedHookEvent`\n -> handler plan + child-process handlers\n -> merged hook result\n -> session env/context persistence\n -> native renderer output back to harness\n```\n\n### Data Path\n\n| Step | Boundary | Data passed | Required evidence |\n| --- | --- | --- | --- |\n| 1 | Native hook | Claude/Codex/Gemini/etc. 
JSON stdin and native event name | Raw hook payload fixture or redacted live payload |\n| 2 | CLI entry | `bootstrap`, `invoke`, or `exec` in `packages/hooks-mux/cli/src/cli/commands` | CLI args, adapter name, native event, explicit session override if any |\n| 3 | Adapter load | `loadAdapter` resolves package and capabilities | Adapter name, capability JSON, phase mappings |\n| 4 | Normalize | Adapter builds `UnifiedHookEvent` from `packages/hooks-mux/core/src/types/event.ts` | `version`, `adapter`, `phase`, `rawEventName`, `supportLevel`, `execution.*` |\n| 5 | Handler execution | `runPlan` injects event on stdin and context env into child handlers | Handler command, `HOOKS_PROXY_EVENT`, `AGENT_SESSION_ID`, `AGENT_ADAPTER`, timeout/result |\n| 6 | Merge and persist | Merge result updates session persisted env/context vars | `persistEnv`, `contextVars`, `unsetEnv`, session file diff |\n| 7 | Render | Adapter renderer writes native hook output | Native decision/output JSON and dropped/degraded fields |\n\n### Assertions\n\n- Tests assert both raw native event and canonical `phase`.\n- `UnifiedHookEvent.execution.sessionId` matches the session used by agent-mux or Babysitter where the flow crosses that boundary.\n- Stop-hook tests assert recursion guard/stop behavior explicitly.\n- Handler env contains `AGENT_SESSION_ID` and `AGENT_ADAPTER`; sensitive provider keys are redacted from artifacts.\n\n## Flow E: Transport-Mux Assisted Agent-Mux Launch\n\nTransport-mux belongs to provider/proxy transport, not Babysitter run state. 
The primary E2E use is to prove that an agent-mux launch can route provider traffic through a configured transport proxy and still complete a model-backed session.\n\n```text\nagent-mux launch/run\n -> launch decision: native provider vs transport-mux proxy\n -> transport-mux HTTP/SSE route\n -> upstream provider or mock transport\n -> streamed/non-streamed response\n -> agent-mux session event log\n -> optional hooks-mux events from harness runtime\n```\n\n### Assertions\n\n- Agent-mux launch evidence includes `proxyNeeded`/`proxyReason` or equivalent launch decision metadata.\n- Transport evidence includes route, upstream target, status code, stream completion/cancellation, timeout behavior, and redacted auth metadata.\n- The transport trace is correlated to an agent-mux `runId` or session ID.\n- The transport test does not claim Babysitter completion unless a Babysitter run ID and terminal SDK state are also present.\n\n## Valid Primary Test Set\n\n| ID | Flow | Lane | Minimum proof |\n| --- | --- | --- | --- |\n| PF-1 | SDK run/session loop | No-model | Create run, list pending task, post result, complete run, inspect journal |\n| PF-2 | Hooks-mux Claude fixture | No-model | Session/tool/stop hook fixtures normalize and render; handler env contains trace IDs |\n| PF-3 | Hooks-mux Codex fixture | No-model | Session/tool aliases normalize, lossy/native support levels match mapping, handler env is present |\n| PF-4 | Agent-mux mock session | No-model | `runId`, session event log, ordered events, terminal session output |\n| PF-5 | Transport-mux mock route | No-model | Proxy route roundtrip, stream and non-stream artifacts, timeout/cancel fixture |\n| PF-6 | Babysitter-agent internal | Model-backed or controlled fake model | `babysitter-agent call/create-run`, `agent-core` backend, SDK run terminal state |\n| PF-7 | Agent-mux + Claude + Babysitter plugin | Model-backed | Harness/plugin installed, `/babysitter:call`, agent-mux session log, SDK run terminal state, 
stop hook evidence |\n| PF-8 | Agent-mux + Codex + Babysitter plugin | Capability-gated model-backed | Same as PF-7 only after plugin support is proven; otherwise skip evidence must cite capability gate |\n| PF-9 | Agent-mux + transport-mux live stream | Model-backed | Launch decision, proxy trace, streamed response, agent-mux session completion |\n\n## Source Map\n\n| Area | Source files to inspect first |\n| --- | --- |\n| Agent-mux CLI and sessions | `packages/agent-mux/cli/src/commands/run.ts`, `packages/agent-mux/cli/src/commands/launch.ts`, `packages/agent-mux/gateway/src/runs/session-runtime.ts`, `packages/agent-mux/gateway/src/runs/event-log.ts` |\n| Babysitter-agent runtime | `packages/babysitter-agent/src/cli/dispatch.ts`, `packages/babysitter-agent/src/cli/commands/harness/createRun.ts`, `packages/babysitter-agent/src/harness/internal/createRun/index.ts`, `packages/babysitter-agent/src/harness/internal/createRun/orchestration/effects.ts` |\n| SDK run/session loop | `packages/sdk/src/cli/main/runCreate.ts`, `packages/sdk/src/cli/main/taskCommands.ts`, `packages/sdk/src/cli/commands/session/init.ts`, `packages/sdk/src/cli/commands/session/associate.ts` |\n| Hooks-mux | `packages/hooks-mux/cli/src/cli/commands/invoke.ts`, `packages/hooks-mux/cli/src/cli/bootstrap-runtime.ts`, `packages/hooks-mux/core/src/types/event.ts`, `packages/hooks-mux/core/src/normalizer/runner.ts` |\n| Transport-mux | `packages/transport-mux/src/index.ts`, `packages/transport-mux/tests/e2e/http-roundtrip.test.ts`, `packages/transport-mux/tests/runtime.test.ts` |\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": []
}
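
The Flow B assertion in the article above, "Every pending effect has a posted result or a declared failure, keyed by `effectId`", can be checked against the on-disk layout the article names (`tasks/<effectId>/task.json` and `tasks/<effectId>/result.json` under the run directory). The following is a minimal sketch under stated assumptions, not the SDK's own check: the helper name `unresolved_effects` is hypothetical, it treats the presence of `result.json` as the only sign of resolution, and it does not read the journal or interpret how a declared failure is recorded, since the article does not specify those formats.

```python
from pathlib import Path


def unresolved_effects(run_dir):
    """Return effect IDs that have a task.json but no result.json.

    Assumption: each effect lives in <run_dir>/tasks/<effectId>/, as the
    article's Flow C diagram describes. Hypothetical helper, not SDK API.
    """
    tasks_dir = Path(run_dir) / "tasks"
    if not tasks_dir.is_dir():
        # No tasks directory means there is nothing pending to report.
        return []

    missing = []
    for effect_dir in sorted(p for p in tasks_dir.iterdir() if p.is_dir()):
        has_task = (effect_dir / "task.json").is_file()
        has_result = (effect_dir / "result.json").is_file()
        if has_task and not has_result:
            # The directory name is taken to be the effectId.
            missing.append(effect_dir.name)
    return missing
```

A test could call `unresolved_effects(run_dir)` after the final `run:iterate` and fail if the returned list is non-empty. An empty list alone does not show a terminal state; the article still requires `RUN_COMPLETED` or an equivalent SDK status as the pass condition.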