Trace Identifiers And Evidence reference
Use this document as the evidence checklist for tests described in [Primary Flow Data Paths](./primary-flow-data-paths.md). A scenario should not be marked E2E unless it records the identifiers needed to join the agent session, hook events, Babysitter run state, and transport trace.
Identifier Spine
| Identifier | Owner | Where it appears | Why it matters |
|---|---|---|---|
| `agentMuxRunId` / `runId` | Agent-mux | CLI result, gateway runtime state, event log filename or event body | Joins agent-mux session events to launch/transport evidence |
| `agentMuxSessionId` / `sessionId` | Agent-mux/external harness | CLI args, session runtime, harness transcript | Proves continuity across prompts, plugin command, and hook events |
| `babysitterRunId` / SDK `runId` | Babysitter SDK and babysitter-agent | `run:create` output, `.a5c/runs/<runId>/`, babysitter-agent progress events | Primary key for SDK journal, tasks, and terminal state |
| `runDir` | Babysitter SDK | `run:create` output, babysitter-agent progress events | Filesystem root for journal, tasks, outputs, and replay state |
| `babysitterSessionId` | SDK session binding or harness adapter | `session:init`, `session:associate`, run-create session block, hooks env | Joins harness session to SDK run loop |
| `effectId` | Babysitter SDK | `run:iterate` next actions, `task:list`, `task:post`, `tasks/<effectId>/` | Joins requested work to posted results |
| `taskId` / `stepId` | Babysitter process runtime | `task:list`, task definition refs | Names process step semantics independently of generated effect ID |
| `UnifiedHookEvent.execution.sessionId` | Hooks-mux | Normalized hook event JSON | Joins native hook event to agent or Babysitter session |
| `UnifiedHookEvent.execution.toolCallId` | Hooks-mux/native harness | Tool hook payloads and normalized event | Joins tool call ready/result pairs and handler decisions |
| `event.seq` | Agent-mux gateway event log | `packages/agent-mux/gateway/src/runs/event-log.ts` event entries | Orders session events and detects gaps/truncation |
| Transport request/trace ID | Transport-mux | Proxy request logs, trace query/headers, upstream metadata | Joins provider request/stream to agent-mux launch/session |
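
The identifier table above can be turned into a mechanical join check. The following is a minimal sketch, assuming a hypothetical `ScenarioEvidence` record collected by the test harness; the record shape and the function names are illustrative and are not part of any package described here.

```typescript
// Hypothetical evidence record. Field names follow the identifier table,
// but the record shape itself is an assumption made for illustration.
interface ScenarioEvidence {
  agentMuxRunId?: string;
  agentMuxSessionId?: string;
  babysitterRunId?: string;
  runDir?: string;
  babysitterSessionId?: string;
  transportTraceId?: string;
  // Values of UnifiedHookEvent.execution.sessionId seen during the scenario.
  hookSessionIds: string[];
}

type IdentifierKey = Exclude<keyof ScenarioEvidence, "hookSessionIds">;

// Returns the names of required identifiers that are missing or empty.
function missingIdentifiers(
  evidence: ScenarioEvidence,
  required: IdentifierKey[],
): string[] {
  return required.filter((key) => !evidence[key]);
}

// Returns hook session IDs that match neither the agent-mux session ID
// nor the Babysitter session ID, i.e. hook events that cannot be joined.
function unjoinedHookSessions(evidence: ScenarioEvidence): string[] {
  const known = new Set(
    [evidence.agentMuxSessionId, evidence.babysitterSessionId].filter(
      (id): id is string => Boolean(id),
    ),
  );
  return evidence.hookSessionIds.filter((id) => !known.has(id));
}
```

A scenario would pass the list of identifiers its primary path requires; a non-empty result from either function means the evidence is not joinable and the scenario should not be marked E2E.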
Environment And Hook Context
| Variable or payload field | Produced by | Consumed by | Required assertion |
|---|---|---|---|
| `AGENT_SESSION_ID` | Hooks-mux bootstrap/session persistence or SDK harness adapter | Hook handlers, child commands, SDK session binding | Equals the scenario session ID and is stable across hook invocations |
| `AGENT_ADAPTER` | Hooks-mux normalized execution context | Hook handlers and trace artifacts | Equals selected adapter such as `claude`, `codex`, or `gemini` |
| `AGENT_WORKSPACE_ROOT` | Hooks-mux execution context | Hook handlers and subprocesses | Equals expected workspace/cwd |
| `AGENT_TRANSCRIPT_PATH` | Harness-native payload where available | Hook handlers and evidence collector | Points to redacted transcript artifact when available |
| `AGENT_CAPABILITIES_JSON` | Hooks-mux handler runner | Hook handlers | Captures adapter capability gate decisions |
| `HOOKS_PROXY_EVENT` | Hooks-mux handler runner | Hook handlers | JSON equals the normalized event given on stdin |
| `CLAUDE_ENV_FILE` | Claude native hook environment | Hooks-mux propagation backend | Contains exported persisted env after bootstrap or handler result |
| `HOOKS_PROXY_ENV_FILE` | Generic hooks-mux env propagation | Hooks-mux propagation backend | Contains persisted env when native env file is not provider-specific |
| `HOOKS_PROXY_SESSION_ID` | Adapter enrichment/fallback | Normalizer | Matches native session ID when adapter enriches env from stdin |
| `HOOKS_PROXY_TOOL_NAME` / `HOOKS_PROXY_TOOL_CALL_ID` | Adapter enrichment | Normalizer/handler env | Matches native tool payload values |
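
Two of the required assertions in the table above (a stable `AGENT_SESSION_ID` and a `HOOKS_PROXY_EVENT` that equals the stdin event) lend themselves to an automated check. The sketch below assumes a hypothetical `HookInvocationSnapshot` captured once per hook invocation; how the snapshot is captured is left to the harness and is not specified by this document.

```typescript
// Hypothetical snapshot of one hook invocation, captured by a test harness.
interface HookInvocationSnapshot {
  env: Record<string, string | undefined>;
  stdin: string; // normalized event JSON as delivered on stdin
}

// Returns human-readable failures; an empty array means every invocation
// carried the scenario session ID and a HOOKS_PROXY_EVENT equal to stdin.
function checkHookEnv(
  scenarioSessionId: string,
  snapshots: HookInvocationSnapshot[],
): string[] {
  const failures: string[] = [];
  snapshots.forEach((snapshot, index) => {
    if (snapshot.env["AGENT_SESSION_ID"] !== scenarioSessionId) {
      failures.push(`invocation ${index}: AGENT_SESSION_ID does not match the scenario session ID`);
    }
    const proxied = snapshot.env["HOOKS_PROXY_EVENT"];
    if (proxied === undefined) {
      failures.push(`invocation ${index}: HOOKS_PROXY_EVENT is not set`);
      return;
    }
    try {
      // Parse and re-serialize both sides so whitespace differences are ignored.
      // Simplification: this comparison is still sensitive to key order.
      const fromEnv = JSON.stringify(JSON.parse(proxied));
      const fromStdin = JSON.stringify(JSON.parse(snapshot.stdin));
      if (fromEnv !== fromStdin) {
        failures.push(`invocation ${index}: HOOKS_PROXY_EVENT differs from the stdin event`);
      }
    } catch {
      failures.push(`invocation ${index}: event JSON could not be parsed`);
    }
  });
  return failures;
}
```

Because every invocation is compared against the same scenario session ID, the stability requirement is covered without a separate pairwise comparison.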
Evidence Bundles By Flow
Agent-Mux Plugin Path
A passing artifact bundle should include:
- `agent-mux` invocation: command, selected adapter, model, cwd, prompt digest, `runId`, session mode.
- Agent-mux event log: ordered `seq`, `ts`, `source`, event type, session/run IDs, terminal event.
- Harness/plugin setup: `babysitter harness:install <harness>` and `babysitter harness:install-plugin <harness>` output or a cached precondition artifact.
- Plugin command transcript: user command such as `/babysitter:call`, plugin dispatch evidence, assistant/tool result.
- Babysitter SDK run evidence: `runId`, `runDir`, `run:iterate` output, `task:list`, `task:post`, terminal journal state.
- Hook evidence: normalized session/tool/stop event, stop-hook decision, handler env snapshot with secrets redacted.
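
The ordered `seq` requirement for the agent-mux event log can be checked mechanically. The sketch below is illustrative only: it assumes `seq` increases by exactly 1 per event, and it takes the set of terminal event types as a parameter because this document does not name them. The `LoggedEvent` shape is a hypothetical subset of a log entry.

```typescript
// Hypothetical subset of an event log entry; only the fields used here.
interface LoggedEvent {
  seq: number;
  type: string;
}

interface SeqReport {
  outOfOrder: number[]; // seq values that did not increase
  gaps: Array<[number, number]>; // [previous, next] pairs with a hole between them
  endsWithTerminal: boolean; // false suggests a truncated log
}

// Assumption: consecutive events differ in seq by exactly 1.
function checkEventLog(events: LoggedEvent[], terminalTypes: string[]): SeqReport {
  const report: SeqReport = { outOfOrder: [], gaps: [], endsWithTerminal: false };
  let previous: number | undefined;
  for (const event of events) {
    if (previous !== undefined) {
      if (event.seq <= previous) {
        report.outOfOrder.push(event.seq);
      } else if (event.seq > previous + 1) {
        report.gaps.push([previous, event.seq]);
      }
    }
    previous = event.seq;
  }
  const last = events[events.length - 1];
  report.endsWithTerminal = last !== undefined && terminalTypes.includes(last.type);
  return report;
}
```

A bundle would attach the report next to the event log so that a gap or a missing terminal event is visible without re-reading the whole log.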
Babysitter-Agent Runtime Path
A passing artifact bundle should include:
- `babysitter-agent call` or `babysitter-agent create-run` command and parsed options.
- Progress events for planning/process path, run creation, session binding, iteration start, effect resolution, and completion.
- Selected harness/backend: `agent-core` for internal primary tests, external harness name for bridge tests.
- Generated/provided process path and process fingerprint or file digest.
- SDK `runId`, `runDir`, session binding result, pending effects, posted task results, terminal state.
- Redacted model/provider trace for model-backed runs, or mock transcript for no-model runs.
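
The progress-event requirement in the list above amounts to an ordered-subsequence check. The following sketch uses hypothetical phase labels derived from that list; the real progress event names emitted by `babysitter-agent` are not specified in this document and would need to be substituted.

```typescript
// Hypothetical phase labels; replace with the names the runtime actually emits.
const EXPECTED_PHASES: string[] = [
  "planning",
  "run-created",
  "session-bound",
  "iteration-start",
  "effect-resolved",
  "completed",
];

// Returns the first expected phase that does not appear in order, or null when
// every expected phase occurs as an ordered subsequence of the observed phases.
// Repeated iterations are allowed because extra observed phases are skipped.
function firstMissingPhase(
  observed: string[],
  expected: string[] = EXPECTED_PHASES,
): string | null {
  let cursor = 0;
  for (const phase of expected) {
    const found = observed.indexOf(phase, cursor);
    if (found === -1) {
      return phase;
    }
    cursor = found + 1;
  }
  return null;
}
```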
SDK Run/Session Loop
A passing artifact bundle should include:
- `babysitter run:create --json` output with `runId`, `runDir`, `entry`, `processId`, and session block if bound.
- `.a5c/runs/<runId>/` file listing or archived subset: metadata, journal/events, tasks.
- `babysitter run:iterate --json` outputs for each iteration.
- `babysitter task:list --pending --json` before each post.
- `babysitter task:post --json` output for every `effectId` resolved by the test.
- Final `run:status` or terminal journal event proving completion/failure.
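
The pairing of `task:list` and `task:post` evidence in the list above can be verified by comparing `effectId` sets. This is a minimal sketch under the assumption that both outputs have been parsed into arrays of objects carrying an `effectId`; the actual JSON shapes of the CLI outputs are not documented here, so `PendingTask` and `PostedResult` are hypothetical.

```typescript
// Hypothetical parsed shapes; only effectId is assumed to be present.
interface PendingTask {
  effectId: string;
}
interface PostedResult {
  effectId: string;
}

interface EffectCoverage {
  unresolved: string[]; // listed as pending but never posted
  unexpectedPosts: string[]; // posted but never listed as pending
}

// Compares the effect IDs seen in pending listings with those in post outputs.
function effectCoverage(
  pending: PendingTask[],
  posted: PostedResult[],
): EffectCoverage {
  const pendingIds = new Set(pending.map((task) => task.effectId));
  const postedIds = new Set(posted.map((result) => result.effectId));
  return {
    unresolved: Array.from(pendingIds).filter((id) => !postedIds.has(id)),
    unexpectedPosts: Array.from(postedIds).filter((id) => !pendingIds.has(id)),
  };
}
```

An `unresolved` entry is acceptable only if the scenario deliberately leaves that effect pending; an `unexpectedPosts` entry usually indicates that a listing was not captured before the post.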
Hooks-Mux Path
A passing artifact bundle should include:
- Raw native hook fixture or redacted live stdin payload.
- CLI command: `a5c-hooks-mux bootstrap` or `a5c-hooks-mux invoke --adapter <name> --native-event <event>`.
- Adapter capabilities and mapping support level (`native`, `lossy`, `unsupported`).
- Normalized `UnifiedHookEvent` with `adapter`, `phase`, `rawEventName`, `supportLevel`, and `execution` fields.
- Handler plan and child-process result; include stdout/stderr and timeout status.
- Merged hook result, persisted env/context diff, and native renderer output.
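
A structural check on the normalized event can catch a malformed artifact before it is archived. The sketch below validates only the fields named in the list above plus `execution.sessionId` from the identifier table; it is not the hooks-mux schema, and any further fields of `UnifiedHookEvent` are outside its scope.

```typescript
const SUPPORT_LEVELS: readonly string[] = ["native", "lossy", "unsupported"];

// Returns a list of problems; an empty array means the listed fields are present.
function validateHookEvent(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["event is not an object"];
  }
  const event = value as Record<string, unknown>;
  const problems: string[] = [];

  for (const field of ["adapter", "phase", "rawEventName"]) {
    const fieldValue = event[field];
    if (typeof fieldValue !== "string" || fieldValue === "") {
      problems.push(`${field} is missing or not a string`);
    }
  }

  const level = event["supportLevel"];
  if (typeof level !== "string" || !SUPPORT_LEVELS.includes(level)) {
    problems.push("supportLevel is not one of native, lossy, unsupported");
  }

  const execution = event["execution"];
  if (typeof execution !== "object" || execution === null) {
    problems.push("execution is missing");
  } else if (typeof (execution as Record<string, unknown>)["sessionId"] !== "string") {
    problems.push("execution.sessionId is missing");
  }
  return problems;
}
```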
Transport-Mux Path
A passing artifact bundle should include:
- Agent-mux launch decision: native provider vs transport proxy, `proxyNeeded`, reason, route, and redacted env diff.
- Transport-mux route request: method, path, query/trace flag, upstream target, status code.
- Stream evidence: first byte/event, at least one delta, final event, cancellation/timeout case where applicable.
- Correlation to agent-mux `runId` or session ID.
- Explicit statement that Babysitter completion is out of scope unless a `babysitterRunId` and SDK terminal state are also present.
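
The stream evidence requirement above (a first event, at least one delta, and a final event) can be expressed as a small check over a recorded trace. The event kinds in this sketch are a hypothetical simplification; real transport traces may classify events differently, and cancellation/timeout cases would need their own expectations.

```typescript
// Hypothetical, simplified classification of recorded stream events.
interface StreamTraceEvent {
  kind: "delta" | "final" | "other";
}

// Returns a list of problems; an empty array means the happy-path stream
// evidence is complete: something arrived, a delta arrived, and the final
// event is the last thing recorded.
function checkStream(events: StreamTraceEvent[]): string[] {
  if (events.length === 0) {
    return ["no first event recorded"];
  }
  const problems: string[] = [];
  if (!events.some((event) => event.kind === "delta")) {
    problems.push("no delta event recorded");
  }
  const finalIndex = events.findIndex((event) => event.kind === "final");
  if (finalIndex === -1) {
    problems.push("no final event recorded");
  } else if (finalIndex !== events.length - 1) {
    problems.push("events recorded after the final event");
  }
  return problems;
}
```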
Redaction Rules
- Never store provider API keys, OAuth tokens, cookies, or raw auth headers.
- Store model/provider names, endpoint family, status code, request shape, token counts, and timing metadata only after redaction.
- Prompt/transcript artifacts may store prompt digests and bounded excerpts; full live transcripts require a fixture-safe redaction pass.
- Hook env snapshots must include `AGENT_*` and `HOOKS_PROXY_*` correlation variables but remove credential variables.
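
One way to apply the env snapshot rule is an allowlist: keep only the correlation prefixes and drop everything else. The sketch below takes that stricter approach as a design assumption. The credential name pattern is hypothetical and should be adapted to the variables a given harness actually sets.

```typescript
const CORRELATION_PREFIXES: readonly string[] = ["AGENT_", "HOOKS_PROXY_"];

// Assumption: credential variables can be recognized by name. This pattern is
// a hypothetical heuristic, not an exhaustive list.
const CREDENTIAL_NAME_PATTERN = /(KEY|TOKEN|SECRET|PASSWORD|COOKIE|AUTH)/i;

// Keeps AGENT_* and HOOKS_PROXY_* variables and drops all others, including
// any correlation-prefixed variable whose name looks like a credential.
function redactEnvSnapshot(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (value === undefined) {
      continue;
    }
    const isCorrelation = CORRELATION_PREFIXES.some((prefix) => name.startsWith(prefix));
    if (isCorrelation && !CREDENTIAL_NAME_PATTERN.test(name)) {
      kept[name] = value;
    }
  }
  return kept;
}
```

This filters by variable name only. Values such as the `HOOKS_PROXY_EVENT` payload can still embed sensitive text and would need their own redaction pass.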
Failure Classification
| Failure class | Example | How to report |
|---|---|---|
| Setup failure | Harness/plugin install fails | Mark setup lane failed; do not claim runtime E2E attempted |
| Capability skip | Codex plugin manager unsupported | Mark skipped with adapter capability artifact |
| Session correlation failure | Hook event session ID differs from agent-mux session ID | Fail E2E and attach both IDs plus raw/normalized hook evidence |
| SDK run failure | `run:iterate` emits `RUN_FAILED` | Fail Babysitter run path; attach journal and last effect result |
| Hook normalization failure | Native event maps to wrong phase/support level | Fail hooks-mux lane; attach raw payload and `UnifiedHookEvent` |
| Transport failure | Proxy stream times out or loses final event | Fail transport lane; attach route trace and agent-mux session state |
| Provider failure | Live model returns auth/quota error | Mark model-backed infra failure; keep no-model lane separate |
Minimal Artifact Naming
Use deterministic artifact names so CI and local runs can be compared:
| Artifact | Suggested name |
|---|---|
| Agent-mux event log | `agent-mux-events-<agentMuxRunId>.ndjson` |
| Babysitter run summary | `babysitter-run-<babysitterRunId>.json` |
| Babysitter task bundle | `babysitter-tasks-<babysitterRunId>.json` |
| Hook normalized event | `hooks-mux-<adapter>-<nativeEvent>-<sessionId>.json` |
| Hook handler result | `hooks-mux-handler-<effect-or-tool-id>.json` |
| Transport trace | `transport-mux-trace-<agentMuxRunId>.json` |
| Redaction report | `redaction-report-<scenario-id>.json` |
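
Generating these names from one helper keeps CI and local runs consistent. The sketch below follows the suggested patterns from the table; the `safeSegment` sanitizing rule is an assumption added for illustration, since the table does not say how unsafe characters in identifiers should be handled.

```typescript
// Assumption: characters outside this set are replaced with "_" so that an
// identifier can never introduce a path separator into the file name.
function safeSegment(value: string): string {
  return value.replace(/[^A-Za-z0-9._-]+/g, "_");
}

// One builder per artifact row in the naming table.
const artifactNames = {
  agentMuxEventLog: (agentMuxRunId: string): string =>
    `agent-mux-events-${safeSegment(agentMuxRunId)}.ndjson`,
  babysitterRunSummary: (babysitterRunId: string): string =>
    `babysitter-run-${safeSegment(babysitterRunId)}.json`,
  babysitterTaskBundle: (babysitterRunId: string): string =>
    `babysitter-tasks-${safeSegment(babysitterRunId)}.json`,
  hookNormalizedEvent: (adapter: string, nativeEvent: string, sessionId: string): string =>
    `hooks-mux-${safeSegment(adapter)}-${safeSegment(nativeEvent)}-${safeSegment(sessionId)}.json`,
  hookHandlerResult: (effectOrToolId: string): string =>
    `hooks-mux-handler-${safeSegment(effectOrToolId)}.json`,
  transportTrace: (agentMuxRunId: string): string =>
    `transport-mux-trace-${safeSegment(agentMuxRunId)}.json`,
  redactionReport: (scenarioId: string): string =>
    `redaction-report-${safeSegment(scenarioId)}.json`,
};
```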
Scenario Completion Checklist
Before a scenario is labeled complete, verify:
- [ ] The primary path is declared: agent-mux plugin, babysitter-agent runtime, SDK run loop, hooks-mux fixture, or transport-mux route.
- [ ] All required identifiers for that path are present and joinable.
- [ ] The terminal condition is owned by the correct layer.
- [ ] Any capability gate or model credential requirement is explicit.
- [ ] Redaction completed before artifacts are uploaded.
- [ ] The scenario names which permutation IDs from Stack Permutations and which primary flow IDs from Primary Flow Data Paths it covers.