II.
Page overview
Reference · livepage:docs-assimilation-harness-generic-harness-guide
Generic Harness Integration Guide for Babysitter SDK overview
Inspect the raw attributes, linked wiki pages, and inbound or outbound graph edges for page:docs-assimilation-harness-generic-harness-guide.
Attributes
nodeKind
Page
sourcePath
docs/assimilation/harness/generic-harness-guide.md
sourceKind
repo-docs
title
Generic Harness Integration Guide for Babysitter SDK
displayName
Generic Harness Integration Guide for Babysitter SDK
slug
docs/assimilation/harness/generic-harness-guide
articlePath
wiki/docs/assimilation/harness/generic-harness-guide.md
article
# Generic Harness Integration Guide for Babysitter SDK
A step-by-step implementation guide for integrating the babysitter SDK orchestration
loop into any AI coding harness. This document is harness-agnostic and uses pseudocode
throughout. For the canonical reference implementation, see
[Claude Code Integration](./claude-code-integration.md).
---
## Table of Contents
1. [Prerequisites](#1-prerequisites)
2. [Core Integration Points](#2-core-integration-points)
- [2a. SDK Installation](#2a-sdk-installation)
- [2b. Session Initialization](#2b-session-initialization)
- [2c. Run Creation and Session Binding](#2c-run-creation-and-session-binding)
- [2d. The Orchestration Loop Driver](#2d-the-orchestration-loop-driver)
- [2e. Effect Execution](#2e-effect-execution)
- [2f. Result Posting](#2f-result-posting)
- [2g. Iteration Guards](#2g-iteration-guards)
3. [Harness Capability Matrix](#3-harness-capability-matrix)
4. [Session State Contract](#4-session-state-contract)
5. [Hook Equivalence Table](#5-hook-equivalence-table)
6. [CLI Output Schemas](#6-cli-output-schemas)
7. [CLI Error Handling](#7-cli-error-handling)
8. [Edge Cases](#8-edge-cases)
9. [Testing the Integration](#9-testing-the-integration)
10. [Reference Implementation](#10-reference-implementation)
---
## 1. Prerequisites
Your harness must provide (or be able to emulate) the following capabilities before
you begin integration. Each item is marked as REQUIRED or RECOMMENDED.
### Checklist
- [ ] **REQUIRED: Shell or script execution** -- The harness must be able to execute
shell commands (`bash`, `sh`, `cmd`) or invoke Node.js scripts. The babysitter CLI
is a Node.js binary invoked via shell. Every integration point depends on running
`babysitter <command>` and reading its JSON output from stdout.
- [ ] **REQUIRED: Exit/stop interception** -- The harness must provide a mechanism to
intercept the AI agent's attempt to end its turn or exit the conversation. This is
the single most critical requirement. Without it, the orchestration loop cannot
function. Examples:
- A "stop hook" that fires before the agent's response is finalized
- A middleware layer that can reject an exit signal and re-inject context
- A wrapper around the agent loop that checks a condition before allowing termination
- [ ] **REQUIRED: Context re-injection** -- After blocking an exit, the harness must
be able to inject new text (a system message, user message, or tool result) into the
agent's context so it continues working. The injected content comes from the
babysitter CLI output.
- [ ] **REQUIRED: Session/conversation identity** -- The harness must provide a stable
identifier for the current session or conversation. This ID is used to:
- Name the session state file
- Associate the session with a babysitter run
- Track iteration count across stop-hook cycles
- [ ] **RECOMMENDED: Lifecycle hooks** -- Pre-session, post-session, pre-turn,
post-turn hooks simplify integration. If unavailable, equivalent behavior can be
built by wrapping the agent's main loop.
- [ ] **RECOMMENDED: Transcript access** -- Access to the agent's recent output text
enables completion proof verification (scanning for `<promise>` tags). If
unavailable, an alternative proof mechanism must be implemented (see Section 2d).
- [ ] **RECOMMENDED: Persistent environment** -- Environment variables or a key-value
store that persists across hook invocations within the same session. Used to carry
the session ID and plugin root path.
### Minimum Environment
```
Node.js >= 18
npm >= 8
File system access (read/write to working directory)
```
---
## 2. Core Integration Points
Implement these in order. Each section includes a checklist, pseudocode, and the
specific CLI commands involved.
```
+-----------------------------------------------------------------------+
| YOUR HARNESS |
| |
| +------------------+ +-----------------------+ +----------------+ |
| | Session Lifecycle | | Exit/Stop Interceptor | | Agent Loop | |
| | (start/end) | | (block or approve) | | (LLM turns) | |
| +--------+---------+ +-----------+-----------+ +-------+--------+ |
| | | | |
+-----------------------------------------------------------------------+
| | |
v v v
+-------------------+ +-------------------+ +------------------+
| babysitter CLI | | Session State | | Run Directory |
| (npm package) | | {stateDir}/ | | .a5c/runs/{id}/ |
| | | {sessionId}.md | | journal/ |
| session:init | +-------------------+ | tasks/ |
| run:create | | state/ |
| run:assign-process| +------------------+
| session:associate |
| run:iterate |
| task:list |
| task:post |
| session:check- |
| iteration |
| session:iteration-|
| message |
| run:status |
+-------------------+
```
---
### 2a. SDK Installation
**Goal:** Ensure the `babysitter` CLI binary is available on PATH.
#### Checklist
- [ ] Determine the SDK version to install (from plugin manifest or pinned version)
- [ ] Attempt global install; fall back to local prefix install; fall back to npx
- [ ] Gate installation behind a marker file to avoid repeated install attempts
- [ ] Verify the CLI is callable: `babysitter version --json`
#### Pseudocode
```
function ensureBabysitterCLI(sdkVersion):
markerFile = "{pluginRoot}/.babysitter-install-attempted"
if commandExists("babysitter"):
return "babysitter"
if fileExists(markerFile):
// Already tried installing; fall through to npx
else:
// Attempt global install
result = shell("npm install -g @a5c-ai/babysitter-sdk@{sdkVersion}")
if result.exitCode != 0:
// Fallback: install with local prefix
result = shell("npm install -g @a5c-ai/babysitter-sdk@{sdkVersion} --prefix $HOME/.local")
writeFile(markerFile, "attempted")
if commandExists("babysitter"):
return "babysitter"
// Final fallback: use npx on every invocation
return "npx -y @a5c-ai/babysitter-sdk@{sdkVersion} babysitter"
```
#### CLI Command
```bash
# Verify installation
babysitter version --json
# Expected: { "version": "x.y.z", "sdkVersion": "..." }
```
---
### 2b. Session Initialization
**Goal:** Create a baseline session state file so the orchestration loop can track
iterations from the very start of the session, even before any run is created.
#### When to Call
At session/conversation start -- before the user has issued any commands. This is
typically wired into a "session start" lifecycle hook or called at the top of the
agent's main loop.
#### Checklist
- [ ] Obtain or generate a unique session ID
- [ ] Determine the state directory (typically `{pluginRoot}/skills/babysit/state/`)
- [ ] Call `babysitter session:init`
- [ ] Persist the session ID in the harness environment for later hook invocations
#### Pseudocode
```
function onSessionStart(sessionId, pluginRoot):
stateDir = "{pluginRoot}/skills/babysit/state"
ensureDirectoryExists(stateDir)
result = shell(
"babysitter session:init" +
" --session-id {sessionId}" +
" --state-dir {stateDir}" +
" --json"
)
if result.exitCode != 0:
log("WARNING: session init failed, orchestration may not work")
return
// Persist session ID for use by the stop interceptor
setEnv("AGENT_SESSION_ID", sessionId)
setEnv("BABYSITTER_PLUGIN_ROOT", pluginRoot)
```
#### What This Creates
A session state file at `{stateDir}/{sessionId}.md` in BASELINE state (empty
`run_id`, `iteration: 1`). See [Section 4: Session State Contract](#4-session-state-contract)
for the full file format, field definitions, and state transition diagram.
---
### 2c. Run Creation and Session Binding
**Goal:** Create a babysitter run and bind it to the current session so the stop
interceptor knows which run to check.
#### When to Call
After the user requests a task that should be orchestrated. Typically triggered by a
skill or command within the harness (e.g., the user says "babysit this task").
#### Checklist
- [ ] Prepare the process definition (entry point, process ID, inputs)
- [ ] Call `babysitter run:create` with harness and session parameters
- [ ] Call `babysitter session:associate` to bind the run to the session
- [ ] Verify the session state file now has a non-empty `run_id`
#### Pseudocode
```
function createAndBindRun(processId, entryPoint, inputs, prompt, sessionId, pluginRoot):
// Step 1: Create the run
createResult = shell(
"babysitter run:create" +
" --process-id {processId}" +
" --entry {entryPoint}" +
" --inputs {inputsFilePath}" +
" --prompt \"{prompt}\"" +
" --json"
)
runId = parseJson(createResult.stdout).runId
runDir = ".a5c/runs/{runId}"
// Step 2: Bind session to run
shell(
"babysitter session:associate" +
" --session-id {sessionId}" +
" --run-id {runId}" +
" --state-dir {pluginRoot}/skills/babysit/state" +
" --json"
)
return { runId, runDir }
```
#### Re-entrant Run Prevention
If the session is already bound to a different run, `session:associate` will fail.
The harness must either:
1. Complete or clean up the existing run first
2. Remove the old session state file manually
3. Present an error to the user
---
### 2d. The Orchestration Loop Driver
**Goal:** Convert the agent's single-turn execution into a multi-iteration
orchestration loop by intercepting exit signals, checking run status, and re-injecting
context.
This is the most critical and complex integration point.
#### Architecture
```
Agent executes turn
|
v
Agent signals "done" (stop/exit)
|
v
+--[EXIT INTERCEPTOR]----------------------------------------------+
| 1. Read session state file |
| 2. Check guards (max iterations, runaway detect, no run bound) |
| 3. Load run status via run:status |
| 4. Check completion proof |
| 5. Decision: APPROVE or BLOCK |
+--------+-------------------+-------------------------------------+
| |
[APPROVE] [BLOCK]
| |
v v
Session ends Re-inject context ------> Agent continues
(iteration message) (back to top)
```
#### The Decision Algorithm
```
function onAgentStop(sessionId, pluginRoot, runsDir, lastAgentOutput):
stateDir = "{pluginRoot}/skills/babysit/state"
stateFile = "{stateDir}/{sessionId}.md"
// --- Guard 1: No state file means no active loop ---
if not fileExists(stateFile):
return APPROVE
state = parseSessionState(stateFile)
// --- Guard 2: Max iterations ---
if state.iteration >= state.maxIterations:
cleanupSessionFile(stateFile)
return APPROVE
// --- Guard 3: Runaway loop detection ---
if state.iteration >= 5:
avgTime = average(state.iterationTimes) // last 3 durations
if avgTime <= 15: // seconds
cleanupSessionFile(stateFile)
return APPROVE
// --- Guard 4: No run bound ---
if state.runId == "":
cleanupSessionFile(stateFile)
return APPROVE
// --- Check run status ---
statusResult = shell(
"babysitter run:status .a5c/runs/{state.runId} --json"
)
runStatus = parseJson(statusResult.stdout)
// --- Guard 5: Unknown or unreadable run ---
if statusResult.exitCode != 0:
cleanupSessionFile(stateFile)
return APPROVE
// --- Guard 6: Completion proof ---
if runStatus.state == "completed":
proof = runStatus.completionProof
promiseTag = extractPromiseTag(lastAgentOutput)
if promiseTag == proof:
cleanupSessionFile(stateFile)
return APPROVE
// --- BLOCK: Continue the loop ---
// Advance session state from BOUND/ACTIVE to next iteration.
// See Section 4 (Session State Contract) for field update rules
// and the atomic write protocol.
newIteration = state.iteration + 1
updateSessionState(stateFile, {
iteration: newIteration,
lastIterationAt: now()
})
// Build the context message to re-inject
// NOTE: session:iteration-message uses --iteration, --run-id,
// --runs-dir, and --plugin-root (NOT --session-id or --state-dir)
iterationMessage = shell(
"babysitter session:iteration-message" +
" --iteration {newIteration}" +
" --run-id {state.runId}" +
" --runs-dir {runsDir}" +
" --plugin-root {pluginRoot}" +
" --json"
)
return BLOCK {
reason: parseJson(iterationMessage.stdout).systemMessage,
systemMessage: "Babysitter iteration {newIteration}/{state.maxIterations}"
}
```
#### Intercepting Exit Signals
The mechanism depends entirely on your harness. Common patterns:
| Harness Type | Interception Mechanism |
|-------------|------------------------|
| Hook-based (Claude Code, etc.) | Register a `Stop` hook that receives agent output and returns block/approve |
| Middleware-based | Wrap the agent loop's exit check in a middleware that calls the decision algorithm |
| Event-based | Listen for "agent_turn_complete" events, cancel and re-queue if BLOCK |
| Loop-based | Replace the `while (running)` loop condition with the decision algorithm |
| API-based | Between API calls, run the check and decide whether to make another call |
#### Re-injecting Context
After blocking, the harness must feed the orchestration context back to the agent.
The mechanism depends on your harness:
| Harness Type | Re-injection Mechanism |
|-------------|------------------------|
| System message injection | Append the `reason` as a system message before the next turn |
| User message simulation | Insert a synthetic user message containing the iteration context |
| Tool result injection | Return the context as a tool call result |
| Context window prepend | Prepend the context to the agent's next input |
The content to inject comes from the `systemMessage` field of the
`session:iteration-message` output. It typically contains:
1. Iteration number and status
2. What to do next (run:iterate, execute effects, extract proof, etc.)
3. Pending effect kinds if the run is in "waiting" state
#### Detecting the Completion Proof
The completion proof is a SHA-256 hash that the agent must output inside
`<promise>...</promise>` tags. The harness must:
1. Scan the agent's last output for `<promise>VALUE</promise>`
2. Compare VALUE against the `completionProof` from `run:status --json`
3. If they match, allow exit
```
function extractPromiseTag(text):
match = regex_search(text, "<promise>([\\s\\S]*?)</promise>")
if match is null:
return null
return trim(match.group(1)).replace(/\\s+/, " ")
```
If the harness cannot access the agent's output text (no transcript), alternative
approaches:
- Have the agent call a special "complete" tool that the harness intercepts
- Use a dedicated CLI command that the agent calls to signal completion
- Implement a "completion callback" webhook
---
### 2e. Effect Execution
**Goal:** Execute the pending tasks that the babysitter run has requested, then post
their results.
#### The Effect Execution Cycle
```
babysitter run:iterate .a5c/runs/{runId} --json
|
v
Returns: { status, pendingActions[], ... }
|
v
babysitter task:list .a5c/runs/{runId} --pending --json
|
v
Returns: { tasks: [{ effectId, taskId, kind, status, label, ... }] }
|
v
For each pending task:
|
+--[kind = "node"]----------> Execute Node.js script
|
+--[kind = "breakpoint"]----> Present to user for approval
|
+--[kind = "sleep"]---------> Wait until specified time
|
+--[kind = "orchestrator_ -> Delegate to a sub-agent or
| task"] orchestrator within your harness
|
+--[kind = "agent"]---------> Delegate to an agent subprocess
|
+--[custom kind]------------> Handle per your harness capabilities
|
v
Post result via task:post (Section 2f)
```
#### Effect Result Type
All effect handlers must return a result conforming to this structure (or the
sentinel `DEFERRED` for effects that will be resolved later):
```
EffectResult = {
status: "ok" | "error",
value: object // Payload specific to the effect kind
}
// For node tasks:
// { status: "ok", value: <return value of the Node.js function> }
// { status: "error", value: { message: string, stack?: string } }
// For breakpoints:
// { status: "ok", value: { approved: boolean, approvedBy?: string, reason?: string } }
// For sleep:
// { status: "ok", value: { wokeAt: string (ISO 8601), reason: string } }
// For orchestrator_task:
// { status: "ok", value: { output: any, completedAt: string } }
// { status: "error", value: { message: string, phase?: string } }
// For agent:
// { status: "ok", value: { response: string, tokensUsed?: number } }
// { status: "error", value: { message: string, exitCode?: number } }
```
#### Pseudocode
```
function executeEffects(runId):
runDir = ".a5c/runs/{runId}"
// Step 1: Iterate to discover pending effects
iterResult = shell("babysitter run:iterate {runDir} --json")
if iterResult.exitCode != 0:
handleCLIError("run:iterate", iterResult)
return
iterData = parseJson(iterResult.stdout)
if iterData.status == "completed":
// Run is done -- extract proof and output it
proof = iterData.completionProof
agentOutput("<promise>{proof}</promise>")
return
if iterData.status == "failed":
// Inspect error, attempt recovery
return
// Step 2: List pending tasks
listResult = shell("babysitter task:list {runDir} --pending --json")
if listResult.exitCode != 0:
handleCLIError("task:list", listResult)
return
tasks = parseJson(listResult.stdout).tasks
// Step 3: Execute each task
for task in tasks:
taskDir = "{runDir}/tasks/{task.effectId}"
taskDef = readJson("{taskDir}/task.json")
switch task.kind:
case "node":
result = executeNodeTask(taskDef)
case "breakpoint":
result = handleBreakpoint(taskDef)
case "sleep":
result = handleSleep(taskDef)
case "orchestrator_task":
result = handleOrchestratorTask(taskDef)
case "agent":
result = handleAgentTask(taskDef)
default:
result = handleCustomKind(task.kind, taskDef)
// Step 4: Post result (skip deferred effects like long sleeps)
if result != DEFERRED:
postResult(runId, task.effectId, result)
```
#### Breakpoint Effect Handler
Breakpoints are human approval gates. The process pauses until a human explicitly
approves or rejects the breakpoint. **Never auto-approve breakpoints** -- they exist
specifically to require human judgment.
```
function handleBreakpoint(taskDef):
// taskDef.args schema:
// {
// message?: string, // Human-readable description of what needs approval
// description?: string, // Alternative to message (checked as fallback)
// context?: {
// changedFiles?: string[], // Files modified since last breakpoint
// summary?: string, // Summary of work done so far
// risks?: string[], // Identified risks requiring human review
// [key: string]: unknown // Additional context from the process
// },
// requireExplicitApproval?: boolean, // If true, never auto-approve (default: true)
// blocking?: boolean // If true, the run cannot proceed without resolution (default: true)
// }
message = taskDef.args.message or taskDef.args.description or "Approval required"
context = taskDef.args.context or {}
requireExplicit = taskDef.args.requireExplicitApproval != false // default true
// Present to user via your harness's interactive prompt mechanism
if harnessSupportsInteractivePrompt():
// Build a rich prompt with context if available
promptBody = message
if context.summary:
promptBody += "\n\nSummary: " + context.summary
if context.risks and length(context.risks) > 0:
promptBody += "\n\nRisks:\n" + join(context.risks, "\n- ")
if context.changedFiles and length(context.changedFiles) > 0:
promptBody += "\n\nChanged files:\n" + join(context.changedFiles, "\n- ")
userDecision = promptUser(
title: "Babysitter Breakpoint",
message: promptBody,
options: ["approve", "reject"]
)
if userDecision == "approve":
return { status: "ok", value: { approved: true, approvedBy: "user" } }
else:
return { status: "ok", value: { approved: false, reason: "User rejected" } }
// Non-interactive fallback: reject with explanation
// The agent will see this and can inform the user
return {
status: "ok",
value: {
approved: false,
reason: "Non-interactive environment; breakpoint requires manual approval"
}
}
```
#### Sleep Effect Handler
Sleep effects pause execution until a specified time. The harness must decide whether
to block (wait inline) or defer (post result later).
```
function handleSleep(taskDef):
// taskDef.args schema:
// {
// until?: string, // ISO 8601 timestamp to sleep until
// sleepUntil?: string, // Alias for 'until'
// durationMs?: number, // Duration in milliseconds (alternative to until)
// reason?: string // Human-readable reason for the sleep
// }
//
// Exactly one of (until | sleepUntil) or durationMs should be provided.
// If both are present, the absolute timestamp (until/sleepUntil) takes precedence.
sleepUntil = taskDef.args.until or taskDef.args.sleepUntil
durationMs = taskDef.args.durationMs
if sleepUntil:
targetTime = parseISO8601(sleepUntil)
else if durationMs:
targetTime = now() + durationMs
else:
// No target time specified; resolve immediately
return { status: "ok", value: { wokeAt: now(), reason: "no_target_time" } }
remainingMs = targetTime - now()
if remainingMs <= 0:
// Sleep time already passed
return { status: "ok", value: { wokeAt: now(), reason: "already_elapsed" } }
if remainingMs <= 60000: // 1 minute or less
// Short sleep: block inline
sleep(remainingMs)
return { status: "ok", value: { wokeAt: now(), reason: "waited" } }
// Long sleep: post a deferred result
// Option A: Schedule a timer/cron to post the result later
scheduleDelayedPost(runId, effectId, targetTime)
// Do NOT post result now -- let the orchestration loop handle it
// on the next iteration after the timer fires
return DEFERRED // signal to caller: do not post result yet
```
#### Orchestrator Task Effect Handler
Orchestrator tasks delegate a sub-process to an orchestrator or sub-agent within
your harness. The task definition contains a prompt, optional inputs, and
configuration for the sub-process.
```
function handleOrchestratorTask(taskDef):
// taskDef.args schema:
// {
// prompt: string, // The instruction for the sub-agent
// processId?: string, // Optional sub-process ID to invoke
// inputs?: object, // Inputs to pass to the sub-process
// constraints?: {
// maxIterations?: number, // Iteration limit for the sub-process
// timeout?: number // Timeout in ms for the sub-process
// }
// }
prompt = taskDef.args.prompt
inputs = taskDef.args.inputs or {}
constraints = taskDef.args.constraints or {}
timeout = constraints.timeout or 900000 // default 15 min
if harnessSupportsSubAgentDelegation():
// Delegate to a sub-agent or internal orchestrator
subResult = delegateToSubAgent({
prompt: prompt,
inputs: inputs,
maxIterations: constraints.maxIterations or 50,
timeout: timeout
})
if subResult.success:
return {
status: "ok",
value: {
output: subResult.output,
completedAt: now()
}
}
else:
return {
status: "error",
value: {
message: subResult.error,
phase: subResult.failedPhase or "execution"
}
}
// Fallback: execute as a simple prompt-response if no sub-agent support
// This is a degraded mode -- the harness loses multi-step orchestration
response = executePromptSingleTurn(prompt, inputs)
return {
status: "ok",
value: {
output: response,
completedAt: now()
}
}
```
#### Agent Effect Handler
Agent effects delegate work to a standalone agent subprocess. Unlike
orchestrator_task, the agent effect expects a self-contained agent invocation
(typically a CLI tool or API call) that runs to completion.
```
function handleAgentTask(taskDef):
// taskDef.args schema:
// {
// command: string, // Agent command or prompt
// workingDir?: string, // Working directory for the agent
// env?: Record<string, string>, // Environment variables
// timeout?: number, // Timeout in ms (default: 900000)
// captureOutput?: boolean // Whether to capture stdout/stderr (default: true)
// }
command = taskDef.args.command
workingDir = taskDef.args.workingDir or getCwd()
env = taskDef.args.env or {}
timeout = taskDef.args.timeout or 900000 // default 15 min
if harnessSupportsAgentSubprocess():
// Spawn agent subprocess
agentResult = spawnAgent({
command: command,
workingDir: workingDir,
env: env,
timeout: timeout
})
if agentResult.exitCode == 0:
return {
status: "ok",
value: {
response: agentResult.stdout,
tokensUsed: agentResult.tokensUsed or null
}
}
else:
return {
status: "error",
value: {
message: agentResult.stderr or "Agent exited with code {agentResult.exitCode}",
exitCode: agentResult.exitCode
}
}
// Fallback: treat as a shell command
shellResult = shell(command, { cwd: workingDir, env: env, timeout: timeout })
if shellResult.exitCode == 0:
return { status: "ok", value: { response: shellResult.stdout } }
else:
return {
status: "error",
value: { message: shellResult.stderr, exitCode: shellResult.exitCode }
}
```
#### Reading Task Definitions
Each pending task has a `task.json` in its effect directory:
```
.a5c/runs/{runId}/tasks/{effectId}/task.json
```
The task definition contains the task kind, arguments, labels, and other metadata
needed to execute it. Read it with:
```bash
babysitter task:show .a5c/runs/{runId} {effectId} --json
```
---
### 2f. Result Posting
**Goal:** Record effect execution results back into the run journal.
#### IMPORTANT
Always post results through the CLI. Never write `result.json` directly. The CLI
command handles:
1. Writing `result.json` with the correct schema version
2. Appending an `EFFECT_RESOLVED` event to the journal
3. Updating the state cache
#### Pseudocode
```
function postResult(runId, effectId, result):
runDir = ".a5c/runs/{runId}"
taskDir = "{runDir}/tasks/{effectId}"
// Write the result value to a temporary file
valueFile = "{taskDir}/output.json"
writeJson(valueFile, result.value)
// Post through the CLI
shell(
"babysitter task:post {runDir} {effectId}" +
" --status {result.status}" + // "ok" or "error"
" --value {valueFile}" +
" --json"
)
```
#### CLI Command
```bash
# Success case
babysitter task:post .a5c/runs/{runId} {effectId} \
--status ok \
--value tasks/{effectId}/output.json \
--json
# Error case
babysitter task:post .a5c/runs/{runId} {effectId} \
--status error \
--value tasks/{effectId}/error.json \
--json
```
#### Result Status Values
| Status | Meaning |
|--------|---------|
| `ok` | Task completed successfully; value contains the result |
| `error` | Task failed; value contains error details |
---
### 2g. Iteration Guards
**Goal:** Prevent infinite loops and detect runaway behavior.
#### CLI Command
```bash
babysitter session:check-iteration \
--session-id {sessionId} \
--state-dir {stateDir} \
--json
```
#### Output
See [Section 6: CLI Output Schemas](#session-check-iteration-output) for the full
schema. Summary:
- `shouldContinue: true` -- safe to proceed; `nextIteration` indicates the next number
- `shouldContinue: false` -- stop the loop; `reason` explains why (e.g.,
`max_iterations_reached`, `session_not_found`)
#### Guard Logic
```
function checkIterationGuards(sessionId, stateDir):
result = shell(
"babysitter session:check-iteration" +
" --session-id {sessionId}" +
" --state-dir {stateDir}" +
" --json"
)
data = parseJson(result.stdout)
if not data.found:
return { shouldContinue: false, reason: "no_session" }
if not data.shouldContinue:
return { shouldContinue: false, reason: data.reason }
return { shouldContinue: true, nextIteration: data.nextIteration }
```
#### Two Detection Mechanisms
**1. Max Iterations Guard**
```
IF iteration >= maxIterations (default 65000):
STOP -- allow exit, clean up state file
```
**2. Runaway Speed Guard**
```
IF iteration >= 5:
avgDuration = average(last 3 iteration durations)
IF avgDuration <= 15 seconds:
STOP -- iterations are too fast, likely a runaway loop
```
The iteration duration is measured as the wall-clock time between consecutive
stop-hook invocations. Durations below 15 seconds on average (after at least 5
iterations) indicate the agent is not doing meaningful work.
**Threshold justifications:**
- **Why iteration >= 5:** The first few iterations are often fast because the
agent is reading instructions, creating the run, and performing lightweight
setup. A minimum of 5 iterations avoids false positives during this bootstrap
phase while still catching runaways before significant resource waste. Empirical
testing across Claude Code sessions showed that legitimate fast iterations
(setup, binding, first iterate) are consistently complete within 3-4 cycles.
- **Why average <= 15 seconds:** A meaningful agent iteration -- one that reads
files, calls an LLM, writes code, or executes tests -- typically takes 30-120
seconds. The 15-second threshold provides a 2x safety margin below the minimum
expected productive iteration time. Iterations under 15 seconds typically
indicate the agent is stuck in a loop where it reads the iteration message,
does no substantive work, and immediately signals completion. The 3-iteration
rolling average (rather than a single iteration) smooths out one-off fast
iterations caused by cached replay or quick task:post calls.
- **Tuning:** Both thresholds can be adjusted for specific harness environments.
If your agent performs very lightweight iterations (e.g., posting pre-computed
results), lower the speed threshold. If your setup phase is longer, raise the
minimum iteration count. The `session:check-iteration` CLI command applies
these same thresholds internally.
---
## 3. Harness Capability Matrix
### Required vs Optional Capabilities
| Capability | Required | Purpose |
|---|---|---|
| Shell command execution | YES | All CLI interactions |
| Exit/stop interception | YES | Core loop driver |
| Context re-injection | YES | Continue agent after BLOCK |
| Session identity | YES | State file naming, run binding |
| File system read/write | YES | State files, task artifacts |
| Transcript access | NO * | Completion proof via `<promise>` tag |
| Lifecycle hooks | NO | Simplifies wiring; can be emulated |
| Persistent environment | NO | Convenience; can pass via files |
| Interactive user prompts | NO | Breakpoint handling (non-interactive mode is fallback) |
| Sub-agent delegation | NO | orchestrator_task / agent effects |
\* If transcript access is unavailable, an alternative completion signaling mechanism
must be implemented.
### Integration Tiers
#### Tier 1: Minimum Viable Integration
Supports basic orchestration with node tasks and completion detection.
- [ ] SDK installation
- [ ] Session initialization
- [ ] Run creation and binding
- [ ] Exit interception with BLOCK/APPROVE
- [ ] `run:iterate` calls
- [ ] `task:list --pending` to discover effects
- [ ] Node task execution
- [ ] `task:post` to record results
- [ ] Completion proof detection (via transcript or alternative)
- [ ] Max iteration guard
#### Tier 2: Robust Integration
Adds safety guards and breakpoint support.
- [ ] Everything in Tier 1
- [ ] Runaway loop detection (iteration speed guard)
- [ ] `session:check-iteration` calls
- [ ] Interactive breakpoint handling
- [ ] Sleep effect handling
- [ ] Journal event recording for debugging
#### Tier 3: Full Integration
Complete feature parity with the Claude Code reference implementation.
- [ ] Everything in Tier 2
- [ ] Native lifecycle hooks (on-run-start, on-task-complete, etc.)
- [ ] Hook discovery (per-repo, per-user, plugin directories)
- [ ] Orchestrator task delegation
- [ ] Agent task delegation
- [ ] Quality scoring via on-score hooks
- [ ] Skill discovery and injection
- [ ] Non-interactive breakpoint auto-resolution
---
## 4. Session State Contract
### File Format
Session state files use Markdown with YAML frontmatter. The frontmatter stores
machine-readable state. The Markdown body stores the user's original prompt.
**Path convention:** `{stateDir}/{sessionId}.md`
#### Example
```markdown
---
active: true
iteration: 3
max_iterations: 65000
run_id: "my-run-abc123"
started_at: "2026-03-02T10:00:00Z"
last_iteration_at: "2026-03-02T10:05:30Z"
iteration_times: 45,62,58
---
Build a REST API with authentication and rate limiting for the user service.
```
### Required Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `active` | boolean | `true` | Whether the orchestration loop is active |
| `iteration` | number | `1` | Current iteration (1-based) |
| `max_iterations` | number | `65000` | Maximum iterations (0 = unlimited) |
| `run_id` | string | `""` | Bound run ID (empty before run:create) |
| `started_at` | string (ISO 8601) | now | Session start timestamp |
| `last_iteration_at` | string (ISO 8601) | now | Last iteration timestamp |
| `iteration_times` | string (CSV) | (empty) | Last 3 iteration durations in seconds |
### State Transitions
```
CREATE BIND ITERATE (x N) COMPLETE
(session:init) (session:associate) (stop hook BLOCK) (stop hook APPROVE)
| | | |
v v v v
+-----------+ +-----------+ +-----------+ +------------+
| BASELINE | | BOUND | | ACTIVE | | CLEANED |
| |---->| |--------->| |------>| UP |
| runId="" | | runId=X | | iter=N+1 | | file |
| iter=1 | | iter=1 | | times=[.] | | inactive |
+-----------+ +-----------+ +-----------+ +------------+
```
### Atomic Write Protocol
Session state files must be written atomically to prevent corruption from
concurrent reads during stop-hook evaluation:
```
1. Write content to temp file: {filePath}.tmp.{pid}
2. Atomic rename: rename(tempFile, targetFile)
3. On error: delete temp file
```
### Timing Calculation
```
function updateIterationTimes(existingTimes, lastIterationAt, currentTime):
durationSeconds = (currentTime - lastIterationAt) / 1000
if durationSeconds <= 0:
return existingTimes
newTimes = append(existingTimes, durationSeconds)
return lastN(newTimes, 3) // keep only last 3
```
---
## 5. Hook Equivalence Table
The babysitter SDK and harness integration involve two categories of hooks:
### SDK Hooks (13 `KnownHookType` values)
These are dispatched by the SDK runtime during orchestration. They are defined in
`packages/sdk/src/hooks/types.ts` and fired via `callHook(hookType, payload)`.
| SDK Hook | Tier | Description |
|---|---|---|
| `on-run-start` | 3 | Fires after `run:create` completes |
| `on-run-complete` | 3 | Fires when `run:iterate` returns status=completed |
| `on-run-fail` | 3 | Fires when `run:iterate` returns status=failed |
| `on-task-start` | 3 | Fires before executing each pending effect |
| `on-task-complete` | 3 | Fires after `task:post` completes |
| `on-step-dispatch` | 3 | Fires when `run:iterate` discovers a new effect |
| `on-iteration-start` | 2 | Fires before calling `run:iterate` |
| `on-iteration-end` | 2 | Fires after all effects for an iteration are posted |
| `on-breakpoint` | 2 | Fires when a breakpoint effect is pending; present to user for approval |
| `on-score` | 3 | Fires when a quality score is posted to the run |
| `pre-commit` | 3 | Fires before the agent creates a git commit |
| `pre-branch` | 3 | Fires before the agent creates a new git branch |
| `post-planning` | 3 | Fires after the planning phase produces output |
### Harness-Level Concepts (not SDK KnownHookType values)
These are integration points that your harness must implement. They are NOT SDK hook
types -- they are harness-specific lifecycle events that drive the orchestration loop.
| Harness Concept | Tier | Generic Equivalent |
|---|---|---|
| **session-start** | 1 | Session/conversation start callback. Call `session:init` to create the baseline state file. This maps to your harness's "on conversation begin" event. |
| **stop** (exit interceptor) | 1 | Exit/turn-end interceptor. Run the decision algorithm (Section 2d) to BLOCK or APPROVE the agent's exit attempt. This is the core loop driver. |
| **session-end** | 1 | Session cleanup. Delete the session state file when the conversation ends normally. |
### Hook Discovery Directories
If implementing Tier 3, hook scripts are searched in this priority order:
```
1. Per-repo: {REPO_ROOT}/.a5c/hooks/{hookType}/*.sh (highest)
2. Per-user: ~/.config/babysitter/hooks/{hookType}/*.sh (medium)
3. Plugin: {PLUGIN_ROOT}/hooks/{hookType}/*.sh (lowest)
```
Scripts within each directory are sorted alphabetically and executed sequentially.
---
## 6. CLI Output Schemas
This section documents the JSON output schemas for the most frequently used CLI
commands. All examples assume `--json` is passed.
### `run:status` Output
```json
{
"state": "waiting",
"lastEvent": {
"type": "EFFECT_REQUESTED",
"recordedAt": "2026-03-02T10:05:00Z",
"data": { "..." : "..." }
},
"pendingByKind": {
"node": 2,
"breakpoint": 1
},
"pendingEffectsSummary": {
"totalPending": 3,
"countsByKind": { "node": 2, "breakpoint": 1 },
"autoRunnableCount": 2
},
"needsMoreIterations": true,
"metadata": null,
"completionProof": null
}
```
| Field | Type | Description |
|---|---|---|
| `state` | `"created" \| "waiting" \| "completed" \| "failed"` | Derived run lifecycle state |
| `lastEvent` | object or null | The most recent journal event (serialized) |
| `pendingByKind` | `Record<string, number>` | Count of pending effects grouped by kind |
| `pendingEffectsSummary.totalPending` | number | Total pending effects |
| `pendingEffectsSummary.autoRunnableCount` | number | Effects that can be auto-executed (kind=node) |
| `needsMoreIterations` | boolean | True if state=waiting and autoRunnableCount > 0 |
| `completionProof` | string or null | SHA-256 proof hash (only when state=completed) |
### `session:check-iteration` Output
The output always includes `found`, `shouldContinue`, `iteration`, `maxIterations`,
`runId`, and `prompt`. When `shouldContinue` is false, `reason` and `stopMessage`
explain why.
```json
// shouldContinue: true
{ "found": true, "shouldContinue": true, "nextIteration": 4,
"updatedIterationTimes": [45, 62, 58], "iteration": 3,
"maxIterations": 65000, "runId": "my-run-abc123", "prompt": "Build the API..." }
// shouldContinue: false -- possible reason values:
// "max_iterations_reached" (+ stopMessage)
// "session_not_found" (found=false, all counters zero)
```
| `reason` value | Trigger condition | Extra fields |
|---|---|---|
| `max_iterations_reached` | iteration >= maxIterations | -- |
| `session_not_found` | State file does not exist | `found: false` |
### `task:list` Output
```json
{
"tasks": [
{
"effectId": "E001-abc",
"taskId": "greet",
"stepId": "S000001",
"status": "pending",
"kind": "node",
"label": "Greet user",
"labels": ["greeting"],
"taskDefRef": "tasks/E001-abc/task.json",
"inputsRef": null,
"resultRef": null,
"stdoutRef": null,
"stderrRef": null,
"requestedAt": "2026-03-02T10:01:00Z",
"resolvedAt": null
}
]
}
```
| Field | Type | Description |
|---|---|---|
| `effectId` | string | Unique effect identifier |
| `taskId` | string | Task type identifier (from `defineTask`) |
| `stepId` | string | Sequential step ID (e.g., `S000001`) |
| `status` | `"pending" \| "resolved" \| "unknown"` | Current effect status |
| `kind` | string | Task kind: `node`, `breakpoint`, `sleep`, `orchestrator_task`, or custom |
| `label` | string or null | Human-readable label |
| `taskDefRef` | string or null | Relative path to task.json |
| `resultRef` | string or null | Relative path to result.json (null if pending) |
### `session:iteration-message` Output
**Command signature:**
```bash
babysitter session:iteration-message \
--iteration <n> \
[--run-id <id>] \
[--runs-dir <dir>] \
[--plugin-root <dir>] \
--json
```
Note: This command does NOT accept `--session-id` or `--state-dir`. It operates on
run data directly via `--run-id` and `--runs-dir`.
```json
{
"systemMessage": "Babysitter iteration 3 | Waiting on: node. Check if pending effects are resolved, then call run:iterate.",
"runState": "waiting",
"completionProof": null,
"pendingKinds": "node",
"skillContext": null,
"iteration": 3
}
```
| Field | Type | Description |
|---|---|---|
| `systemMessage` | string | The formatted message to re-inject into the agent's context |
| `runState` | `"created" \| "waiting" \| "completed" \| "failed"` or null | Derived run state |
| `completionProof` | string or null | Proof hash if run is completed |
| `pendingKinds` | string or null | Comma-separated list of pending effect kinds |
| `skillContext` | string or null | Discovered skill context (when `--plugin-root` is provided) |
| `iteration` | number | The iteration number passed in |
---
## 7. CLI Error Handling
All CLI commands can fail. The harness must handle these failures gracefully
rather than crashing or silently ignoring them. This section provides a unified
error handling strategy.
### Error Categories
| Category | Symptom | Recovery Strategy |
|----------|---------|-------------------|
| **Timeout** | CLI command exceeds expected duration | Kill the process, log the timeout, retry once with a longer timeout. If the retry also times out, APPROVE exit and log a diagnostic warning. |
| **JSON parse error** | stdout is empty or contains non-JSON text (e.g., stack traces, warnings) | Check stderr for error details. Strip any non-JSON prefix from stdout (some environments prepend warnings). If still unparseable, treat as a command failure. |
| **Lock conflict** | `run:iterate` or `task:post` fails because another process holds `run.lock` | Retry after 250ms, up to 40 retries (matching the SDK's internal retry behavior). If all retries fail, log the conflict and APPROVE exit. |
| **Missing run directory** | `run:status` or `run:iterate` returns non-zero with ENOENT-style error | The run was deleted or never created. Mark the session inactive and approve/fail loudly according to harness policy. |
| **Permission error** | EACCES or EPERM on file operations | Check file ownership and permissions. This usually indicates a misconfigured `BABYSITTER_RUNS_DIR`. |
| **Non-zero exit, valid JSON** | CLI returns exit code != 0 but stdout contains valid JSON with an `error` field | Parse the JSON error object for structured diagnostics. The `error.code` field often contains a machine-readable error type. |
### Unified Error Handler
```
function handleCLIError(commandName, shellResult):
// Step 1: Try to parse structured error from stdout
if shellResult.stdout != "":
try:
parsed = parseJson(shellResult.stdout)
if parsed.error:
log("CLI error in {commandName}: {parsed.error.message} (code: {parsed.error.code})")
return { category: "structured", error: parsed.error }
catch parseError:
// stdout is not valid JSON -- fall through
pass
// Step 2: Check for known error patterns in stderr
stderr = shellResult.stderr or ""
if contains(stderr, "ENOENT") or contains(stderr, "no such file"):
return { category: "missing_path", message: stderr }
if contains(stderr, "run.lock") or contains(stderr, "EBUSY"):
return { category: "lock_conflict", message: stderr, retryable: true }
if contains(stderr, "EACCES") or contains(stderr, "EPERM"):
return { category: "permission", message: stderr }
if shellResult.timedOut:
return { category: "timeout", message: "Command {commandName} timed out after {shellResult.timeoutMs}ms" }
// Step 3: Generic failure
return {
category: "unknown",
exitCode: shellResult.exitCode,
message: stderr or "Command {commandName} failed with exit code {shellResult.exitCode}"
}
```
### Recommended Timeouts by Command
| Command | Recommended Timeout | Notes |
|---------|-------------------|-------|
| `version --json` | 5s | Should be near-instant |
| `session:init` | 5s | File creation only |
| `session:associate` | 5s | File update only |
| `run:create` | 10s | Creates directory structure and journal |
| `run:assign-process` | 10s | Updates run.json and appends journal event under lock |
| `run:iterate` | 120s | May execute process function; uses `BABYSITTER_TIMEOUT` env var |
| `run:status` | 10s | Reads journal and derives state |
| `task:list` | 10s | Reads task directories |
| `task:post` | 15s | Writes result + appends journal event |
| `session:check-iteration` | 5s | Reads and parses state file |
| `session:iteration-message` | 10s | Reads run state, discovers skills |
| `run:repair-journal` | 30s | Scans and repairs journal files |
---
## 8. Edge Cases
> Note: For CLI command failures (timeouts, parse errors, lock conflicts), see
> [Section 7: CLI Error Handling](#7-cli-error-handling).
### Stale Session State File
If the harness crashes or the agent is forcefully terminated, a session state file
may be left behind. On the next session start, `session:init` will fail with
`SESSION_EXISTS`. Handling:
```
function handleStaleSession(sessionId, stateDir):
stateFile = "{stateDir}/{sessionId}.md"
existing = parseSessionState(stateFile)
// If the run is completed or failed, clean up and re-init
if existing.runId != "":
statusResult = shell("babysitter run:status .a5c/runs/{existing.runId} --json")
if statusResult.exitCode == 0:
runStatus = parseJson(statusResult.stdout)
if runStatus.state in ["completed", "failed"]:
deleteFile(stateFile)
return shell("babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json")
// Otherwise, offer to resume
return { action: "resume_or_cleanup", existingRunId: existing.runId }
```
### Run Directory Missing or Corrupted
If the run directory is deleted or journal files are corrupted, `run:status` and
`run:iterate` will return non-zero exit codes. The stop interceptor should APPROVE
exit in this case (Guard 5 in the decision algorithm).
### Concurrent Sessions on Same Run
The SDK uses file-based run locking (`run.lock` with PID). If two sessions try to
iterate the same run concurrently, one will fail to acquire the lock. The harness
should retry after a short delay (250ms, up to 40 retries) or report the conflict.
### Effect Posted but Journal Not Updated
If the harness crashes between writing `result.json` and the CLI appending the
`EFFECT_RESOLVED` journal event, the run may appear stuck. Use `run:repair-journal`
to detect and fix such inconsistencies:
```bash
babysitter run:repair-journal .a5c/runs/{runId} --json
```
### Zero Pending Tasks After Iterate
If `run:iterate` returns status=waiting but `task:list --pending` returns zero tasks,
this indicates all effects were resolved during the iterate call itself (e.g., via
replay). Simply call `run:iterate` again on the next iteration.
---
## 9. Testing the Integration
### Smoke Test Checklist
Run these tests in order. Each builds on the previous.
#### Test 1: CLI Availability
```bash
babysitter version --json
```
- [ ] Exit code is 0
- [ ] Output contains `"version"` field
#### Test 2: Session Initialization
```bash
babysitter session:init \
--session-id test-session-001 \
--state-dir /tmp/babysitter-test/state \
--json
```
- [ ] Exit code is 0
- [ ] State file created at `/tmp/babysitter-test/state/test-session-001.md`
- [ ] File contains YAML frontmatter with `active: true`, `iteration: 1`, `run_id: ""`
#### Test 3: Run Creation
```bash
# Create a minimal process file
cat > /tmp/babysitter-test/process.js << 'EOF'
exports.process = async function(inputs, ctx) {
const result = await ctx.task('greet', { name: inputs.name });
return { greeting: result };
};
EOF
# Create inputs file
echo '{"name": "World"}' > /tmp/babysitter-test/inputs.json
# Create the run
babysitter run:create \
--process-id test-process \
--entry /tmp/babysitter-test/process.js#process \
--inputs /tmp/babysitter-test/inputs.json \
--prompt "Test run" \
--json
```
- [ ] Exit code is 0
- [ ] Output contains `"runId"` field
- [ ] Directory `.a5c/runs/{runId}/` exists with `run.json` and `journal/`
#### Test 4: Session Binding
```bash
babysitter session:associate \
--session-id test-session-001 \
--run-id {runId} \
--state-dir /tmp/babysitter-test/state \
--json
```
- [ ] Exit code is 0
- [ ] State file now has `run_id: "{runId}"`
#### Test 5: Run Iterate
```bash
babysitter run:iterate .a5c/runs/{runId} --json
```
- [ ] Exit code is 0
- [ ] Output contains `"status"` field
#### Test 6: Task List
```bash
babysitter task:list .a5c/runs/{runId} --pending --json
```
- [ ] Exit code is 0
- [ ] Output contains `"tasks"` array
#### Test 7: Iteration Guard
```bash
babysitter session:check-iteration \
--session-id test-session-001 \
--state-dir /tmp/babysitter-test/state \
--json
```
- [ ] Exit code is 0
- [ ] Output contains `"shouldContinue": true`
#### Test 8: Iteration Message
```bash
babysitter session:iteration-message \
--iteration 2 \
--run-id {runId} \
--runs-dir .a5c/runs \
--json
```
- [ ] Exit code is 0
- [ ] Output contains `"systemMessage"` field
- [ ] Output contains `"iteration": 2`
### Common Failure Modes
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `babysitter: command not found` | SDK not installed or not on PATH | Re-run installation (Section 2a) |
| Stop hook always APPROVEs | No session state file, or `run_id` is empty | Check session:init and session:associate ran |
| Infinite loop (never exits) | Completion proof not detected | Check transcript scanning for `<promise>` tags |
| Exits after 1 iteration | Stop interceptor not wired correctly | Verify BLOCK decision re-injects context |
| `Session already associated` | Re-entrant run on same session | Clean up old state file or complete old run |
| Iterations very fast, exits early | Runaway detection triggers (avg <= 15s) | Agent is not doing meaningful work per iteration; check effect execution |
| State file corrupt | Non-atomic write or concurrent access | Use atomic write protocol (temp + rename) |
| `task:post` fails | Writing result.json directly instead of via CLI | Always use `babysitter task:post` command |
| Run stuck in "waiting" | Effects executed but results not posted | Check task:post calls after each effect |
### End-to-End Integration Test
The following pseudocode validates the complete loop:
```
function testEndToEnd():
sessionId = "e2e-test-" + randomId()
pluginRoot = "/tmp/babysitter-e2e"
stateDir = "{pluginRoot}/skills/babysit/state"
runsDir = ".a5c/runs"
// 1. Init
ensureBabysitterCLI()
shell("babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json")
// 2. Create and bind
result = shell("babysitter run:create --process-id test --entry ./process.js#process --inputs inputs.json --json")
runId = parseJson(result.stdout).runId
shell("babysitter session:associate --session-id {sessionId} --run-id {runId} --state-dir {stateDir} --json")
// 3. Iterate
iterResult = shell("babysitter run:iterate .a5c/runs/{runId} --json")
assert parseJson(iterResult.stdout).status in ["executed", "waiting", "completed"]
// 4. Execute effects
listResult = shell("babysitter task:list .a5c/runs/{runId} --pending --json")
tasks = parseJson(listResult.stdout).tasks
for task in tasks:
// Execute task, write output
shell("babysitter task:post .a5c/runs/{runId} {task.effectId} --status ok --value output.json --json")
// 5. Check iteration guard
guardResult = shell("babysitter session:check-iteration --session-id {sessionId} --state-dir {stateDir} --json")
assert parseJson(guardResult.stdout).shouldContinue == true
// 6. Get iteration message (correct params: --iteration, --run-id, --runs-dir)
msgResult = shell("babysitter session:iteration-message --iteration 2 --run-id {runId} --runs-dir {runsDir} --json")
assert parseJson(msgResult.stdout).systemMessage is not null
// 7. Re-iterate until completed
iterResult = shell("babysitter run:iterate .a5c/runs/{runId} --json")
if parseJson(iterResult.stdout).status == "completed":
proof = parseJson(iterResult.stdout).completionProof
assert proof is not null and proof is not ""
print("END-TO-END TEST PASSED")
```
---
## 10. Reference Implementation
The canonical reference implementation is the Claude Code harness adapter, documented at:
**[docs/assimilation/harness/claude-code-integration.md](./claude-code-integration.md)**
Key files in the reference implementation:
| File | Role |
|------|------|
| `packages/sdk/src/harness/types.ts` | `HarnessAdapter` interface definition |
| `packages/sdk/src/harness/claudeCode.ts` | Claude Code adapter (stop hook, session-start, binding) |
| `packages/sdk/src/harness/nullAdapter.ts` | No-op fallback adapter (useful as a starting template) |
| `packages/sdk/src/harness/registry.ts` | Adapter auto-detection and lookup |
| `packages/sdk/src/session/` | Session state parsing, writing, and types |
| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-stop.sh` | Generated Claude Code stop hook entry |
| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-session-start.sh` | Generated Claude Code session-start hook entry |
### Writing a New Harness Adapter
To add first-class SDK support for your harness, implement the `HarnessAdapter`
interface:
```typescript
interface HarnessAdapter {
readonly name: string;
isActive(): boolean;
resolveSessionId(parsed: { sessionId?: string }): string | undefined;
resolveStateDir(args: { stateDir?: string; pluginRoot?: string }): string | undefined;
resolvePluginRoot(args: { pluginRoot?: string }): string | undefined;
bindSession(opts: SessionBindOptions): Promise<SessionBindResult>;
handleStopHook(args: HookHandlerArgs): Promise<number>;
handleSessionStartHook(args: HookHandlerArgs): Promise<number>;
findHookDispatcherPath(startCwd: string): string | null;
}
```
Register your adapter in `packages/sdk/src/harness/registry.ts` and it will be
auto-detected when its `isActive()` method returns `true`.
For harnesses that cannot modify the SDK source, the entire integration can be
built externally by calling the babysitter CLI commands documented in this guide.
### Full-Code Example: Minimal Node.js Harness
The following is a complete, runnable Node.js implementation (not pseudocode) of a
minimal harness that drives a single babysitter run to completion. It covers session
initialization, run creation, the orchestration loop, effect execution for node tasks,
and completion proof extraction.
```javascript
#!/usr/bin/env node
// minimal-harness.js -- A complete minimal babysitter harness implementation.
// Usage: node minimal-harness.js <process-file>#<export> <inputs.json> [prompt]
const { execSync } = require('child_process');
const { readFileSync, writeFileSync, mkdirSync, existsSync } = require('fs');
const { join } = require('path');
const crypto = require('crypto');
// --- Configuration ---
const RUNS_DIR = '.a5c/runs';
const STATE_DIR = join(process.cwd(), '.harness-state');
const MAX_ITERATIONS = 65000;
const RUNAWAY_THRESHOLD_ITERATIONS = 5;
const RUNAWAY_THRESHOLD_SECONDS = 15;
const CLI_TIMEOUT_MS = 120_000;
// --- Helpers ---
function cli(command, timeoutMs = CLI_TIMEOUT_MS) {
try {
const stdout = execSync(`babysitter ${command}`, {
encoding: 'utf8',
timeout: timeoutMs,
stdio: ['pipe', 'pipe', 'pipe'],
});
return { exitCode: 0, stdout, stderr: '' };
} catch (err) {
return {
exitCode: err.status ?? 1,
stdout: err.stdout ?? '',
stderr: err.stderr ?? '',
timedOut: err.killed === true,
};
}
}
function cliJson(command, timeoutMs) {
const result = cli(`${command} --json`, timeoutMs);
if (result.exitCode !== 0) {
console.error(`CLI error (${command}): ${result.stderr}`);
return null;
}
try {
return JSON.parse(result.stdout);
} catch {
console.error(`JSON parse error for ${command}: ${result.stdout.slice(0, 200)}`);
return null;
}
}
// --- Main ---
async function main() {
const [,, entryPoint, inputsFile, prompt = 'Run process'] = process.argv;
if (!entryPoint || !inputsFile) {
console.error('Usage: node minimal-harness.js <entry>#<export> <inputs.json> [prompt]');
process.exit(1);
}
const sessionId = `harness-${crypto.randomUUID().slice(0, 8)}`;
mkdirSync(STATE_DIR, { recursive: true });
// Step 1: Verify CLI
const version = cliJson('version');
if (!version) { console.error('babysitter CLI not available'); process.exit(1); }
console.log(`Using babysitter SDK v${version.sdkVersion || version.version}`);
// Step 2: Session init
const initResult = cliJson(
`session:init --session-id ${sessionId} --state-dir ${STATE_DIR}`
);
if (!initResult) { console.error('Session init failed'); process.exit(1); }
// Step 3: Create run
const processId = entryPoint.split('#')[0].replace(/[^a-zA-Z0-9-_]/g, '-');
const createResult = cliJson(
`run:create --process-id ${processId} --entry ${entryPoint}` +
` --inputs ${inputsFile} --prompt "${prompt.replace(/"/g, '\\"')}"`
);
if (!createResult) { console.error('Run creation failed'); process.exit(1); }
const { runId } = createResult;
const runDir = join(RUNS_DIR, runId);
console.log(`Created run: ${runId}`);
// Step 4: Bind session
cliJson(
`session:associate --session-id ${sessionId} --run-id ${runId} --state-dir ${STATE_DIR}`
);
// Step 5: Orchestration loop
const iterationTimes = [];
let iteration = 0;
while (iteration < MAX_ITERATIONS) {
iteration++;
const iterStart = Date.now();
console.log(`\n--- Iteration ${iteration} ---`);
// 5a: Iterate
const iterData = cliJson(`run:iterate ${runDir}`, CLI_TIMEOUT_MS);
if (!iterData) { console.error('run:iterate failed'); break; }
if (iterData.status === 'completed') {
console.log(`Run completed. Proof: ${iterData.completionProof}`);
break;
}
if (iterData.status === 'failed') {
console.error('Run failed:', JSON.stringify(iterData, null, 2));
break;
}
// 5b: List and execute pending tasks
const listData = cliJson(`task:list ${runDir} --pending`);
if (!listData || !listData.tasks || listData.tasks.length === 0) {
console.log('No pending tasks; re-iterating...');
continue;
}
for (const task of listData.tasks) {
const taskDir = join(runDir, 'tasks', task.effectId);
const taskDefPath = join(taskDir, 'task.json');
if (!existsSync(taskDefPath)) {
console.error(`task.json missing for ${task.effectId}`);
continue;
}
const taskDef = JSON.parse(readFileSync(taskDefPath, 'utf8'));
let result;
switch (task.kind) {
case 'node': {
// Execute the node task's script
try {
const mod = require(taskDef.args.scriptPath);
const fn = taskDef.args.exportName ? mod[taskDef.args.exportName] : mod.default || mod;
const output = await fn(taskDef.args.input);
result = { status: 'ok', value: output };
} catch (err) {
result = { status: 'error', value: { message: err.message, stack: err.stack } };
}
break;
}
case 'breakpoint':
// Minimal harness: auto-reject breakpoints (non-interactive)
result = { status: 'ok', value: { approved: false, reason: 'Non-interactive harness' } };
break;
case 'sleep': {
const until = taskDef.args.until || taskDef.args.sleepUntil;
const durationMs = taskDef.args.durationMs;
const target = until ? new Date(until).getTime() : (Date.now() + (durationMs || 0));
const remaining = target - Date.now();
if (remaining > 0 && remaining <= 60000) {
await new Promise(r => setTimeout(r, remaining));
}
result = { status: 'ok', value: { wokeAt: new Date().toISOString(), reason: 'waited' } };
break;
}
default:
result = { status: 'error', value: { message: `Unsupported task kind: ${task.kind}` } };
}
// Post result
const outputPath = join(taskDir, 'output.json');
writeFileSync(outputPath, JSON.stringify(result.value));
cli(`task:post ${runDir} ${task.effectId} --status ${result.status} --value ${outputPath} --json`);
}
// 5c: Runaway detection
const iterDuration = (Date.now() - iterStart) / 1000;
iterationTimes.push(iterDuration);
if (iteration >= RUNAWAY_THRESHOLD_ITERATIONS) {
const recent = iterationTimes.slice(-3);
const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
if (avg <= RUNAWAY_THRESHOLD_SECONDS) {
console.error(`Runaway detected: avg ${avg.toFixed(1)}s <= ${RUNAWAY_THRESHOLD_SECONDS}s threshold`);
break;
}
}
}
console.log(`\nHarness finished after ${iteration} iterations.`);
}
main().catch(err => { console.error(err); process.exit(1); });
```
---
## Appendix: Complete CLI Command Reference
| Command | Purpose | Section |
|---------|---------|---------|
| `babysitter version --json` | Verify CLI installation | 2a |
| `babysitter session:init --session-id ID --state-dir DIR --json` | Create baseline session state | 2b |
| `babysitter run:create --process-id PID --entry FILE --inputs FILE --json` | Create a new run | 2c |
| `babysitter run:assign-process RUNDIR --entry FILE [--process-id PID] --json` | Assign process to bare run | 2c |
| `babysitter session:associate --session-id ID --run-id RID --state-dir DIR --json` | Bind session to run | 2c |
| `babysitter run:iterate RUNDIR --json` | Advance orchestration, discover effects | 2d, 2e |
| `babysitter run:status RUNDIR --json` | Read run status and completion proof | 2d |
| `babysitter session:iteration-message --iteration N [--run-id ID] [--runs-dir DIR] [--plugin-root DIR] --json` | Get context to re-inject after BLOCK | 2d |
| `babysitter task:list RUNDIR --pending --json` | List pending effects | 2e |
| `babysitter task:show RUNDIR EFFECTID --json` | Read task definition | 2e |
| `babysitter task:post RUNDIR EFFECTID --status STATUS --value FILE --json` | Post effect result | 2f |
| `babysitter session:check-iteration --session-id ID --state-dir DIR --json` | Check iteration guards | 2g |
| `babysitter run:repair-journal RUNDIR --json` | Repair inconsistent journal | 7 |
| `babysitter hook:run --hook-type TYPE --harness NAME --json` | Dispatch a lifecycle hook | 5 |
documents
[]
Outgoing edges
None.
Incoming edges
contains_page1
- page:docs-assimilation-harness·PageHarness Integration