Agentic AI Atlas

Page JSON

page:docs-assimilation-harness-generic-harness-guide

Generic Harness Integration Guide for Babysitter SDK json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/assimilation/harness/generic-harness-guide.md · Cluster · wiki

Record JSON

{
  "id": "page:docs-assimilation-harness-generic-harness-guide",
  "_kind": "Page",
  "_file": "wiki/docs/assimilation/harness/generic-harness-guide.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/assimilation/harness/generic-harness-guide.md",
    "sourceKind": "repo-docs",
    "title": "Generic Harness Integration Guide for Babysitter SDK",
    "displayName": "Generic Harness Integration Guide for Babysitter SDK",
    "slug": "docs/assimilation/harness/generic-harness-guide",
    "articlePath": "wiki/docs/assimilation/harness/generic-harness-guide.md",
    "article": "\n# Generic Harness Integration Guide for Babysitter SDK\n\nA step-by-step implementation guide for integrating the babysitter SDK orchestration\nloop into any AI coding harness. This document is harness-agnostic and uses pseudocode\nthroughout. For the canonical reference implementation, see\n[Claude Code Integration](./claude-code-integration.md).\n\n---\n\n## Table of Contents\n\n1. [Prerequisites](#1-prerequisites)\n2. [Core Integration Points](#2-core-integration-points)\n   - [2a. SDK Installation](#2a-sdk-installation)\n   - [2b. Session Initialization](#2b-session-initialization)\n   - [2c. Run Creation and Session Binding](#2c-run-creation-and-session-binding)\n   - [2d. The Orchestration Loop Driver](#2d-the-orchestration-loop-driver)\n   - [2e. Effect Execution](#2e-effect-execution)\n   - [2f. Result Posting](#2f-result-posting)\n   - [2g. Iteration Guards](#2g-iteration-guards)\n3. [Harness Capability Matrix](#3-harness-capability-matrix)\n4. [Session State Contract](#4-session-state-contract)\n5. [Hook Equivalence Table](#5-hook-equivalence-table)\n6. [CLI Output Schemas](#6-cli-output-schemas)\n7. [CLI Error Handling](#7-cli-error-handling)\n8. [Edge Cases](#8-edge-cases)\n9. [Testing the Integration](#9-testing-the-integration)\n10. [Reference Implementation](#10-reference-implementation)\n\n---\n\n## 1. Prerequisites\n\nYour harness must provide (or be able to emulate) the following capabilities before\nyou begin integration. Each item is marked as REQUIRED or RECOMMENDED.\n\n### Checklist\n\n- [ ] **REQUIRED: Shell or script execution** -- The harness must be able to execute\n  shell commands (`bash`, `sh`, `cmd`) or invoke Node.js scripts. The babysitter CLI\n  is a Node.js binary invoked via shell. 
Every integration point depends on running\n  `babysitter <command>` and reading its JSON output from stdout.\n\n- [ ] **REQUIRED: Exit/stop interception** -- The harness must provide a mechanism to\n  intercept the AI agent's attempt to end its turn or exit the conversation. This is\n  the single most critical requirement. Without it, the orchestration loop cannot\n  function. Examples:\n  - A \"stop hook\" that fires before the agent's response is finalized\n  - A middleware layer that can reject an exit signal and re-inject context\n  - A wrapper around the agent loop that checks a condition before allowing termination\n\n- [ ] **REQUIRED: Context re-injection** -- After blocking an exit, the harness must\n  be able to inject new text (a system message, user message, or tool result) into the\n  agent's context so it continues working. The injected content comes from the\n  babysitter CLI output.\n\n- [ ] **REQUIRED: Session/conversation identity** -- The harness must provide a stable\n  identifier for the current session or conversation. This ID is used to:\n  - Name the session state file\n  - Associate the session with a babysitter run\n  - Track iteration count across stop-hook cycles\n\n- [ ] **RECOMMENDED: Lifecycle hooks** -- Pre-session, post-session, pre-turn,\n  post-turn hooks simplify integration. If unavailable, equivalent behavior can be\n  built by wrapping the agent's main loop.\n\n- [ ] **RECOMMENDED: Transcript access** -- Access to the agent's recent output text\n  enables completion proof verification (scanning for `<promise>` tags). If\n  unavailable, an alternative proof mechanism must be implemented (see Section 2d).\n\n- [ ] **RECOMMENDED: Persistent environment** -- Environment variables or a key-value\n  store that persists across hook invocations within the same session. 
Used to carry\n  the session ID and plugin root path.\n\n### Minimum Environment\n\n```\nNode.js >= 18\nnpm >= 8\nFile system access (read/write to working directory)\n```\n\n---\n\n## 2. Core Integration Points\n\nImplement these in order. Each section includes a checklist, pseudocode, and the\nspecific CLI commands involved.\n\n```\n+-----------------------------------------------------------------------+\n|                         YOUR HARNESS                                  |\n|                                                                       |\n|  +------------------+  +-----------------------+  +----------------+  |\n|  | Session Lifecycle |  | Exit/Stop Interceptor |  | Agent Loop     |  |\n|  | (start/end)      |  | (block or approve)    |  | (LLM turns)   |  |\n|  +--------+---------+  +-----------+-----------+  +-------+--------+  |\n|           |                        |                      |           |\n+-----------------------------------------------------------------------+\n            |                        |                      |\n            v                        v                      v\n   +-------------------+   +-------------------+   +------------------+\n   | babysitter CLI    |   | Session State     |   | Run Directory    |\n   | (npm package)     |   | {stateDir}/       |   | .a5c/runs/{id}/  |\n   |                   |   |   {sessionId}.md  |   |   journal/       |\n   | session:init      |   +-------------------+   |   tasks/         |\n   | run:create        |                           |   state/         |\n   | run:assign-process|                           +------------------+\n   | session:associate |\n   | run:iterate       |\n   | task:list         |\n   | task:post         |\n   | session:check-    |\n   |   iteration       |\n   | session:iteration-|\n   |   message         |\n   | run:status        |\n   +-------------------+\n```\n\n---\n\n### 2a. 
SDK Installation\n\n**Goal:** Ensure the `babysitter` CLI binary is available on PATH.\n\n#### Checklist\n\n- [ ] Determine the SDK version to install (from plugin manifest or pinned version)\n- [ ] Attempt global install; fall back to local prefix install; fall back to npx\n- [ ] Gate installation behind a marker file to avoid repeated install attempts\n- [ ] Verify the CLI is callable: `babysitter version --json`\n\n#### Pseudocode\n\n```\nfunction ensureBabysitterCLI(sdkVersion):\n    markerFile = \"{pluginRoot}/.babysitter-install-attempted\"\n\n    if commandExists(\"babysitter\"):\n        return \"babysitter\"\n\n    if fileExists(markerFile):\n        // Already tried installing; fall through to npx\n    else:\n        // Attempt global install\n        result = shell(\"npm install -g @a5c-ai/babysitter-sdk@{sdkVersion}\")\n        if result.exitCode != 0:\n            // Fallback: install with local prefix\n            result = shell(\"npm install -g @a5c-ai/babysitter-sdk@{sdkVersion} --prefix $HOME/.local\")\n        writeFile(markerFile, \"attempted\")\n\n    if commandExists(\"babysitter\"):\n        return \"babysitter\"\n\n    // Final fallback: use npx on every invocation\n    return \"npx -y @a5c-ai/babysitter-sdk@{sdkVersion} babysitter\"\n```\n\n#### CLI Command\n\n```bash\n# Verify installation\nbabysitter version --json\n# Expected: { \"version\": \"x.y.z\", \"sdkVersion\": \"...\" }\n```\n\n---\n\n### 2b. Session Initialization\n\n**Goal:** Create a baseline session state file so the orchestration loop can track\niterations from the very start of the session, even before any run is created.\n\n#### When to Call\n\nAt session/conversation start -- before the user has issued any commands. 
This is\ntypically wired into a \"session start\" lifecycle hook or called at the top of the\nagent's main loop.\n\n#### Checklist\n\n- [ ] Obtain or generate a unique session ID\n- [ ] Determine the state directory (typically `{pluginRoot}/skills/babysit/state/`)\n- [ ] Call `babysitter session:init`\n- [ ] Persist the session ID in the harness environment for later hook invocations\n\n#### Pseudocode\n\n```\nfunction onSessionStart(sessionId, pluginRoot):\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    ensureDirectoryExists(stateDir)\n\n    result = shell(\n        \"babysitter session:init\" +\n        \" --session-id {sessionId}\" +\n        \" --state-dir {stateDir}\" +\n        \" --json\"\n    )\n\n    if result.exitCode != 0:\n        log(\"WARNING: session init failed, orchestration may not work\")\n        return\n\n    // Persist session ID for use by the stop interceptor\n    setEnv(\"AGENT_SESSION_ID\", sessionId)\n    setEnv(\"BABYSITTER_PLUGIN_ROOT\", pluginRoot)\n```\n\n#### What This Creates\n\nA session state file at `{stateDir}/{sessionId}.md` in BASELINE state (empty\n`run_id`, `iteration: 1`). See [Section 4: Session State Contract](#4-session-state-contract)\nfor the full file format, field definitions, and state transition diagram.\n\n---\n\n### 2c. Run Creation and Session Binding\n\n**Goal:** Create a babysitter run and bind it to the current session so the stop\ninterceptor knows which run to check.\n\n#### When to Call\n\nAfter the user requests a task that should be orchestrated. 
Typically triggered by a\nskill or command within the harness (e.g., the user says \"babysit this task\").\n\n#### Checklist\n\n- [ ] Prepare the process definition (entry point, process ID, inputs)\n- [ ] Call `babysitter run:create` with harness and session parameters\n- [ ] Call `babysitter session:associate` to bind the run to the session\n- [ ] Verify the session state file now has a non-empty `run_id`\n\n#### Pseudocode\n\n```\nfunction createAndBindRun(processId, entryPoint, inputs, prompt, sessionId, pluginRoot):\n    // Step 1: Create the run\n    createResult = shell(\n        \"babysitter run:create\" +\n        \" --process-id {processId}\" +\n        \" --entry {entryPoint}\" +\n        \" --inputs {inputsFilePath}\" +\n        \" --prompt \\\"{prompt}\\\"\" +\n        \" --json\"\n    )\n    runId = parseJson(createResult.stdout).runId\n    runDir = \".a5c/runs/{runId}\"\n\n    // Step 2: Bind session to run\n    shell(\n        \"babysitter session:associate\" +\n        \" --session-id {sessionId}\" +\n        \" --run-id {runId}\" +\n        \" --state-dir {pluginRoot}/skills/babysit/state\" +\n        \" --json\"\n    )\n\n    return { runId, runDir }\n```\n\n#### Re-entrant Run Prevention\n\nIf the session is already bound to a different run, `session:associate` will fail.\nThe harness must either:\n1. Complete or clean up the existing run first\n2. Remove the old session state file manually\n3. Present an error to the user\n\n---\n\n### 2d. The Orchestration Loop Driver\n\n**Goal:** Convert the agent's single-turn execution into a multi-iteration\norchestration loop by intercepting exit signals, checking run status, and re-injecting\ncontext.\n\nThis is the most critical and complex integration point.\n\n#### Architecture\n\n```\nAgent executes turn\n     |\n     v\nAgent signals \"done\" (stop/exit)\n     |\n     v\n+--[EXIT INTERCEPTOR]----------------------------------------------+\n|  1. 
Read session state file                                      |\n|  2. Check guards (max iterations, runaway detect, no run bound)  |\n|  3. Load run status via run:status                               |\n|  4. Check completion proof                                       |\n|  5. Decision: APPROVE or BLOCK                                   |\n+--------+-------------------+-------------------------------------+\n         |                   |\n    [APPROVE]           [BLOCK]\n         |                   |\n         v                   v\n    Session ends     Re-inject context ------> Agent continues\n                     (iteration message)       (back to top)\n```\n\n#### The Decision Algorithm\n\n```\nfunction onAgentStop(sessionId, pluginRoot, runsDir, lastAgentOutput):\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    stateFile = \"{stateDir}/{sessionId}.md\"\n\n    // --- Guard 1: No state file means no active loop ---\n    if not fileExists(stateFile):\n        return APPROVE\n\n    state = parseSessionState(stateFile)\n\n    // --- Guard 2: Max iterations ---\n    if state.iteration >= state.maxIterations:\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    // --- Guard 3: Runaway loop detection ---\n    if state.iteration >= 5:\n        avgTime = average(state.iterationTimes)  // last 3 durations\n        if avgTime <= 15:  // seconds\n            cleanupSessionFile(stateFile)\n            return APPROVE\n\n    // --- Guard 4: No run bound ---\n    if state.runId == \"\":\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    // --- Check run status ---\n    statusResult = shell(\n        \"babysitter run:status .a5c/runs/{state.runId} --json\"\n    )\n\n    // --- Guard 5: Unknown or unreadable run ---\n    // Check the exit code before parsing: stdout may not be valid JSON on failure.\n    if statusResult.exitCode != 0:\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    runStatus = parseJson(statusResult.stdout)\n\n    // --- Guard 6: Completion proof ---\n    if runStatus.state == 
\"completed\":\n        proof = runStatus.completionProof\n        promiseTag = extractPromiseTag(lastAgentOutput)\n        if promiseTag == proof:\n            cleanupSessionFile(stateFile)\n            return APPROVE\n\n    // --- BLOCK: Continue the loop ---\n    // Advance session state from BOUND/ACTIVE to next iteration.\n    // See Section 4 (Session State Contract) for field update rules\n    // and the atomic write protocol.\n    newIteration = state.iteration + 1\n    updateSessionState(stateFile, {\n        iteration: newIteration,\n        lastIterationAt: now()\n    })\n\n    // Build the context message to re-inject\n    // NOTE: session:iteration-message uses --iteration, --run-id,\n    //       --runs-dir, and --plugin-root (NOT --session-id or --state-dir)\n    iterationMessage = shell(\n        \"babysitter session:iteration-message\" +\n        \" --iteration {newIteration}\" +\n        \" --run-id {state.runId}\" +\n        \" --runs-dir {runsDir}\" +\n        \" --plugin-root {pluginRoot}\" +\n        \" --json\"\n    )\n\n    return BLOCK {\n        reason: parseJson(iterationMessage.stdout).systemMessage,\n        systemMessage: \"Babysitter iteration {newIteration}/{state.maxIterations}\"\n    }\n```\n\n#### Intercepting Exit Signals\n\nThe mechanism depends entirely on your harness. Common patterns:\n\n| Harness Type | Interception Mechanism |\n|-------------|------------------------|\n| Hook-based (Claude Code, etc.) 
| Register a `Stop` hook that receives agent output and returns block/approve |\n| Middleware-based | Wrap the agent loop's exit check in a middleware that calls the decision algorithm |\n| Event-based | Listen for \"agent_turn_complete\" events, cancel and re-queue if BLOCK |\n| Loop-based | Replace the `while (running)` loop condition with the decision algorithm |\n| API-based | Between API calls, run the check and decide whether to make another call |\n\n#### Re-injecting Context\n\nAfter blocking, the harness must feed the orchestration context back to the agent.\nThe mechanism depends on your harness:\n\n| Harness Type | Re-injection Mechanism |\n|-------------|------------------------|\n| System message injection | Append the `reason` as a system message before the next turn |\n| User message simulation | Insert a synthetic user message containing the iteration context |\n| Tool result injection | Return the context as a tool call result |\n| Context window prepend | Prepend the context to the agent's next input |\n\nThe content to inject comes from the `systemMessage` field of the\n`session:iteration-message` output. It typically contains:\n1. Iteration number and status\n2. What to do next (run:iterate, execute effects, extract proof, etc.)\n3. Pending effect kinds if the run is in \"waiting\" state\n\n#### Detecting the Completion Proof\n\nThe completion proof is a SHA-256 hash that the agent must output inside\n`<promise>...</promise>` tags. The harness must:\n\n1. Scan the agent's last output for `<promise>VALUE</promise>`\n2. Compare VALUE against the `completionProof` from `run:status --json`\n3. 
If they match, allow exit\n\n```\nfunction extractPromiseTag(text):\n    match = regex_search(text, \"<promise>([\\\\s\\\\S]*?)</promise>\")\n    if match is null:\n        return null\n    // Collapse every whitespace run (global flag), not just the first one\n    return trim(match.group(1)).replace(/\\\\s+/g, \" \")\n```\n\nIf the harness cannot access the agent's output text (no transcript), alternative\napproaches:\n- Have the agent call a special \"complete\" tool that the harness intercepts\n- Use a dedicated CLI command that the agent calls to signal completion\n- Implement a \"completion callback\" webhook\n\n---\n\n### 2e. Effect Execution\n\n**Goal:** Execute the pending tasks that the babysitter run has requested, then post\ntheir results.\n\n#### The Effect Execution Cycle\n\n```\nbabysitter run:iterate .a5c/runs/{runId} --json\n        |\n        v\n  Returns: { status, pendingActions[], ... }\n        |\n        v\nbabysitter task:list .a5c/runs/{runId} --pending --json\n        |\n        v\n  Returns: { tasks: [{ effectId, taskId, kind, status, label, ... 
}] }\n        |\n        v\n  For each pending task:\n        |\n        +--[kind = \"node\"]----------> Execute Node.js script\n        |\n        +--[kind = \"breakpoint\"]----> Present to user for approval\n        |\n        +--[kind = \"sleep\"]---------> Wait until specified time\n        |\n        +--[kind = \"orchestrator_  -> Delegate to a sub-agent or\n        |    task\"]                   orchestrator within your harness\n        |\n        +--[kind = \"agent\"]---------> Delegate to an agent subprocess\n        |\n        +--[custom kind]------------> Handle per your harness capabilities\n        |\n        v\n  Post result via task:post (Section 2f)\n```\n\n#### Effect Result Type\n\nAll effect handlers must return a result conforming to this structure (or the\nsentinel `DEFERRED` for effects that will be resolved later):\n\n```\nEffectResult = {\n    status: \"ok\" | \"error\",\n    value: object          // Payload specific to the effect kind\n}\n\n// For node tasks:\n//   { status: \"ok\", value: <return value of the Node.js function> }\n//   { status: \"error\", value: { message: string, stack?: string } }\n\n// For breakpoints:\n//   { status: \"ok\", value: { approved: boolean, approvedBy?: string, reason?: string } }\n\n// For sleep:\n//   { status: \"ok\", value: { wokeAt: string (ISO 8601), reason: string } }\n\n// For orchestrator_task:\n//   { status: \"ok\", value: { output: any, completedAt: string } }\n//   { status: \"error\", value: { message: string, phase?: string } }\n\n// For agent:\n//   { status: \"ok\", value: { response: string, tokensUsed?: number } }\n//   { status: \"error\", value: { message: string, exitCode?: number } }\n```\n\n#### Pseudocode\n\n```\nfunction executeEffects(runId):\n    runDir = \".a5c/runs/{runId}\"\n\n    // Step 1: Iterate to discover pending effects\n    iterResult = shell(\"babysitter run:iterate {runDir} --json\")\n    if iterResult.exitCode != 0:\n        handleCLIError(\"run:iterate\", 
iterResult)\n        return\n\n    iterData = parseJson(iterResult.stdout)\n\n    if iterData.status == \"completed\":\n        // Run is done -- extract proof and output it\n        proof = iterData.completionProof\n        agentOutput(\"<promise>{proof}</promise>\")\n        return\n\n    if iterData.status == \"failed\":\n        // Inspect error, attempt recovery\n        return\n\n    // Step 2: List pending tasks\n    listResult = shell(\"babysitter task:list {runDir} --pending --json\")\n    if listResult.exitCode != 0:\n        handleCLIError(\"task:list\", listResult)\n        return\n\n    tasks = parseJson(listResult.stdout).tasks\n\n    // Step 3: Execute each task\n    for task in tasks:\n        taskDir = \"{runDir}/tasks/{task.effectId}\"\n        taskDef = readJson(\"{taskDir}/task.json\")\n\n        switch task.kind:\n            case \"node\":\n                result = executeNodeTask(taskDef)\n            case \"breakpoint\":\n                result = handleBreakpoint(taskDef)\n            case \"sleep\":\n                result = handleSleep(taskDef)\n            case \"orchestrator_task\":\n                result = handleOrchestratorTask(taskDef)\n            case \"agent\":\n                result = handleAgentTask(taskDef)\n            default:\n                result = handleCustomKind(task.kind, taskDef)\n\n        // Step 4: Post result (skip deferred effects like long sleeps)\n        if result != DEFERRED:\n            postResult(runId, task.effectId, result)\n```\n\n#### Breakpoint Effect Handler\n\nBreakpoints are human approval gates. The process pauses until a human explicitly\napproves or rejects the breakpoint. 
**Never auto-approve breakpoints** -- they exist\nspecifically to require human judgment.\n\n```\nfunction handleBreakpoint(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     message?: string,         // Human-readable description of what needs approval\n    //     description?: string,     // Alternative to message (checked as fallback)\n    //     context?: {\n    //       changedFiles?: string[],  // Files modified since last breakpoint\n    //       summary?: string,         // Summary of work done so far\n    //       risks?: string[],         // Identified risks requiring human review\n    //       [key: string]: unknown    // Additional context from the process\n    //     },\n    //     requireExplicitApproval?: boolean,  // If true, never auto-approve (default: true)\n    //     blocking?: boolean          // If true, the run cannot proceed without resolution (default: true)\n    //   }\n\n    message = taskDef.args.message or taskDef.args.description or \"Approval required\"\n    context = taskDef.args.context or {}\n    requireExplicit = taskDef.args.requireExplicitApproval != false  // default true\n\n    // Present to user via your harness's interactive prompt mechanism\n    if harnessSupportsInteractivePrompt():\n        // Build a rich prompt with context if available\n        promptBody = message\n        if context.summary:\n            promptBody += \"\\n\\nSummary: \" + context.summary\n        // The leading \"- \" bullets the first item; join() only separates items\n        if context.risks and length(context.risks) > 0:\n            promptBody += \"\\n\\nRisks:\\n- \" + join(context.risks, \"\\n- \")\n        if context.changedFiles and length(context.changedFiles) > 0:\n            promptBody += \"\\n\\nChanged files:\\n- \" + join(context.changedFiles, \"\\n- \")\n\n        userDecision = promptUser(\n            title: \"Babysitter Breakpoint\",\n            message: promptBody,\n            options: [\"approve\", \"reject\"]\n        )\n\n        if userDecision == \"approve\":\n            return { status: 
\"ok\", value: { approved: true, approvedBy: \"user\" } }\n        else:\n            return { status: \"ok\", value: { approved: false, reason: \"User rejected\" } }\n\n    // Non-interactive fallback: reject with explanation\n    // The agent will see this and can inform the user\n    return {\n        status: \"ok\",\n        value: {\n            approved: false,\n            reason: \"Non-interactive environment; breakpoint requires manual approval\"\n        }\n    }\n```\n\n#### Sleep Effect Handler\n\nSleep effects pause execution until a specified time. The harness must decide whether\nto block (wait inline) or defer (post result later).\n\n```\nfunction handleSleep(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     until?: string,          // ISO 8601 timestamp to sleep until\n    //     sleepUntil?: string,     // Alias for 'until'\n    //     durationMs?: number,     // Duration in milliseconds (alternative to until)\n    //     reason?: string          // Human-readable reason for the sleep\n    //   }\n    //\n    // Exactly one of (until | sleepUntil) or durationMs should be provided.\n    // If both are present, the absolute timestamp (until/sleepUntil) takes precedence.\n\n    sleepUntil = taskDef.args.until or taskDef.args.sleepUntil\n    durationMs = taskDef.args.durationMs\n\n    if sleepUntil:\n        targetTime = parseISO8601(sleepUntil)\n    else if durationMs:\n        targetTime = now() + durationMs\n    else:\n        // No target time specified; resolve immediately\n        return { status: \"ok\", value: { wokeAt: now(), reason: \"no_target_time\" } }\n\n    remainingMs = targetTime - now()\n\n    if remainingMs <= 0:\n        // Sleep time already passed\n        return { status: \"ok\", value: { wokeAt: now(), reason: \"already_elapsed\" } }\n\n    if remainingMs <= 60000:  // 1 minute or less\n        // Short sleep: block inline\n        sleep(remainingMs)\n        return { status: \"ok\", value: { wokeAt: now(), reason: 
\"waited\" } }\n\n    // Long sleep: post a deferred result\n    // Option A: Schedule a timer/cron to post the result later\n    scheduleDelayedPost(runId, effectId, targetTime)\n    // Do NOT post result now -- let the orchestration loop handle it\n    // on the next iteration after the timer fires\n    return DEFERRED  // signal to caller: do not post result yet\n```\n\n#### Orchestrator Task Effect Handler\n\nOrchestrator tasks delegate a sub-process to an orchestrator or sub-agent within\nyour harness. The task definition contains a prompt, optional inputs, and\nconfiguration for the sub-process.\n\n```\nfunction handleOrchestratorTask(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     prompt: string,           // The instruction for the sub-agent\n    //     processId?: string,       // Optional sub-process ID to invoke\n    //     inputs?: object,          // Inputs to pass to the sub-process\n    //     constraints?: {\n    //       maxIterations?: number, // Iteration limit for the sub-process\n    //       timeout?: number        // Timeout in ms for the sub-process\n    //     }\n    //   }\n\n    prompt = taskDef.args.prompt\n    inputs = taskDef.args.inputs or {}\n    constraints = taskDef.args.constraints or {}\n    timeout = constraints.timeout or 900000  // default 15 min\n\n    if harnessSupportsSubAgentDelegation():\n        // Delegate to a sub-agent or internal orchestrator\n        subResult = delegateToSubAgent({\n            prompt: prompt,\n            inputs: inputs,\n            maxIterations: constraints.maxIterations or 50,\n            timeout: timeout\n        })\n\n        if subResult.success:\n            return {\n                status: \"ok\",\n                value: {\n                    output: subResult.output,\n                    completedAt: now()\n                }\n            }\n        else:\n            return {\n                status: \"error\",\n                value: {\n                    message: 
subResult.error,\n                    phase: subResult.failedPhase or \"execution\"\n                }\n            }\n\n    // Fallback: execute as a simple prompt-response if no sub-agent support\n    // This is a degraded mode -- the harness loses multi-step orchestration\n    response = executePromptSingleTurn(prompt, inputs)\n    return {\n        status: \"ok\",\n        value: {\n            output: response,\n            completedAt: now()\n        }\n    }\n```\n\n#### Agent Effect Handler\n\nAgent effects delegate work to a standalone agent subprocess. Unlike\norchestrator_task, the agent effect expects a self-contained agent invocation\n(typically a CLI tool or API call) that runs to completion.\n\n```\nfunction handleAgentTask(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     command: string,          // Agent command or prompt\n    //     workingDir?: string,      // Working directory for the agent\n    //     env?: Record<string, string>,  // Environment variables\n    //     timeout?: number,         // Timeout in ms (default: 900000)\n    //     captureOutput?: boolean   // Whether to capture stdout/stderr (default: true)\n    //   }\n\n    command = taskDef.args.command\n    workingDir = taskDef.args.workingDir or getCwd()\n    env = taskDef.args.env or {}\n    timeout = taskDef.args.timeout or 900000  // default 15 min\n\n    if harnessSupportsAgentSubprocess():\n        // Spawn agent subprocess\n        agentResult = spawnAgent({\n            command: command,\n            workingDir: workingDir,\n            env: env,\n            timeout: timeout\n        })\n\n        if agentResult.exitCode == 0:\n            return {\n                status: \"ok\",\n                value: {\n                    response: agentResult.stdout,\n                    tokensUsed: agentResult.tokensUsed or null\n                }\n            }\n        else:\n            return {\n                status: \"error\",\n                value: {\n         
           message: agentResult.stderr or \"Agent exited with code {agentResult.exitCode}\",\n                    exitCode: agentResult.exitCode\n                }\n            }\n\n    // Fallback: treat as a shell command\n    shellResult = shell(command, { cwd: workingDir, env: env, timeout: timeout })\n    if shellResult.exitCode == 0:\n        return { status: \"ok\", value: { response: shellResult.stdout } }\n    else:\n        return {\n            status: \"error\",\n            value: { message: shellResult.stderr, exitCode: shellResult.exitCode }\n        }\n```\n\n#### Reading Task Definitions\n\nEach pending task has a `task.json` in its effect directory:\n\n```\n.a5c/runs/{runId}/tasks/{effectId}/task.json\n```\n\nThe task definition contains the task kind, arguments, labels, and other metadata\nneeded to execute it. Read it with:\n\n```bash\nbabysitter task:show .a5c/runs/{runId} {effectId} --json\n```\n\n---\n\n### 2f. Result Posting\n\n**Goal:** Record effect execution results back into the run journal.\n\n#### IMPORTANT\n\nAlways post results through the CLI. Never write `result.json` directly. The CLI\ncommand handles:\n1. Writing `result.json` with the correct schema version\n2. Appending an `EFFECT_RESOLVED` event to the journal\n3. 
Updating the state cache\n\n#### Pseudocode\n\n```\nfunction postResult(runId, effectId, result):\n    runDir = \".a5c/runs/{runId}\"\n    taskDir = \"{runDir}/tasks/{effectId}\"\n\n    // Write the result value to a temporary file\n    valueFile = \"{taskDir}/output.json\"\n    writeJson(valueFile, result.value)\n\n    // Post through the CLI\n    shell(\n        \"babysitter task:post {runDir} {effectId}\" +\n        \" --status {result.status}\" +   // \"ok\" or \"error\"\n        \" --value {valueFile}\" +\n        \" --json\"\n    )\n```\n\n#### CLI Command\n\n```bash\n# Success case\nbabysitter task:post .a5c/runs/{runId} {effectId} \\\n  --status ok \\\n  --value tasks/{effectId}/output.json \\\n  --json\n\n# Error case\nbabysitter task:post .a5c/runs/{runId} {effectId} \\\n  --status error \\\n  --value tasks/{effectId}/error.json \\\n  --json\n```\n\n#### Result Status Values\n\n| Status | Meaning |\n|--------|---------|\n| `ok` | Task completed successfully; value contains the result |\n| `error` | Task failed; value contains error details |\n\n---\n\n### 2g. Iteration Guards\n\n**Goal:** Prevent infinite loops and detect runaway behavior.\n\n#### CLI Command\n\n```bash\nbabysitter session:check-iteration \\\n  --session-id {sessionId} \\\n  --state-dir {stateDir} \\\n  --json\n```\n\n#### Output\n\nSee [Section 6: CLI Output Schemas](#session-check-iteration-output) for the full\nschema. 
Summary:\n\n- `shouldContinue: true` -- safe to proceed; `nextIteration` gives the next iteration number\n- `shouldContinue: false` -- stop the loop; `reason` explains why (e.g.,\n  `max_iterations_reached`, `session_not_found`)\n\n#### Guard Logic\n\n```\nfunction checkIterationGuards(sessionId, stateDir):\n    result = shell(\n        \"babysitter session:check-iteration\" +\n        \" --session-id {sessionId}\" +\n        \" --state-dir {stateDir}\" +\n        \" --json\"\n    )\n    data = parseJson(result.stdout)\n\n    // Use the same reason value the CLI reports for a missing state file\n    if not data.found:\n        return { shouldContinue: false, reason: \"session_not_found\" }\n\n    if not data.shouldContinue:\n        return { shouldContinue: false, reason: data.reason }\n\n    return { shouldContinue: true, nextIteration: data.nextIteration }\n```\n\n#### Two Detection Mechanisms\n\n**1. Max Iterations Guard**\n\n```\n// max_iterations = 0 means unlimited (see Section 4), so the guard is skipped\nIF maxIterations > 0 AND iteration >= maxIterations (default 65000):\n    STOP -- allow exit, clean up state file\n```\n\n**2. Runaway Speed Guard**\n\n```\nIF iteration >= 5:\n    avgDuration = average(last 3 iteration durations)\n    IF avgDuration <= 15 seconds:\n        STOP -- iterations are too fast, likely a runaway loop\n```\n\nThe iteration duration is measured as the wall-clock time between consecutive\nstop-hook invocations. An average of 15 seconds or less (after at least 5\niterations) indicates the agent is not doing meaningful work.\n\n**Threshold justifications:**\n\n- **Why iteration >= 5:** The first few iterations are often fast because the\n  agent is reading instructions, creating the run, and performing lightweight\n  setup. A minimum of 5 iterations avoids false positives during this bootstrap\n  phase while still catching runaways before significant resource waste. 
Empirical\n  testing across Claude Code sessions showed that legitimate fast iterations\n  (setup, binding, first iterate) consistently finish within 3-4 cycles.\n\n- **Why average <= 15 seconds:** A meaningful agent iteration -- one that reads\n  files, calls an LLM, writes code, or executes tests -- typically takes 30-120\n  seconds. The 15-second threshold is half of that 30-second minimum, which\n  gives a 2x safety margin. A rolling average of 15 seconds or less typically\n  indicates the agent is stuck in a loop where it reads the iteration message,\n  does no substantive work, and immediately signals completion. The 3-iteration\n  rolling average (rather than a single iteration) smooths out one-off fast\n  iterations caused by cached replay or quick task:post calls.\n\n- **Tuning:** Both thresholds can be adjusted for specific harness environments.\n  If your agent performs very lightweight iterations (e.g., posting pre-computed\n  results), lower the speed threshold. If your setup phase is longer, raise the\n  minimum iteration count. The `session:check-iteration` CLI command applies\n  these same thresholds internally.\n\n---\n\n## 3. 
Harness Capability Matrix\n\n### Required vs Optional Capabilities\n\n| Capability | Required | Purpose |\n|---|---|---|\n| Shell command execution | YES | All CLI interactions |\n| Exit/stop interception | YES | Core loop driver |\n| Context re-injection | YES | Continue agent after BLOCK |\n| Session identity | YES | State file naming, run binding |\n| File system read/write | YES | State files, task artifacts |\n| Transcript access | NO * | Completion proof via `<promise>` tag |\n| Lifecycle hooks | NO | Simplifies wiring; can be emulated |\n| Persistent environment | NO | Convenience; can pass via files |\n| Interactive user prompts | NO | Breakpoint handling (non-interactive mode is fallback) |\n| Sub-agent delegation | NO | orchestrator_task / agent effects |\n\n\\* If transcript access is unavailable, an alternative completion signaling mechanism\nmust be implemented.\n\n### Integration Tiers\n\n#### Tier 1: Minimum Viable Integration\n\nSupports basic orchestration with node tasks and completion detection.\n\n- [ ] SDK installation\n- [ ] Session initialization\n- [ ] Run creation and binding\n- [ ] Exit interception with BLOCK/APPROVE\n- [ ] `run:iterate` calls\n- [ ] `task:list --pending` to discover effects\n- [ ] Node task execution\n- [ ] `task:post` to record results\n- [ ] Completion proof detection (via transcript or alternative)\n- [ ] Max iteration guard\n\n#### Tier 2: Robust Integration\n\nAdds safety guards and breakpoint support.\n\n- [ ] Everything in Tier 1\n- [ ] Runaway loop detection (iteration speed guard)\n- [ ] `session:check-iteration` calls\n- [ ] Interactive breakpoint handling\n- [ ] Sleep effect handling\n- [ ] Journal event recording for debugging\n\n#### Tier 3: Full Integration\n\nComplete feature parity with the Claude Code reference implementation.\n\n- [ ] Everything in Tier 2\n- [ ] Native lifecycle hooks (on-run-start, on-task-complete, etc.)\n- [ ] Hook discovery (per-repo, per-user, plugin directories)\n- [ ] 
Orchestrator task delegation\n- [ ] Agent task delegation\n- [ ] Quality scoring via on-score hooks\n- [ ] Skill discovery and injection\n- [ ] Non-interactive breakpoint auto-resolution\n\n---\n\n## 4. Session State Contract\n\n### File Format\n\nSession state files use Markdown with YAML frontmatter. The frontmatter stores\nmachine-readable state. The Markdown body stores the user's original prompt.\n\n**Path convention:** `{stateDir}/{sessionId}.md`\n\n#### Example\n\n```markdown\n---\nactive: true\niteration: 3\nmax_iterations: 65000\nrun_id: \"my-run-abc123\"\nstarted_at: \"2026-03-02T10:00:00Z\"\nlast_iteration_at: \"2026-03-02T10:05:30Z\"\niteration_times: 45,62,58\n---\n\nBuild a REST API with authentication and rate limiting for the user service.\n```\n\n### Required Fields\n\n| Field | Type | Default | Description |\n|-------|------|---------|-------------|\n| `active` | boolean | `true` | Whether the orchestration loop is active |\n| `iteration` | number | `1` | Current iteration (1-based) |\n| `max_iterations` | number | `65000` | Maximum iterations (0 = unlimited) |\n| `run_id` | string | `\"\"` | Bound run ID (empty before run:create) |\n| `started_at` | string (ISO 8601) | now | Session start timestamp |\n| `last_iteration_at` | string (ISO 8601) | now | Last iteration timestamp |\n| `iteration_times` | string (CSV) | (empty) | Last 3 iteration durations in seconds |\n\n### State Transitions\n\n```\n    CREATE                BIND               ITERATE (x N)        COMPLETE\n  (session:init)    (session:associate)    (stop hook BLOCK)    (stop hook APPROVE)\n       |                   |                     |                    |\n       v                   v                     v                    v\n  +-----------+     +-----------+          +-----------+       +------------+\n  |  BASELINE |     |   BOUND   |          |  ACTIVE   |       |  CLEANED   |\n  |           |---->|           |--------->|           |------>|   UP       |\n  | runId=\"\"  | 
    | runId=X   |          | iter=N+1  |       | file       |\n  | iter=1    |     | iter=1    |          | times=[.] |       | inactive   |\n  +-----------+     +-----------+          +-----------+       +------------+\n```\n\n### Atomic Write Protocol\n\nSession state files must be written atomically to prevent corruption from\nconcurrent reads during stop-hook evaluation:\n\n```\n1. Write content to temp file: {filePath}.tmp.{pid}\n2. Atomic rename: rename(tempFile, targetFile)\n3. On error: delete temp file\n```\n\n### Timing Calculation\n\n```\nfunction updateIterationTimes(existingTimes, lastIterationAt, currentTime):\n    durationSeconds = (currentTime - lastIterationAt) / 1000\n    if durationSeconds <= 0:\n        return existingTimes\n    newTimes = append(existingTimes, durationSeconds)\n    return lastN(newTimes, 3)   // keep only last 3\n```\n\n---\n\n## 5. Hook Equivalence Table\n\nThe babysitter SDK and harness integration involve two categories of hooks:\n\n### SDK Hooks (13 `KnownHookType` values)\n\nThese are dispatched by the SDK runtime during orchestration. 
They are defined in\n`packages/sdk/src/hooks/types.ts` and fired via `callHook(hookType, payload)`.\n\n| SDK Hook | Tier | Description |\n|---|---|---|\n| `on-run-start` | 3 | Fires after `run:create` completes |\n| `on-run-complete` | 3 | Fires when `run:iterate` returns status=completed |\n| `on-run-fail` | 3 | Fires when `run:iterate` returns status=failed |\n| `on-task-start` | 3 | Fires before executing each pending effect |\n| `on-task-complete` | 3 | Fires after `task:post` completes |\n| `on-step-dispatch` | 3 | Fires when `run:iterate` discovers a new effect |\n| `on-iteration-start` | 2 | Fires before calling `run:iterate` |\n| `on-iteration-end` | 2 | Fires after all effects for an iteration are posted |\n| `on-breakpoint` | 2 | Fires when a breakpoint effect is pending; present to user for approval |\n| `on-score` | 3 | Fires when a quality score is posted to the run |\n| `pre-commit` | 3 | Fires before the agent creates a git commit |\n| `pre-branch` | 3 | Fires before the agent creates a new git branch |\n| `post-planning` | 3 | Fires after the planning phase produces output |\n\n### Harness-Level Concepts (not SDK KnownHookType values)\n\nThese are integration points that your harness must implement. They are NOT SDK hook\ntypes -- they are harness-specific lifecycle events that drive the orchestration loop.\n\n| Harness Concept | Tier | Generic Equivalent |\n|---|---|---|\n| **session-start** | 1 | Session/conversation start callback. Call `session:init` to create the baseline state file. This maps to your harness's \"on conversation begin\" event. |\n| **stop** (exit interceptor) | 1 | Exit/turn-end interceptor. Run the decision algorithm (Section 2d) to BLOCK or APPROVE the agent's exit attempt. This is the core loop driver. |\n| **session-end** | 1 | Session cleanup. Delete the session state file when the conversation ends normally. 
|\n\n### Hook Discovery Directories\n\nIf implementing Tier 3, hook scripts are searched in this priority order:\n\n```\n1. Per-repo:   {REPO_ROOT}/.a5c/hooks/{hookType}/*.sh     (highest)\n2. Per-user:   ~/.config/babysitter/hooks/{hookType}/*.sh  (medium)\n3. Plugin:     {PLUGIN_ROOT}/hooks/{hookType}/*.sh         (lowest)\n```\n\nScripts within each directory are sorted alphabetically and executed sequentially.\n\n---\n\n## 6. CLI Output Schemas\n\nThis section documents the JSON output schemas for the most frequently used CLI\ncommands. All examples assume `--json` is passed.\n\n### `run:status` Output\n\n```json\n{\n  \"state\": \"waiting\",\n  \"lastEvent\": {\n    \"type\": \"EFFECT_REQUESTED\",\n    \"recordedAt\": \"2026-03-02T10:05:00Z\",\n    \"data\": { \"...\" : \"...\" }\n  },\n  \"pendingByKind\": {\n    \"node\": 2,\n    \"breakpoint\": 1\n  },\n  \"pendingEffectsSummary\": {\n    \"totalPending\": 3,\n    \"countsByKind\": { \"node\": 2, \"breakpoint\": 1 },\n    \"autoRunnableCount\": 2\n  },\n  \"needsMoreIterations\": true,\n  \"metadata\": null,\n  \"completionProof\": null\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `state` | `\"created\" \\| \"waiting\" \\| \"completed\" \\| \"failed\"` | Derived run lifecycle state |\n| `lastEvent` | object or null | The most recent journal event (serialized) |\n| `pendingByKind` | `Record<string, number>` | Count of pending effects grouped by kind |\n| `pendingEffectsSummary.totalPending` | number | Total pending effects |\n| `pendingEffectsSummary.autoRunnableCount` | number | Effects that can be auto-executed (kind=node) |\n| `needsMoreIterations` | boolean | True if state=waiting and autoRunnableCount > 0 |\n| `completionProof` | string or null | SHA-256 proof hash (only when state=completed) |\n\n### `session:check-iteration` Output\n\nThe output always includes `found`, `shouldContinue`, `iteration`, `maxIterations`,\n`runId`, and `prompt`. 
When `shouldContinue` is false, `reason` and `stopMessage`\nexplain why.\n\n```json\n// shouldContinue: true\n{ \"found\": true, \"shouldContinue\": true, \"nextIteration\": 4,\n  \"updatedIterationTimes\": [45, 62, 58], \"iteration\": 3,\n  \"maxIterations\": 65000, \"runId\": \"my-run-abc123\", \"prompt\": \"Build the API...\" }\n\n// shouldContinue: false -- possible reason values:\n//   \"max_iterations_reached\" (+ stopMessage)\n//   \"session_not_found\"      (found=false, all counters zero)\n```\n\n| `reason` value | Trigger condition | Extra fields |\n|---|---|---|\n| `max_iterations_reached` | iteration >= maxIterations | -- |\n| `session_not_found` | State file does not exist | `found: false` |\n\n### `task:list` Output\n\n```json\n{\n  \"tasks\": [\n    {\n      \"effectId\": \"E001-abc\",\n      \"taskId\": \"greet\",\n      \"stepId\": \"S000001\",\n      \"status\": \"pending\",\n      \"kind\": \"node\",\n      \"label\": \"Greet user\",\n      \"labels\": [\"greeting\"],\n      \"taskDefRef\": \"tasks/E001-abc/task.json\",\n      \"inputsRef\": null,\n      \"resultRef\": null,\n      \"stdoutRef\": null,\n      \"stderrRef\": null,\n      \"requestedAt\": \"2026-03-02T10:01:00Z\",\n      \"resolvedAt\": null\n    }\n  ]\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `effectId` | string | Unique effect identifier |\n| `taskId` | string | Task type identifier (from `defineTask`) |\n| `stepId` | string | Sequential step ID (e.g., `S000001`) |\n| `status` | `\"pending\" \\| \"resolved\" \\| \"unknown\"` | Current effect status |\n| `kind` | string | Task kind: `node`, `breakpoint`, `sleep`, `orchestrator_task`, or custom |\n| `label` | string or null | Human-readable label |\n| `taskDefRef` | string or null | Relative path to task.json |\n| `resultRef` | string or null | Relative path to result.json (null if pending) |\n\n### `session:iteration-message` Output\n\n**Command signature:**\n\n```bash\nbabysitter session:iteration-message \\\n  
--iteration <n> \\\n  [--run-id <id>] \\\n  [--runs-dir <dir>] \\\n  [--plugin-root <dir>] \\\n  --json\n```\n\nNote: This command does NOT accept `--session-id` or `--state-dir`. It operates on\nrun data directly via `--run-id` and `--runs-dir`.\n\n```json\n{\n  \"systemMessage\": \"Babysitter iteration 3 | Waiting on: node. Check if pending effects are resolved, then call run:iterate.\",\n  \"runState\": \"waiting\",\n  \"completionProof\": null,\n  \"pendingKinds\": \"node\",\n  \"skillContext\": null,\n  \"iteration\": 3\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `systemMessage` | string | The formatted message to re-inject into the agent's context |\n| `runState` | `\"created\" \\| \"waiting\" \\| \"completed\" \\| \"failed\"` or null | Derived run state |\n| `completionProof` | string or null | Proof hash if run is completed |\n| `pendingKinds` | string or null | Comma-separated list of pending effect kinds |\n| `skillContext` | string or null | Discovered skill context (when `--plugin-root` is provided) |\n| `iteration` | number | The iteration number passed in |\n\n---\n\n## 7. CLI Error Handling\n\nAll CLI commands can fail. The harness must handle these failures gracefully\nrather than crashing or silently ignoring them. This section provides a unified\nerror handling strategy.\n\n### Error Categories\n\n| Category | Symptom | Recovery Strategy |\n|----------|---------|-------------------|\n| **Timeout** | CLI command exceeds expected duration | Kill the process, log the timeout, retry once with a longer timeout. If the retry also times out, APPROVE exit and log a diagnostic warning. |\n| **JSON parse error** | stdout is empty or contains non-JSON text (e.g., stack traces, warnings) | Check stderr for error details. Strip any non-JSON prefix from stdout (some environments prepend warnings). If still unparseable, treat as a command failure. 
|\n| **Lock conflict** | `run:iterate` or `task:post` fails because another process holds `run.lock` | Retry after 250ms, up to 40 retries (matching the SDK's internal retry behavior). If all retries fail, log the conflict and APPROVE exit. |\n| **Missing run directory** | `run:status` or `run:iterate` returns non-zero with ENOENT-style error | The run was deleted or never created. Mark the session inactive and approve/fail loudly according to harness policy. |\n| **Permission error** | EACCES or EPERM on file operations | Check file ownership and permissions. This usually indicates a misconfigured `BABYSITTER_RUNS_DIR`. |\n| **Non-zero exit, valid JSON** | CLI returns exit code != 0 but stdout contains valid JSON with an `error` field | Parse the JSON error object for structured diagnostics. The `error.code` field often contains a machine-readable error type. |\n\n### Unified Error Handler\n\n```\nfunction handleCLIError(commandName, shellResult):\n    // Step 1: Try to parse structured error from stdout\n    if shellResult.stdout != \"\":\n        try:\n            parsed = parseJson(shellResult.stdout)\n            if parsed.error:\n                log(\"CLI error in {commandName}: {parsed.error.message} (code: {parsed.error.code})\")\n                return { category: \"structured\", error: parsed.error }\n        catch parseError:\n            // stdout is not valid JSON -- fall through\n            pass\n\n    // Step 2: Check for known error patterns in stderr\n    stderr = shellResult.stderr or \"\"\n\n    if contains(stderr, \"ENOENT\") or contains(stderr, \"no such file\"):\n        return { category: \"missing_path\", message: stderr }\n\n    if contains(stderr, \"run.lock\") or contains(stderr, \"EBUSY\"):\n        return { category: \"lock_conflict\", message: stderr, retryable: true }\n\n    if contains(stderr, \"EACCES\") or contains(stderr, \"EPERM\"):\n        return { category: \"permission\", message: stderr }\n\n    if shellResult.timedOut:\n   
     return { category: \"timeout\", message: \"Command {commandName} timed out after {shellResult.timeoutMs}ms\" }\n\n    // Step 3: Generic failure\n    return {\n        category: \"unknown\",\n        exitCode: shellResult.exitCode,\n        message: stderr or \"Command {commandName} failed with exit code {shellResult.exitCode}\"\n    }\n```\n\n### Recommended Timeouts by Command\n\n| Command | Recommended Timeout | Notes |\n|---------|-------------------|-------|\n| `version --json` | 5s | Should be near-instant |\n| `session:init` | 5s | File creation only |\n| `session:associate` | 5s | File update only |\n| `run:create` | 10s | Creates directory structure and journal |\n| `run:assign-process` | 10s | Updates run.json and appends journal event under lock |\n| `run:iterate` | 120s | May execute process function; uses `BABYSITTER_TIMEOUT` env var |\n| `run:status` | 10s | Reads journal and derives state |\n| `task:list` | 10s | Reads task directories |\n| `task:post` | 15s | Writes result + appends journal event |\n| `session:check-iteration` | 5s | Reads and parses state file |\n| `session:iteration-message` | 10s | Reads run state, discovers skills |\n| `run:repair-journal` | 30s | Scans and repairs journal files |\n\n---\n\n## 8. Edge Cases\n\n> Note: For CLI command failures (timeouts, parse errors, lock conflicts), see\n> [Section 7: CLI Error Handling](#7-cli-error-handling).\n\n### Stale Session State File\n\nIf the harness crashes or the agent is forcefully terminated, a session state file\nmay be left behind. On the next session start, `session:init` will fail with\n`SESSION_EXISTS`. 
Handling:\n\n```\nfunction handleStaleSession(sessionId, stateDir):\n    stateFile = \"{stateDir}/{sessionId}.md\"\n    existing = parseSessionState(stateFile)\n\n    // If the run is completed or failed, clean up and re-init\n    if existing.runId != \"\":\n        statusResult = shell(\"babysitter run:status .a5c/runs/{existing.runId} --json\")\n        if statusResult.exitCode == 0:\n            runStatus = parseJson(statusResult.stdout)\n            if runStatus.state in [\"completed\", \"failed\"]:\n                deleteFile(stateFile)\n                return shell(\"babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json\")\n\n    // Otherwise, offer to resume\n    return { action: \"resume_or_cleanup\", existingRunId: existing.runId }\n```\n\n### Run Directory Missing or Corrupted\n\nIf the run directory is deleted or journal files are corrupted, `run:status` and\n`run:iterate` will return non-zero exit codes. The stop interceptor should APPROVE\nexit in this case (Guard 5 in the decision algorithm).\n\n### Concurrent Sessions on Same Run\n\nThe SDK uses file-based run locking (`run.lock` with PID). If two sessions try to\niterate the same run concurrently, one will fail to acquire the lock. The harness\nshould retry after a short delay (250ms, up to 40 retries) or report the conflict.\n\n### Effect Posted but Journal Not Updated\n\nIf the harness crashes between writing `result.json` and the CLI appending the\n`EFFECT_RESOLVED` journal event, the run may appear stuck. Use `run:repair-journal`\nto detect and fix such inconsistencies:\n\n```bash\nbabysitter run:repair-journal .a5c/runs/{runId} --json\n```\n\n### Zero Pending Tasks After Iterate\n\nIf `run:iterate` returns status=waiting but `task:list --pending` returns zero tasks,\nthis indicates all effects were resolved during the iterate call itself (e.g., via\nreplay). Simply call `run:iterate` again on the next iteration.\n\n---\n\n## 9. 
Testing the Integration\n\n### Smoke Test Checklist\n\nRun these tests in order. Each builds on the previous.\n\n#### Test 1: CLI Availability\n\n```bash\nbabysitter version --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"version\"` field\n\n#### Test 2: Session Initialization\n\n```bash\nbabysitter session:init \\\n  --session-id test-session-001 \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] State file created at `/tmp/babysitter-test/state/test-session-001.md`\n- [ ] File contains YAML frontmatter with `active: true`, `iteration: 1`, `run_id: \"\"`\n\n#### Test 3: Run Creation\n\n```bash\n# Create a minimal process file\ncat > /tmp/babysitter-test/process.js << 'EOF'\nexports.process = async function(inputs, ctx) {\n  const result = await ctx.task('greet', { name: inputs.name });\n  return { greeting: result };\n};\nEOF\n\n# Create inputs file\necho '{\"name\": \"World\"}' > /tmp/babysitter-test/inputs.json\n\n# Create the run\nbabysitter run:create \\\n  --process-id test-process \\\n  --entry /tmp/babysitter-test/process.js#process \\\n  --inputs /tmp/babysitter-test/inputs.json \\\n  --prompt \"Test run\" \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"runId\"` field\n- [ ] Directory `.a5c/runs/{runId}/` exists with `run.json` and `journal/`\n\n#### Test 4: Session Binding\n\n```bash\nbabysitter session:associate \\\n  --session-id test-session-001 \\\n  --run-id {runId} \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] State file now has `run_id: \"{runId}\"`\n\n#### Test 5: Run Iterate\n\n```bash\nbabysitter run:iterate .a5c/runs/{runId} --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"status\"` field\n\n#### Test 6: Task List\n\n```bash\nbabysitter task:list .a5c/runs/{runId} --pending --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"tasks\"` array\n\n#### Test 7: Iteration Guard\n\n```bash\nbabysitter 
session:check-iteration \\\n  --session-id test-session-001 \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"shouldContinue\": true`\n\n#### Test 8: Iteration Message\n\n```bash\nbabysitter session:iteration-message \\\n  --iteration 2 \\\n  --run-id {runId} \\\n  --runs-dir .a5c/runs \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"systemMessage\"` field\n- [ ] Output contains `\"iteration\": 2`\n\n### Common Failure Modes\n\n| Symptom | Likely Cause | Fix |\n|---------|-------------|-----|\n| `babysitter: command not found` | SDK not installed or not on PATH | Re-run installation (Section 2a) |\n| Stop hook always APPROVEs | No session state file, or `run_id` is empty | Check session:init and session:associate ran |\n| Infinite loop (never exits) | Completion proof not detected | Check transcript scanning for `<promise>` tags |\n| Exits after 1 iteration | Stop interceptor not wired correctly | Verify BLOCK decision re-injects context |\n| `Session already associated` | Re-entrant run on same session | Clean up old state file or complete old run |\n| Iterations very fast, exits early | Runaway detection triggers (avg <= 15s) | Agent is not doing meaningful work per iteration; check effect execution |\n| State file corrupt | Non-atomic write or concurrent access | Use atomic write protocol (temp + rename) |\n| `task:post` fails | Writing result.json directly instead of via CLI | Always use `babysitter task:post` command |\n| Run stuck in \"waiting\" | Effects executed but results not posted | Check task:post calls after each effect |\n\n### End-to-End Integration Test\n\nThe following pseudocode validates the complete loop:\n\n```\nfunction testEndToEnd():\n    sessionId = \"e2e-test-\" + randomId()\n    pluginRoot = \"/tmp/babysitter-e2e\"\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    runsDir = \".a5c/runs\"\n\n    // 1. 
Init\n    ensureBabysitterCLI()\n    shell(\"babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json\")\n\n    // 2. Create and bind\n    result = shell(\"babysitter run:create --process-id test --entry ./process.js#process --inputs inputs.json --json\")\n    runId = parseJson(result.stdout).runId\n    shell(\"babysitter session:associate --session-id {sessionId} --run-id {runId} --state-dir {stateDir} --json\")\n\n    // 3. Iterate\n    iterResult = shell(\"babysitter run:iterate .a5c/runs/{runId} --json\")\n    assert parseJson(iterResult.stdout).status in [\"executed\", \"waiting\", \"completed\"]\n\n    // 4. Execute effects\n    listResult = shell(\"babysitter task:list .a5c/runs/{runId} --pending --json\")\n    tasks = parseJson(listResult.stdout).tasks\n    for task in tasks:\n        // Execute the task and write its result value to output.json before posting\n        shell(\"babysitter task:post .a5c/runs/{runId} {task.effectId} --status ok --value output.json --json\")\n\n    // 5. Check iteration guard\n    guardResult = shell(\"babysitter session:check-iteration --session-id {sessionId} --state-dir {stateDir} --json\")\n    assert parseJson(guardResult.stdout).shouldContinue == true\n\n    // 6. Get iteration message (takes --iteration, --run-id, --runs-dir; not --session-id or --state-dir)\n    msgResult = shell(\"babysitter session:iteration-message --iteration 2 --run-id {runId} --runs-dir {runsDir} --json\")\n    assert parseJson(msgResult.stdout).systemMessage is not null\n\n    // 7. Re-iterate once; if the run has completed, verify the completion proof\n    iterResult = shell(\"babysitter run:iterate .a5c/runs/{runId} --json\")\n    if parseJson(iterResult.stdout).status == \"completed\":\n        proof = parseJson(iterResult.stdout).completionProof\n        assert proof is not null and proof != \"\"\n\n    print(\"END-TO-END TEST PASSED\")\n```\n\n---\n\n## 10. 
Reference Implementation\n\nThe canonical reference implementation is the Claude Code harness adapter, documented at:\n\n**[docs/assimilation/harness/claude-code-integration.md](./claude-code-integration.md)**\n\nKey files in the reference implementation:\n\n| File | Role |\n|------|------|\n| `packages/sdk/src/harness/types.ts` | `HarnessAdapter` interface definition |\n| `packages/sdk/src/harness/claudeCode.ts` | Claude Code adapter (stop hook, session-start, binding) |\n| `packages/sdk/src/harness/nullAdapter.ts` | No-op fallback adapter (useful as a starting template) |\n| `packages/sdk/src/harness/registry.ts` | Adapter auto-detection and lookup |\n| `packages/sdk/src/session/` | Session state parsing, writing, and types |\n| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-stop.sh` | Generated Claude Code stop hook entry |\n| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-session-start.sh` | Generated Claude Code session-start hook entry |\n\n### Writing a New Harness Adapter\n\nTo add first-class SDK support for your harness, implement the `HarnessAdapter`\ninterface:\n\n```typescript\ninterface HarnessAdapter {\n  readonly name: string;\n  isActive(): boolean;\n  resolveSessionId(parsed: { sessionId?: string }): string | undefined;\n  resolveStateDir(args: { stateDir?: string; pluginRoot?: string }): string | undefined;\n  resolvePluginRoot(args: { pluginRoot?: string }): string | undefined;\n  bindSession(opts: SessionBindOptions): Promise<SessionBindResult>;\n  handleStopHook(args: HookHandlerArgs): Promise<number>;\n  handleSessionStartHook(args: HookHandlerArgs): Promise<number>;\n  findHookDispatcherPath(startCwd: string): string | null;\n}\n```\n\nRegister your adapter in `packages/sdk/src/harness/registry.ts` and it will be\nauto-detected when its `isActive()` method returns `true`.\n\nFor harnesses that cannot modify the SDK source, the entire integration can be\nbuilt externally by calling the babysitter CLI 
commands documented in this guide.\n\n### Full-Code Example: Minimal Node.js Harness\n\nThe following is a complete, runnable Node.js implementation (not pseudocode) of a\nminimal harness that drives a single babysitter run to completion. It covers session\ninitialization, run creation, the orchestration loop, effect execution for node tasks,\nand completion proof extraction.\n\n```javascript\n#!/usr/bin/env node\n// minimal-harness.js -- A complete minimal babysitter harness implementation.\n// Usage: node minimal-harness.js <process-file>#<export> <inputs.json> [prompt]\n\nconst { execSync } = require('child_process');\nconst { readFileSync, writeFileSync, mkdirSync, existsSync } = require('fs');\nconst { join } = require('path');\nconst crypto = require('crypto');\n\n// --- Configuration ---\nconst RUNS_DIR = '.a5c/runs';\nconst STATE_DIR = join(process.cwd(), '.harness-state');\nconst MAX_ITERATIONS = 65000;\nconst RUNAWAY_THRESHOLD_ITERATIONS = 5;\nconst RUNAWAY_THRESHOLD_SECONDS = 15;\nconst CLI_TIMEOUT_MS = 120_000;\n\n// --- Helpers ---\nfunction cli(command, timeoutMs = CLI_TIMEOUT_MS) {\n  try {\n    const stdout = execSync(`babysitter ${command}`, {\n      encoding: 'utf8',\n      timeout: timeoutMs,\n      stdio: ['pipe', 'pipe', 'pipe'],\n    });\n    return { exitCode: 0, stdout, stderr: '' };\n  } catch (err) {\n    return {\n      exitCode: err.status ?? 1,\n      stdout: err.stdout ?? '',\n      stderr: err.stderr ?? 
'',\n      timedOut: err.killed === true,\n    };\n  }\n}\n\nfunction cliJson(command, timeoutMs) {\n  const result = cli(`${command} --json`, timeoutMs);\n  if (result.exitCode !== 0) {\n    console.error(`CLI error (${command}): ${result.stderr}`);\n    return null;\n  }\n  try {\n    return JSON.parse(result.stdout);\n  } catch {\n    console.error(`JSON parse error for ${command}: ${result.stdout.slice(0, 200)}`);\n    return null;\n  }\n}\n\n// --- Main ---\nasync function main() {\n  const [,, entryPoint, inputsFile, prompt = 'Run process'] = process.argv;\n  if (!entryPoint || !inputsFile) {\n    console.error('Usage: node minimal-harness.js <entry>#<export> <inputs.json> [prompt]');\n    process.exit(1);\n  }\n\n  const sessionId = `harness-${crypto.randomUUID().slice(0, 8)}`;\n  mkdirSync(STATE_DIR, { recursive: true });\n\n  // Step 1: Verify CLI\n  const version = cliJson('version');\n  if (!version) { console.error('babysitter CLI not available'); process.exit(1); }\n  console.log(`Using babysitter SDK v${version.sdkVersion || version.version}`);\n\n  // Step 2: Session init\n  const initResult = cliJson(\n    `session:init --session-id ${sessionId} --state-dir ${STATE_DIR}`\n  );\n  if (!initResult) { console.error('Session init failed'); process.exit(1); }\n\n  // Step 3: Create run\n  const processId = entryPoint.split('#')[0].replace(/[^a-zA-Z0-9-_]/g, '-');\n  const createResult = cliJson(\n    `run:create --process-id ${processId} --entry ${entryPoint}` +\n    ` --inputs ${inputsFile} --prompt \"${prompt.replace(/\"/g, '\\\\\"')}\"`\n  );\n  if (!createResult) { console.error('Run creation failed'); process.exit(1); }\n  const { runId } = createResult;\n  const runDir = join(RUNS_DIR, runId);\n  console.log(`Created run: ${runId}`);\n\n  // Step 4: Bind session\n  cliJson(\n    `session:associate --session-id ${sessionId} --run-id ${runId} --state-dir ${STATE_DIR}`\n  );\n\n  // Step 5: Orchestration loop\n  const iterationTimes = [];\n  let 
iteration = 0;\n\n  while (iteration < MAX_ITERATIONS) {\n    iteration++;\n    const iterStart = Date.now();\n    console.log(`\\n--- Iteration ${iteration} ---`);\n\n    // 5a: Iterate\n    const iterData = cliJson(`run:iterate ${runDir}`, CLI_TIMEOUT_MS);\n    if (!iterData) { console.error('run:iterate failed'); break; }\n\n    if (iterData.status === 'completed') {\n      console.log(`Run completed. Proof: ${iterData.completionProof}`);\n      break;\n    }\n    if (iterData.status === 'failed') {\n      console.error('Run failed:', JSON.stringify(iterData, null, 2));\n      break;\n    }\n\n    // 5b: List and execute pending tasks\n    const listData = cliJson(`task:list ${runDir} --pending`);\n    if (!listData || !listData.tasks || listData.tasks.length === 0) {\n      console.log('No pending tasks; re-iterating...');\n      continue;\n    }\n\n    for (const task of listData.tasks) {\n      const taskDir = join(runDir, 'tasks', task.effectId);\n      const taskDefPath = join(taskDir, 'task.json');\n      if (!existsSync(taskDefPath)) {\n        console.error(`task.json missing for ${task.effectId}`);\n        continue;\n      }\n      const taskDef = JSON.parse(readFileSync(taskDefPath, 'utf8'));\n      let result;\n\n      switch (task.kind) {\n        case 'node': {\n          // Execute the node task's script\n          try {\n            const mod = require(taskDef.args.scriptPath);\n            const fn = taskDef.args.exportName ? 
mod[taskDef.args.exportName] : mod.default || mod;\n            const output = await fn(taskDef.args.input);\n            result = { status: 'ok', value: output };\n          } catch (err) {\n            result = { status: 'error', value: { message: err.message, stack: err.stack } };\n          }\n          break;\n        }\n        case 'breakpoint':\n          // Minimal harness: auto-reject breakpoints (non-interactive)\n          result = { status: 'ok', value: { approved: false, reason: 'Non-interactive harness' } };\n          break;\n        case 'sleep': {\n          const until = taskDef.args.until || taskDef.args.sleepUntil;\n          const durationMs = taskDef.args.durationMs;\n          const target = until ? new Date(until).getTime() : (Date.now() + (durationMs || 0));\n          const remaining = target - Date.now();\n          if (remaining > 60000) {\n            // Long sleep: leave the effect pending rather than report a wake-up early\n            console.log(`Deferring sleep ${task.effectId}: ${Math.ceil(remaining / 1000)}s remaining`);\n            continue;\n          }\n          if (remaining > 0) {\n            await new Promise(r => setTimeout(r, remaining));\n          }\n          result = { status: 'ok', value: { wokeAt: new Date().toISOString(), reason: 'waited' } };\n          break;\n        }\n        default:\n          result = { status: 'error', value: { message: `Unsupported task kind: ${task.kind}` } };\n      }\n\n      // Post result (a node task may return undefined, so fall back to null)\n      const outputPath = join(taskDir, 'output.json');\n      writeFileSync(outputPath, JSON.stringify(result.value ?? null));\n      const post = cli(`task:post ${runDir} ${task.effectId} --status ${result.status} --value ${outputPath} --json`);\n      if (post.exitCode !== 0) {\n        console.error(`task:post failed for ${task.effectId}: ${post.stderr}`);\n      }\n    }\n\n    // 5c: Runaway detection\n    const iterDuration = (Date.now() - iterStart) / 1000;\n    iterationTimes.push(iterDuration);\n    if (iteration >= RUNAWAY_THRESHOLD_ITERATIONS) {\n      const recent = iterationTimes.slice(-3);\n      const avg = recent.reduce((a, b) => a + b, 0) / recent.length;\n      if (avg <= RUNAWAY_THRESHOLD_SECONDS) {\n        console.error(`Runaway detected: avg ${avg.toFixed(1)}s <= ${RUNAWAY_THRESHOLD_SECONDS}s threshold`);\n        break;\n      }\n    }\n  }\n\n  
console.log(`\\nHarness finished after ${iteration} iterations.`);\n}\n\nmain().catch(err => { console.error(err); process.exit(1); });\n```\n\n---\n\n## Appendix: Complete CLI Command Reference\n\n| Command | Purpose | Section |\n|---------|---------|---------|\n| `babysitter version --json` | Verify CLI installation | 2a |\n| `babysitter session:init --session-id ID --state-dir DIR --json` | Create baseline session state | 2b |\n| `babysitter run:create --process-id PID --entry FILE --inputs FILE --json` | Create a new run | 2c |\n| `babysitter run:assign-process RUNDIR --entry FILE [--process-id PID] --json` | Assign process to bare run | 2c |\n| `babysitter session:associate --session-id ID --run-id RID --state-dir DIR --json` | Bind session to run | 2c |\n| `babysitter run:iterate RUNDIR --json` | Advance orchestration, discover effects | 2d, 2e |\n| `babysitter run:status RUNDIR --json` | Read run status and completion proof | 2d |\n| `babysitter session:iteration-message --iteration N [--run-id ID] [--runs-dir DIR] [--plugin-root DIR] --json` | Get context to re-inject after BLOCK | 2d |\n| `babysitter task:list RUNDIR --pending --json` | List pending effects | 2e |\n| `babysitter task:show RUNDIR EFFECTID --json` | Read task definition | 2e |\n| `babysitter task:post RUNDIR EFFECTID --status STATUS --value FILE --json` | Post effect result | 2f |\n| `babysitter session:check-iteration --session-id ID --state-dir DIR --json` | Check iteration guards | 2g |\n| `babysitter run:repair-journal RUNDIR --json` | Repair inconsistent journal | 7 |\n| `babysitter hook:run --hook-type TYPE --harness NAME --json` | Dispatch a lifecycle hook | 5 |\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-assimilation-harness",
      "to": "page:docs-assimilation-harness-generic-harness-guide",
      "kind": "contains_page"
    }
  ]
}

{
  "id": "page:docs-assimilation-harness-generic-harness-guide",
  "_kind": "Page",
  "_file": "wiki/docs/assimilation/harness/generic-harness-guide.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/assimilation/harness/generic-harness-guide.md",
    "sourceKind": "repo-docs",
    "title": "Generic Harness Integration Guide for Babysitter SDK",
    "displayName": "Generic Harness Integration Guide for Babysitter SDK",
    "slug": "docs/assimilation/harness/generic-harness-guide",
    "articlePath": "wiki/docs/assimilation/harness/generic-harness-guide.md",
    "article": "\n# Generic Harness Integration Guide for Babysitter SDK\n\nA step-by-step implementation guide for integrating the babysitter SDK orchestration\nloop into any AI coding harness. This document is harness-agnostic and uses pseudocode\nthroughout. For the canonical reference implementation, see\n[Claude Code Integration](./claude-code-integration.md).\n\n---\n\n## Table of Contents\n\n1. [Prerequisites](#1-prerequisites)\n2. [Core Integration Points](#2-core-integration-points)\n   - [2a. SDK Installation](#2a-sdk-installation)\n   - [2b. Session Initialization](#2b-session-initialization)\n   - [2c. Run Creation and Session Binding](#2c-run-creation-and-session-binding)\n   - [2d. The Orchestration Loop Driver](#2d-the-orchestration-loop-driver)\n   - [2e. Effect Execution](#2e-effect-execution)\n   - [2f. Result Posting](#2f-result-posting)\n   - [2g. Iteration Guards](#2g-iteration-guards)\n3. [Harness Capability Matrix](#3-harness-capability-matrix)\n4. [Session State Contract](#4-session-state-contract)\n5. [Hook Equivalence Table](#5-hook-equivalence-table)\n6. [CLI Output Schemas](#6-cli-output-schemas)\n7. [CLI Error Handling](#7-cli-error-handling)\n8. [Edge Cases](#8-edge-cases)\n9. [Testing the Integration](#9-testing-the-integration)\n10. [Reference Implementation](#10-reference-implementation)\n\n---\n\n## 1. Prerequisites\n\nYour harness must provide (or be able to emulate) the following capabilities before\nyou begin integration. Each item is marked as REQUIRED or RECOMMENDED.\n\n### Checklist\n\n- [ ] **REQUIRED: Shell or script execution** -- The harness must be able to execute\n  shell commands (`bash`, `sh`, `cmd`) or invoke Node.js scripts. The babysitter CLI\n  is a Node.js binary invoked via shell. 
Every integration point depends on running\n  `babysitter <command>` and reading its JSON output from stdout.\n\n- [ ] **REQUIRED: Exit/stop interception** -- The harness must provide a mechanism to\n  intercept the AI agent's attempt to end its turn or exit the conversation. This is\n  the single most critical requirement. Without it, the orchestration loop cannot\n  function. Examples:\n  - A \"stop hook\" that fires before the agent's response is finalized\n  - A middleware layer that can reject an exit signal and re-inject context\n  - A wrapper around the agent loop that checks a condition before allowing termination\n\n- [ ] **REQUIRED: Context re-injection** -- After blocking an exit, the harness must\n  be able to inject new text (a system message, user message, or tool result) into the\n  agent's context so it continues working. The injected content comes from the\n  babysitter CLI output.\n\n- [ ] **REQUIRED: Session/conversation identity** -- The harness must provide a stable\n  identifier for the current session or conversation. This ID is used to:\n  - Name the session state file\n  - Associate the session with a babysitter run\n  - Track iteration count across stop-hook cycles\n\n- [ ] **RECOMMENDED: Lifecycle hooks** -- Pre-session, post-session, pre-turn,\n  post-turn hooks simplify integration. If unavailable, equivalent behavior can be\n  built by wrapping the agent's main loop.\n\n- [ ] **RECOMMENDED: Transcript access** -- Access to the agent's recent output text\n  enables completion proof verification (scanning for `<promise>` tags). If\n  unavailable, an alternative proof mechanism must be implemented (see Section 2d).\n\n- [ ] **RECOMMENDED: Persistent environment** -- Environment variables or a key-value\n  store that persists across hook invocations within the same session. 
Used to carry\n  the session ID and plugin root path.\n\n### Minimum Environment\n\n```\nNode.js >= 18\nnpm >= 8\nFile system access (read/write to working directory)\n```\n\n---\n\n## 2. Core Integration Points\n\nImplement these in order. Each section includes a checklist, pseudocode, and the\nspecific CLI commands involved.\n\n```\n+-----------------------------------------------------------------------+\n|                         YOUR HARNESS                                  |\n|                                                                       |\n|  +------------------+  +-----------------------+  +----------------+  |\n|  | Session Lifecycle |  | Exit/Stop Interceptor |  | Agent Loop     |  |\n|  | (start/end)      |  | (block or approve)    |  | (LLM turns)   |  |\n|  +--------+---------+  +-----------+-----------+  +-------+--------+  |\n|           |                        |                      |           |\n+-----------------------------------------------------------------------+\n            |                        |                      |\n            v                        v                      v\n   +-------------------+   +-------------------+   +------------------+\n   | babysitter CLI    |   | Session State     |   | Run Directory    |\n   | (npm package)     |   | {stateDir}/       |   | .a5c/runs/{id}/  |\n   |                   |   |   {sessionId}.md  |   |   journal/       |\n   | session:init      |   +-------------------+   |   tasks/         |\n   | run:create        |                           |   state/         |\n   | run:assign-process|                           +------------------+\n   | session:associate |\n   | run:iterate       |\n   | task:list         |\n   | task:post         |\n   | session:check-    |\n   |   iteration       |\n   | session:iteration-|\n   |   message         |\n   | run:status        |\n   +-------------------+\n```\n\n---\n\n### 2a. 
SDK Installation\n\n**Goal:** Ensure the `babysitter` CLI binary is available on PATH.\n\n#### Checklist\n\n- [ ] Determine the SDK version to install (from plugin manifest or pinned version)\n- [ ] Attempt global install; fall back to local prefix install; fall back to npx\n- [ ] Gate installation behind a marker file to avoid repeated install attempts\n- [ ] Verify the CLI is callable: `babysitter version --json`\n\n#### Pseudocode\n\n```\nfunction ensureBabysitterCLI(sdkVersion):\n    markerFile = \"{pluginRoot}/.babysitter-install-attempted\"\n\n    if commandExists(\"babysitter\"):\n        return \"babysitter\"\n\n    if fileExists(markerFile):\n        // Already tried installing; fall through to npx\n    else:\n        // Attempt global install\n        result = shell(\"npm install -g @a5c-ai/babysitter-sdk@{sdkVersion}\")\n        if result.exitCode != 0:\n            // Fallback: install with local prefix\n            result = shell(\"npm install -g @a5c-ai/babysitter-sdk@{sdkVersion} --prefix $HOME/.local\")\n        writeFile(markerFile, \"attempted\")\n\n    if commandExists(\"babysitter\"):\n        return \"babysitter\"\n\n    // Final fallback: use npx on every invocation\n    return \"npx -y @a5c-ai/babysitter-sdk@{sdkVersion} babysitter\"\n```\n\n#### CLI Command\n\n```bash\n# Verify installation\nbabysitter version --json\n# Expected: { \"version\": \"x.y.z\", \"sdkVersion\": \"...\" }\n```\n\n---\n\n### 2b. Session Initialization\n\n**Goal:** Create a baseline session state file so the orchestration loop can track\niterations from the very start of the session, even before any run is created.\n\n#### When to Call\n\nAt session/conversation start -- before the user has issued any commands. 
This is\ntypically wired into a \"session start\" lifecycle hook or called at the top of the\nagent's main loop.\n\n#### Checklist\n\n- [ ] Obtain or generate a unique session ID\n- [ ] Determine the state directory (typically `{pluginRoot}/skills/babysit/state/`)\n- [ ] Call `babysitter session:init`\n- [ ] Persist the session ID in the harness environment for later hook invocations\n\n#### Pseudocode\n\n```\nfunction onSessionStart(sessionId, pluginRoot):\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    ensureDirectoryExists(stateDir)\n\n    result = shell(\n        \"babysitter session:init\" +\n        \" --session-id {sessionId}\" +\n        \" --state-dir {stateDir}\" +\n        \" --json\"\n    )\n\n    if result.exitCode != 0:\n        log(\"WARNING: session init failed, orchestration may not work\")\n        return\n\n    // Persist session ID for use by the stop interceptor\n    setEnv(\"AGENT_SESSION_ID\", sessionId)\n    setEnv(\"BABYSITTER_PLUGIN_ROOT\", pluginRoot)\n```\n\n#### What This Creates\n\nA session state file at `{stateDir}/{sessionId}.md` in BASELINE state (empty\n`run_id`, `iteration: 1`). See [Section 4: Session State Contract](#4-session-state-contract)\nfor the full file format, field definitions, and state transition diagram.\n\n---\n\n### 2c. Run Creation and Session Binding\n\n**Goal:** Create a babysitter run and bind it to the current session so the stop\ninterceptor knows which run to check.\n\n#### When to Call\n\nAfter the user requests a task that should be orchestrated. 
Typically triggered by a\nskill or command within the harness (e.g., the user says \"babysit this task\").\n\n#### Checklist\n\n- [ ] Prepare the process definition (entry point, process ID, inputs)\n- [ ] Call `babysitter run:create` with harness and session parameters\n- [ ] Call `babysitter session:associate` to bind the run to the session\n- [ ] Verify the session state file now has a non-empty `run_id`\n\n#### Pseudocode\n\n```\nfunction createAndBindRun(processId, entryPoint, inputs, prompt, sessionId, pluginRoot):\n    // Step 1: Create the run\n    createResult = shell(\n        \"babysitter run:create\" +\n        \" --process-id {processId}\" +\n        \" --entry {entryPoint}\" +\n        \" --inputs {inputsFilePath}\" +\n        \" --prompt \\\"{prompt}\\\"\" +\n        \" --json\"\n    )\n    runId = parseJson(createResult.stdout).runId\n    runDir = \".a5c/runs/{runId}\"\n\n    // Step 2: Bind session to run\n    shell(\n        \"babysitter session:associate\" +\n        \" --session-id {sessionId}\" +\n        \" --run-id {runId}\" +\n        \" --state-dir {pluginRoot}/skills/babysit/state\" +\n        \" --json\"\n    )\n\n    return { runId, runDir }\n```\n\n#### Re-entrant Run Prevention\n\nIf the session is already bound to a different run, `session:associate` will fail.\nThe harness must either:\n1. Complete or clean up the existing run first\n2. Remove the old session state file manually\n3. Present an error to the user\n\n---\n\n### 2d. The Orchestration Loop Driver\n\n**Goal:** Convert the agent's single-turn execution into a multi-iteration\norchestration loop by intercepting exit signals, checking run status, and re-injecting\ncontext.\n\nThis is the most critical and complex integration point.\n\n#### Architecture\n\n```\nAgent executes turn\n     |\n     v\nAgent signals \"done\" (stop/exit)\n     |\n     v\n+--[EXIT INTERCEPTOR]----------------------------------------------+\n|  1. 
Read session state file                                      |\n|  2. Check guards (max iterations, runaway detect, no run bound)  |\n|  3. Load run status via run:status                               |\n|  4. Check completion proof                                       |\n|  5. Decision: APPROVE or BLOCK                                   |\n+--------+-------------------+-------------------------------------+\n         |                   |\n    [APPROVE]           [BLOCK]\n         |                   |\n         v                   v\n    Session ends     Re-inject context ------> Agent continues\n                     (iteration message)       (back to top)\n```\n\n#### The Decision Algorithm\n\n```\nfunction onAgentStop(sessionId, pluginRoot, runsDir, lastAgentOutput):\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    stateFile = \"{stateDir}/{sessionId}.md\"\n\n    // --- Guard 1: No state file means no active loop ---\n    if not fileExists(stateFile):\n        return APPROVE\n\n    state = parseSessionState(stateFile)\n\n    // --- Guard 2: Max iterations ---\n    if state.iteration >= state.maxIterations:\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    // --- Guard 3: Runaway loop detection ---\n    if state.iteration >= 5:\n        avgTime = average(state.iterationTimes)  // last 3 durations\n        if avgTime <= 15:  // seconds\n            cleanupSessionFile(stateFile)\n            return APPROVE\n\n    // --- Guard 4: No run bound ---\n    if state.runId == \"\":\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    // --- Check run status ---\n    statusResult = shell(\n        \"babysitter run:status .a5c/runs/{state.runId} --json\"\n    )\n\n    // --- Guard 5: Unknown or unreadable run ---\n    // Check the exit code before parsing; stdout may not be valid JSON on failure.\n    if statusResult.exitCode != 0:\n        cleanupSessionFile(stateFile)\n        return APPROVE\n\n    runStatus = parseJson(statusResult.stdout)\n\n    // --- Guard 6: Completion proof ---\n    if runStatus.state == 
\"completed\":\n        proof = runStatus.completionProof\n        promiseTag = extractPromiseTag(lastAgentOutput)\n        if promiseTag == proof:\n            cleanupSessionFile(stateFile)\n            return APPROVE\n\n    // --- BLOCK: Continue the loop ---\n    // Advance session state from BOUND/ACTIVE to next iteration.\n    // See Section 4 (Session State Contract) for field update rules\n    // and the atomic write protocol.\n    newIteration = state.iteration + 1\n    updateSessionState(stateFile, {\n        iteration: newIteration,\n        lastIterationAt: now()\n    })\n\n    // Build the context message to re-inject\n    // NOTE: session:iteration-message uses --iteration, --run-id,\n    //       --runs-dir, and --plugin-root (NOT --session-id or --state-dir)\n    iterationMessage = shell(\n        \"babysitter session:iteration-message\" +\n        \" --iteration {newIteration}\" +\n        \" --run-id {state.runId}\" +\n        \" --runs-dir {runsDir}\" +\n        \" --plugin-root {pluginRoot}\" +\n        \" --json\"\n    )\n\n    return BLOCK {\n        reason: parseJson(iterationMessage.stdout).systemMessage,\n        systemMessage: \"Babysitter iteration {newIteration}/{state.maxIterations}\"\n    }\n```\n\n#### Intercepting Exit Signals\n\nThe mechanism depends entirely on your harness. Common patterns:\n\n| Harness Type | Interception Mechanism |\n|-------------|------------------------|\n| Hook-based (Claude Code, etc.) 
| Register a `Stop` hook that receives agent output and returns block/approve |\n| Middleware-based | Wrap the agent loop's exit check in a middleware that calls the decision algorithm |\n| Event-based | Listen for \"agent_turn_complete\" events, cancel and re-queue if BLOCK |\n| Loop-based | Replace the `while (running)` loop condition with the decision algorithm |\n| API-based | Between API calls, run the check and decide whether to make another call |\n\n#### Re-injecting Context\n\nAfter blocking, the harness must feed the orchestration context back to the agent.\nThe mechanism depends on your harness:\n\n| Harness Type | Re-injection Mechanism |\n|-------------|------------------------|\n| System message injection | Append the `reason` as a system message before the next turn |\n| User message simulation | Insert a synthetic user message containing the iteration context |\n| Tool result injection | Return the context as a tool call result |\n| Context window prepend | Prepend the context to the agent's next input |\n\nThe content to inject comes from the `systemMessage` field of the\n`session:iteration-message` output. It typically contains:\n1. Iteration number and status\n2. What to do next (run:iterate, execute effects, extract proof, etc.)\n3. Pending effect kinds if the run is in \"waiting\" state\n\n#### Detecting the Completion Proof\n\nThe completion proof is a SHA-256 hash that the agent must output inside\n`<promise>...</promise>` tags. The harness must:\n\n1. Scan the agent's last output for `<promise>VALUE</promise>`\n2. Compare VALUE against the `completionProof` from `run:status --json`\n3. 
If they match, allow exit\n\n```\nfunction extractPromiseTag(text):\n    match = regex_search(text, \"<promise>([\\\\s\\\\S]*?)</promise>\")\n    if match is null:\n        return null\n    // Collapse every whitespace run (not just the first) to a single space\n    return trim(match.group(1)).replace(/\\s+/g, \" \")\n```\n\nIf the harness cannot access the agent's output text (no transcript), use one of\nthese alternatives:\n- Have the agent call a special \"complete\" tool that the harness intercepts\n- Use a dedicated CLI command that the agent calls to signal completion\n- Implement a \"completion callback\" webhook\n\n---\n\n### 2e. Effect Execution\n\n**Goal:** Execute the pending tasks that the babysitter run has requested, then post\ntheir results.\n\n#### The Effect Execution Cycle\n\n```\nbabysitter run:iterate .a5c/runs/{runId} --json\n        |\n        v\n  Returns: { status, pendingActions[], ... }\n        |\n        v\nbabysitter task:list .a5c/runs/{runId} --pending --json\n        |\n        v\n  Returns: { tasks: [{ effectId, taskId, kind, status, label, ... 
}] }\n        |\n        v\n  For each pending task:\n        |\n        +--[kind = \"node\"]----------> Execute Node.js script\n        |\n        +--[kind = \"breakpoint\"]----> Present to user for approval\n        |\n        +--[kind = \"sleep\"]---------> Wait until specified time\n        |\n        +--[kind = \"orchestrator_  -> Delegate to a sub-agent or\n        |    task\"]                   orchestrator within your harness\n        |\n        +--[kind = \"agent\"]---------> Delegate to an agent subprocess\n        |\n        +--[custom kind]------------> Handle per your harness capabilities\n        |\n        v\n  Post result via task:post (Section 2f)\n```\n\n#### Effect Result Type\n\nAll effect handlers must return a result conforming to this structure (or the\nsentinel `DEFERRED` for effects that will be resolved later):\n\n```\nEffectResult = {\n    status: \"ok\" | \"error\",\n    value: object          // Payload specific to the effect kind\n}\n\n// For node tasks:\n//   { status: \"ok\", value: <return value of the Node.js function> }\n//   { status: \"error\", value: { message: string, stack?: string } }\n\n// For breakpoints:\n//   { status: \"ok\", value: { approved: boolean, approvedBy?: string, reason?: string } }\n\n// For sleep:\n//   { status: \"ok\", value: { wokeAt: string (ISO 8601), reason: string } }\n\n// For orchestrator_task:\n//   { status: \"ok\", value: { output: any, completedAt: string } }\n//   { status: \"error\", value: { message: string, phase?: string } }\n\n// For agent:\n//   { status: \"ok\", value: { response: string, tokensUsed?: number } }\n//   { status: \"error\", value: { message: string, exitCode?: number } }\n```\n\n#### Pseudocode\n\n```\nfunction executeEffects(runId):\n    runDir = \".a5c/runs/{runId}\"\n\n    // Step 1: Iterate to discover pending effects\n    iterResult = shell(\"babysitter run:iterate {runDir} --json\")\n    if iterResult.exitCode != 0:\n        handleCLIError(\"run:iterate\", 
iterResult)\n        return\n\n    iterData = parseJson(iterResult.stdout)\n\n    if iterData.status == \"completed\":\n        // Run is done -- extract proof and output it\n        proof = iterData.completionProof\n        agentOutput(\"<promise>{proof}</promise>\")\n        return\n\n    if iterData.status == \"failed\":\n        // Inspect error, attempt recovery\n        return\n\n    // Step 2: List pending tasks\n    listResult = shell(\"babysitter task:list {runDir} --pending --json\")\n    if listResult.exitCode != 0:\n        handleCLIError(\"task:list\", listResult)\n        return\n\n    tasks = parseJson(listResult.stdout).tasks\n\n    // Step 3: Execute each task\n    for task in tasks:\n        taskDir = \"{runDir}/tasks/{task.effectId}\"\n        taskDef = readJson(\"{taskDir}/task.json\")\n\n        switch task.kind:\n            case \"node\":\n                result = executeNodeTask(taskDef)\n            case \"breakpoint\":\n                result = handleBreakpoint(taskDef)\n            case \"sleep\":\n                result = handleSleep(taskDef)\n            case \"orchestrator_task\":\n                result = handleOrchestratorTask(taskDef)\n            case \"agent\":\n                result = handleAgentTask(taskDef)\n            default:\n                result = handleCustomKind(task.kind, taskDef)\n\n        // Step 4: Post result (skip deferred effects like long sleeps)\n        if result != DEFERRED:\n            postResult(runId, task.effectId, result)\n```\n\n#### Breakpoint Effect Handler\n\nBreakpoints are human approval gates. The process pauses until a human explicitly\napproves or rejects the breakpoint. 
**Never auto-approve breakpoints** -- they exist\nspecifically to require human judgment.\n\n```\nfunction handleBreakpoint(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     message?: string,         // Human-readable description of what needs approval\n    //     description?: string,     // Alternative to message (checked as fallback)\n    //     context?: {\n    //       changedFiles?: string[],  // Files modified since last breakpoint\n    //       summary?: string,         // Summary of work done so far\n    //       risks?: string[],         // Identified risks requiring human review\n    //       [key: string]: unknown    // Additional context from the process\n    //     },\n    //     requireExplicitApproval?: boolean,  // If true, never auto-approve (default: true)\n    //     blocking?: boolean          // If true, the run cannot proceed without resolution (default: true)\n    //   }\n\n    message = taskDef.args.message or taskDef.args.description or \"Approval required\"\n    context = taskDef.args.context or {}\n    requireExplicit = taskDef.args.requireExplicitApproval != false  // default true\n\n    // Present to user via your harness's interactive prompt mechanism\n    if harnessSupportsInteractivePrompt():\n        // Build a rich prompt with context if available\n        promptBody = message\n        if context.summary:\n            promptBody += \"\\n\\nSummary: \" + context.summary\n        if context.risks and length(context.risks) > 0:\n            // Prefix the first item too, so every entry renders as a bullet\n            promptBody += \"\\n\\nRisks:\\n- \" + join(context.risks, \"\\n- \")\n        if context.changedFiles and length(context.changedFiles) > 0:\n            promptBody += \"\\n\\nChanged files:\\n- \" + join(context.changedFiles, \"\\n- \")\n\n        userDecision = promptUser(\n            title: \"Babysitter Breakpoint\",\n            message: promptBody,\n            options: [\"approve\", \"reject\"]\n        )\n\n        if userDecision == \"approve\":\n            return { status: 
\"ok\", value: { approved: true, approvedBy: \"user\" } }\n        else:\n            return { status: \"ok\", value: { approved: false, reason: \"User rejected\" } }\n\n    // Non-interactive fallback: reject with explanation\n    // The agent will see this and can inform the user\n    return {\n        status: \"ok\",\n        value: {\n            approved: false,\n            reason: \"Non-interactive environment; breakpoint requires manual approval\"\n        }\n    }\n```\n\n#### Sleep Effect Handler\n\nSleep effects pause execution until a specified time. The harness must decide whether\nto block (wait inline) or defer (post result later).\n\n```\nfunction handleSleep(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     until?: string,          // ISO 8601 timestamp to sleep until\n    //     sleepUntil?: string,     // Alias for 'until'\n    //     durationMs?: number,     // Duration in milliseconds (alternative to until)\n    //     reason?: string          // Human-readable reason for the sleep\n    //   }\n    //\n    // Exactly one of (until | sleepUntil) or durationMs should be provided.\n    // If both are present, the absolute timestamp (until/sleepUntil) takes precedence.\n\n    sleepUntil = taskDef.args.until or taskDef.args.sleepUntil\n    durationMs = taskDef.args.durationMs\n\n    if sleepUntil:\n        targetTime = parseISO8601(sleepUntil)\n    else if durationMs:\n        targetTime = now() + durationMs\n    else:\n        // No target time specified; resolve immediately\n        return { status: \"ok\", value: { wokeAt: now(), reason: \"no_target_time\" } }\n\n    remainingMs = targetTime - now()\n\n    if remainingMs <= 0:\n        // Sleep time already passed\n        return { status: \"ok\", value: { wokeAt: now(), reason: \"already_elapsed\" } }\n\n    if remainingMs <= 60000:  // 1 minute or less\n        // Short sleep: block inline\n        sleep(remainingMs)\n        return { status: \"ok\", value: { wokeAt: now(), reason: 
\"waited\" } }\n\n    // Long sleep: post a deferred result\n    // Option A: Schedule a timer/cron to post the result later\n    scheduleDelayedPost(runId, effectId, targetTime)\n    // Do NOT post result now -- let the orchestration loop handle it\n    // on the next iteration after the timer fires\n    return DEFERRED  // signal to caller: do not post result yet\n```\n\n#### Orchestrator Task Effect Handler\n\nOrchestrator tasks delegate a sub-process to an orchestrator or sub-agent within\nyour harness. The task definition contains a prompt, optional inputs, and\nconfiguration for the sub-process.\n\n```\nfunction handleOrchestratorTask(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     prompt: string,           // The instruction for the sub-agent\n    //     processId?: string,       // Optional sub-process ID to invoke\n    //     inputs?: object,          // Inputs to pass to the sub-process\n    //     constraints?: {\n    //       maxIterations?: number, // Iteration limit for the sub-process\n    //       timeout?: number        // Timeout in ms for the sub-process\n    //     }\n    //   }\n\n    prompt = taskDef.args.prompt\n    inputs = taskDef.args.inputs or {}\n    constraints = taskDef.args.constraints or {}\n    timeout = constraints.timeout or 900000  // default 15 min\n\n    if harnessSupportsSubAgentDelegation():\n        // Delegate to a sub-agent or internal orchestrator\n        subResult = delegateToSubAgent({\n            prompt: prompt,\n            inputs: inputs,\n            maxIterations: constraints.maxIterations or 50,\n            timeout: timeout\n        })\n\n        if subResult.success:\n            return {\n                status: \"ok\",\n                value: {\n                    output: subResult.output,\n                    completedAt: now()\n                }\n            }\n        else:\n            return {\n                status: \"error\",\n                value: {\n                    message: 
subResult.error,\n                    phase: subResult.failedPhase or \"execution\"\n                }\n            }\n\n    // Fallback: execute as a simple prompt-response if no sub-agent support\n    // This is a degraded mode -- the harness loses multi-step orchestration\n    response = executePromptSingleTurn(prompt, inputs)\n    return {\n        status: \"ok\",\n        value: {\n            output: response,\n            completedAt: now()\n        }\n    }\n```\n\n#### Agent Effect Handler\n\nAgent effects delegate work to a standalone agent subprocess. Unlike\norchestrator_task, the agent effect expects a self-contained agent invocation\n(typically a CLI tool or API call) that runs to completion.\n\n```\nfunction handleAgentTask(taskDef):\n    // taskDef.args schema:\n    //   {\n    //     command: string,          // Agent command or prompt\n    //     workingDir?: string,      // Working directory for the agent\n    //     env?: Record<string, string>,  // Environment variables\n    //     timeout?: number,         // Timeout in ms (default: 900000)\n    //     captureOutput?: boolean   // Whether to capture stdout/stderr (default: true)\n    //   }\n\n    command = taskDef.args.command\n    workingDir = taskDef.args.workingDir or getCwd()\n    env = taskDef.args.env or {}\n    timeout = taskDef.args.timeout or 900000  // default 15 min\n\n    if harnessSupportsAgentSubprocess():\n        // Spawn agent subprocess\n        agentResult = spawnAgent({\n            command: command,\n            workingDir: workingDir,\n            env: env,\n            timeout: timeout\n        })\n\n        if agentResult.exitCode == 0:\n            return {\n                status: \"ok\",\n                value: {\n                    response: agentResult.stdout,\n                    tokensUsed: agentResult.tokensUsed or null\n                }\n            }\n        else:\n            return {\n                status: \"error\",\n                value: {\n         
           message: agentResult.stderr or \"Agent exited with code {agentResult.exitCode}\",\n                    exitCode: agentResult.exitCode\n                }\n            }\n\n    // Fallback: treat as a shell command\n    shellResult = shell(command, { cwd: workingDir, env: env, timeout: timeout })\n    if shellResult.exitCode == 0:\n        return { status: \"ok\", value: { response: shellResult.stdout } }\n    else:\n        return {\n            status: \"error\",\n            value: { message: shellResult.stderr, exitCode: shellResult.exitCode }\n        }\n```\n\n#### Reading Task Definitions\n\nEach pending task has a `task.json` in its effect directory:\n\n```\n.a5c/runs/{runId}/tasks/{effectId}/task.json\n```\n\nThe task definition contains the task kind, arguments, labels, and other metadata\nneeded to execute it. Read it with:\n\n```bash\nbabysitter task:show .a5c/runs/{runId} {effectId} --json\n```\n\n---\n\n### 2f. Result Posting\n\n**Goal:** Record effect execution results back into the run journal.\n\n#### IMPORTANT\n\nAlways post results through the CLI. Never write `result.json` directly. The CLI\ncommand handles:\n1. Writing `result.json` with the correct schema version\n2. Appending an `EFFECT_RESOLVED` event to the journal\n3. 
Updating the state cache\n\n#### Pseudocode\n\n```\nfunction postResult(runId, effectId, result):\n    runDir = \".a5c/runs/{runId}\"\n    taskDir = \"{runDir}/tasks/{effectId}\"\n\n    // Write the result value to a temporary file\n    valueFile = \"{taskDir}/output.json\"\n    writeJson(valueFile, result.value)\n\n    // Post through the CLI\n    shell(\n        \"babysitter task:post {runDir} {effectId}\" +\n        \" --status {result.status}\" +   // \"ok\" or \"error\"\n        \" --value {valueFile}\" +\n        \" --json\"\n    )\n```\n\n#### CLI Command\n\n```bash\n# Success case\nbabysitter task:post .a5c/runs/{runId} {effectId} \\\n  --status ok \\\n  --value tasks/{effectId}/output.json \\\n  --json\n\n# Error case\nbabysitter task:post .a5c/runs/{runId} {effectId} \\\n  --status error \\\n  --value tasks/{effectId}/error.json \\\n  --json\n```\n\n#### Result Status Values\n\n| Status | Meaning |\n|--------|---------|\n| `ok` | Task completed successfully; value contains the result |\n| `error` | Task failed; value contains error details |\n\n---\n\n### 2g. Iteration Guards\n\n**Goal:** Prevent infinite loops and detect runaway behavior.\n\n#### CLI Command\n\n```bash\nbabysitter session:check-iteration \\\n  --session-id {sessionId} \\\n  --state-dir {stateDir} \\\n  --json\n```\n\n#### Output\n\nSee [Section 6: CLI Output Schemas](#session-check-iteration-output) for the full\nschema. 
Summary:\n\n- `shouldContinue: true` -- safe to proceed; `nextIteration` gives the next iteration number\n- `shouldContinue: false` -- stop the loop; `reason` explains why (e.g.,\n  `max_iterations_reached`, `session_not_found`)\n\n#### Guard Logic\n\n```\nfunction checkIterationGuards(sessionId, stateDir):\n    result = shell(\n        \"babysitter session:check-iteration\" +\n        \" --session-id {sessionId}\" +\n        \" --state-dir {stateDir}\" +\n        \" --json\"\n    )\n    data = parseJson(result.stdout)\n\n    if not data.found:\n        return { shouldContinue: false, reason: \"session_not_found\" }\n\n    if not data.shouldContinue:\n        return { shouldContinue: false, reason: data.reason }\n\n    return { shouldContinue: true, nextIteration: data.nextIteration }\n```\n\n#### Two Detection Mechanisms\n\n**1. Max Iterations Guard**\n\n```\nIF maxIterations > 0 AND iteration >= maxIterations (default 65000; 0 = unlimited):\n    STOP -- allow exit, clean up state file\n```\n\n**2. Runaway Speed Guard**\n\n```\nIF iteration >= 5:\n    avgDuration = average(last 3 iteration durations)\n    IF avgDuration <= 15 seconds:\n        STOP -- iterations are too fast, likely a runaway loop\n```\n\nThe iteration duration is measured as the wall-clock time between consecutive\nstop-hook invocations. An average of 15 seconds or less (after at least 5\niterations) indicates the agent is not doing meaningful work.\n\n**Threshold justifications:**\n\n- **Why iteration >= 5:** The first few iterations are often fast because the\n  agent is reading instructions, creating the run, and performing lightweight\n  setup. A minimum of 5 iterations avoids false positives during this bootstrap\n  phase while still catching runaways before significant resource waste. 
Empirical\n  testing across Claude Code sessions showed that legitimate fast iterations\n  (setup, binding, first iterate) consistently finish within the first 3-4 cycles.\n\n- **Why average <= 15 seconds:** A meaningful agent iteration -- one that reads\n  files, calls an LLM, writes code, or executes tests -- typically takes 30-120\n  seconds. The 15-second threshold is half the 30-second lower bound of that\n  range, a 2x safety margin. Iterations under 15 seconds typically\n  indicate the agent is stuck in a loop where it reads the iteration message,\n  does no substantive work, and immediately signals completion. The 3-iteration\n  rolling average (rather than a single iteration) smooths out one-off fast\n  iterations caused by cached replay or quick task:post calls.\n\n- **Tuning:** Both thresholds can be adjusted for specific harness environments.\n  If your agent performs very lightweight iterations (e.g., posting pre-computed\n  results), lower the speed threshold. If your setup phase is longer, raise the\n  minimum iteration count. The `session:check-iteration` CLI command applies\n  these same thresholds internally.\n\n---\n\n## 3. 
Harness Capability Matrix\n\n### Required vs Optional Capabilities\n\n| Capability | Required | Purpose |\n|---|---|---|\n| Shell command execution | YES | All CLI interactions |\n| Exit/stop interception | YES | Core loop driver |\n| Context re-injection | YES | Continue agent after BLOCK |\n| Session identity | YES | State file naming, run binding |\n| File system read/write | YES | State files, task artifacts |\n| Transcript access | NO * | Completion proof via `<promise>` tag |\n| Lifecycle hooks | NO | Simplifies wiring; can be emulated |\n| Persistent environment | NO | Convenience; can pass via files |\n| Interactive user prompts | NO | Breakpoint handling (non-interactive mode is fallback) |\n| Sub-agent delegation | NO | orchestrator_task / agent effects |\n\n\\* If transcript access is unavailable, an alternative completion signaling mechanism\nmust be implemented.\n\n### Integration Tiers\n\n#### Tier 1: Minimum Viable Integration\n\nSupports basic orchestration with node tasks and completion detection.\n\n- [ ] SDK installation\n- [ ] Session initialization\n- [ ] Run creation and binding\n- [ ] Exit interception with BLOCK/APPROVE\n- [ ] `run:iterate` calls\n- [ ] `task:list --pending` to discover effects\n- [ ] Node task execution\n- [ ] `task:post` to record results\n- [ ] Completion proof detection (via transcript or alternative)\n- [ ] Max iteration guard\n\n#### Tier 2: Robust Integration\n\nAdds safety guards and breakpoint support.\n\n- [ ] Everything in Tier 1\n- [ ] Runaway loop detection (iteration speed guard)\n- [ ] `session:check-iteration` calls\n- [ ] Interactive breakpoint handling\n- [ ] Sleep effect handling\n- [ ] Journal event recording for debugging\n\n#### Tier 3: Full Integration\n\nComplete feature parity with the Claude Code reference implementation.\n\n- [ ] Everything in Tier 2\n- [ ] Native lifecycle hooks (on-run-start, on-task-complete, etc.)\n- [ ] Hook discovery (per-repo, per-user, plugin directories)\n- [ ] 
Orchestrator task delegation\n- [ ] Agent task delegation\n- [ ] Quality scoring via on-score hooks\n- [ ] Skill discovery and injection\n- [ ] Non-interactive breakpoint auto-resolution\n\n---\n\n## 4. Session State Contract\n\n### File Format\n\nSession state files use Markdown with YAML frontmatter. The frontmatter stores\nmachine-readable state. The Markdown body stores the user's original prompt.\n\n**Path convention:** `{stateDir}/{sessionId}.md`\n\n#### Example\n\n```markdown\n---\nactive: true\niteration: 3\nmax_iterations: 65000\nrun_id: \"my-run-abc123\"\nstarted_at: \"2026-03-02T10:00:00Z\"\nlast_iteration_at: \"2026-03-02T10:05:30Z\"\niteration_times: 45,62,58\n---\n\nBuild a REST API with authentication and rate limiting for the user service.\n```\n\n### Required Fields\n\n| Field | Type | Default | Description |\n|-------|------|---------|-------------|\n| `active` | boolean | `true` | Whether the orchestration loop is active |\n| `iteration` | number | `1` | Current iteration (1-based) |\n| `max_iterations` | number | `65000` | Maximum iterations (0 = unlimited) |\n| `run_id` | string | `\"\"` | Bound run ID (empty before run:create) |\n| `started_at` | string (ISO 8601) | now | Session start timestamp |\n| `last_iteration_at` | string (ISO 8601) | now | Last iteration timestamp |\n| `iteration_times` | string (CSV) | (empty) | Last 3 iteration durations in seconds |\n\n### State Transitions\n\n```\n    CREATE                BIND               ITERATE (x N)        COMPLETE\n  (session:init)    (session:associate)    (stop hook BLOCK)    (stop hook APPROVE)\n       |                   |                     |                    |\n       v                   v                     v                    v\n  +-----------+     +-----------+          +-----------+       +------------+\n  |  BASELINE |     |   BOUND   |          |  ACTIVE   |       |  CLEANED   |\n  |           |---->|           |--------->|           |------>|   UP       |\n  | runId=\"\"  | 
    | runId=X   |          | iter=N+1  |       | file       |\n  | iter=1    |     | iter=1    |          | times=[.] |       | inactive   |\n  +-----------+     +-----------+          +-----------+       +------------+\n```\n\n### Atomic Write Protocol\n\nSession state files must be written atomically to prevent corruption from\nconcurrent reads during stop-hook evaluation:\n\n```\n1. Write content to temp file: {filePath}.tmp.{pid}\n2. Atomic rename: rename(tempFile, targetFile)\n3. On error: delete temp file\n```\n\n### Timing Calculation\n\n```\nfunction updateIterationTimes(existingTimes, lastIterationAt, currentTime):\n    durationSeconds = (currentTime - lastIterationAt) / 1000\n    if durationSeconds <= 0:\n        return existingTimes\n    newTimes = append(existingTimes, durationSeconds)\n    return lastN(newTimes, 3)   // keep only last 3\n```\n\n---\n\n## 5. Hook Equivalence Table\n\nThe babysitter SDK and harness integration involve two categories of hooks:\n\n### SDK Hooks (13 `KnownHookType` values)\n\nThese are dispatched by the SDK runtime during orchestration. 
They are defined in\n`packages/sdk/src/hooks/types.ts` and fired via `callHook(hookType, payload)`.\n\n| SDK Hook | Tier | Description |\n|---|---|---|\n| `on-run-start` | 3 | Fires after `run:create` completes |\n| `on-run-complete` | 3 | Fires when `run:iterate` returns status=completed |\n| `on-run-fail` | 3 | Fires when `run:iterate` returns status=failed |\n| `on-task-start` | 3 | Fires before executing each pending effect |\n| `on-task-complete` | 3 | Fires after `task:post` completes |\n| `on-step-dispatch` | 3 | Fires when `run:iterate` discovers a new effect |\n| `on-iteration-start` | 2 | Fires before calling `run:iterate` |\n| `on-iteration-end` | 2 | Fires after all effects for an iteration are posted |\n| `on-breakpoint` | 2 | Fires when a breakpoint effect is pending; present to user for approval |\n| `on-score` | 3 | Fires when a quality score is posted to the run |\n| `pre-commit` | 3 | Fires before the agent creates a git commit |\n| `pre-branch` | 3 | Fires before the agent creates a new git branch |\n| `post-planning` | 3 | Fires after the planning phase produces output |\n\n### Harness-Level Concepts (not SDK KnownHookType values)\n\nThese are integration points that your harness must implement. They are NOT SDK hook\ntypes -- they are harness-specific lifecycle events that drive the orchestration loop.\n\n| Harness Concept | Tier | Generic Equivalent |\n|---|---|---|\n| **session-start** | 1 | Session/conversation start callback. Call `session:init` to create the baseline state file. This maps to your harness's \"on conversation begin\" event. |\n| **stop** (exit interceptor) | 1 | Exit/turn-end interceptor. Run the decision algorithm (Section 2d) to BLOCK or APPROVE the agent's exit attempt. This is the core loop driver. |\n| **session-end** | 1 | Session cleanup. Delete the session state file when the conversation ends normally. 
|\n\n### Hook Discovery Directories\n\nIf implementing Tier 3, hook scripts are searched in this priority order:\n\n```\n1. Per-repo:   {REPO_ROOT}/.a5c/hooks/{hookType}/*.sh     (highest)\n2. Per-user:   ~/.config/babysitter/hooks/{hookType}/*.sh  (medium)\n3. Plugin:     {PLUGIN_ROOT}/hooks/{hookType}/*.sh         (lowest)\n```\n\nScripts within each directory are sorted alphabetically and executed sequentially.\n\n---\n\n## 6. CLI Output Schemas\n\nThis section documents the JSON output schemas for the most frequently used CLI\ncommands. All examples assume `--json` is passed.\n\n### `run:status` Output\n\n```json\n{\n  \"state\": \"waiting\",\n  \"lastEvent\": {\n    \"type\": \"EFFECT_REQUESTED\",\n    \"recordedAt\": \"2026-03-02T10:05:00Z\",\n    \"data\": { \"...\" : \"...\" }\n  },\n  \"pendingByKind\": {\n    \"node\": 2,\n    \"breakpoint\": 1\n  },\n  \"pendingEffectsSummary\": {\n    \"totalPending\": 3,\n    \"countsByKind\": { \"node\": 2, \"breakpoint\": 1 },\n    \"autoRunnableCount\": 2\n  },\n  \"needsMoreIterations\": true,\n  \"metadata\": null,\n  \"completionProof\": null\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `state` | `\"created\" \\| \"waiting\" \\| \"completed\" \\| \"failed\"` | Derived run lifecycle state |\n| `lastEvent` | object or null | The most recent journal event (serialized) |\n| `pendingByKind` | `Record<string, number>` | Count of pending effects grouped by kind |\n| `pendingEffectsSummary.totalPending` | number | Total pending effects |\n| `pendingEffectsSummary.autoRunnableCount` | number | Effects that can be auto-executed (kind=node) |\n| `needsMoreIterations` | boolean | True if state=waiting and autoRunnableCount > 0 |\n| `completionProof` | string or null | SHA-256 proof hash (only when state=completed) |\n\n### `session:check-iteration` Output\n\nThe output always includes `found`, `shouldContinue`, `iteration`, `maxIterations`,\n`runId`, and `prompt`. 
When `shouldContinue` is false, `reason` and `stopMessage`\nexplain why.\n\n```json\n// shouldContinue: true\n{ \"found\": true, \"shouldContinue\": true, \"nextIteration\": 4,\n  \"updatedIterationTimes\": [45, 62, 58], \"iteration\": 3,\n  \"maxIterations\": 65000, \"runId\": \"my-run-abc123\", \"prompt\": \"Build the API...\" }\n\n// shouldContinue: false -- possible reason values:\n//   \"max_iterations_reached\" (+ stopMessage)\n//   \"session_not_found\"      (found=false, all counters zero)\n```\n\n| `reason` value | Trigger condition | Extra fields |\n|---|---|---|\n| `max_iterations_reached` | iteration >= maxIterations | -- |\n| `session_not_found` | State file does not exist | `found: false` |\n\n### `task:list` Output\n\n```json\n{\n  \"tasks\": [\n    {\n      \"effectId\": \"E001-abc\",\n      \"taskId\": \"greet\",\n      \"stepId\": \"S000001\",\n      \"status\": \"pending\",\n      \"kind\": \"node\",\n      \"label\": \"Greet user\",\n      \"labels\": [\"greeting\"],\n      \"taskDefRef\": \"tasks/E001-abc/task.json\",\n      \"inputsRef\": null,\n      \"resultRef\": null,\n      \"stdoutRef\": null,\n      \"stderrRef\": null,\n      \"requestedAt\": \"2026-03-02T10:01:00Z\",\n      \"resolvedAt\": null\n    }\n  ]\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `effectId` | string | Unique effect identifier |\n| `taskId` | string | Task type identifier (from `defineTask`) |\n| `stepId` | string | Sequential step ID (e.g., `S000001`) |\n| `status` | `\"pending\" \\| \"resolved\" \\| \"unknown\"` | Current effect status |\n| `kind` | string | Task kind: `node`, `breakpoint`, `sleep`, `orchestrator_task`, or custom |\n| `label` | string or null | Human-readable label |\n| `taskDefRef` | string or null | Relative path to task.json |\n| `resultRef` | string or null | Relative path to result.json (null if pending) |\n\n### `session:iteration-message` Output\n\n**Command signature:**\n\n```bash\nbabysitter session:iteration-message \\\n  
--iteration <n> \\\n  [--run-id <id>] \\\n  [--runs-dir <dir>] \\\n  [--plugin-root <dir>] \\\n  --json\n```\n\nNote: This command does NOT accept `--session-id` or `--state-dir`. It operates on\nrun data directly via `--run-id` and `--runs-dir`.\n\n```json\n{\n  \"systemMessage\": \"Babysitter iteration 3 | Waiting on: node. Check if pending effects are resolved, then call run:iterate.\",\n  \"runState\": \"waiting\",\n  \"completionProof\": null,\n  \"pendingKinds\": \"node\",\n  \"skillContext\": null,\n  \"iteration\": 3\n}\n```\n\n| Field | Type | Description |\n|---|---|---|\n| `systemMessage` | string | The formatted message to re-inject into the agent's context |\n| `runState` | `\"created\" \\| \"waiting\" \\| \"completed\" \\| \"failed\"` or null | Derived run state |\n| `completionProof` | string or null | Proof hash if run is completed |\n| `pendingKinds` | string or null | Comma-separated list of pending effect kinds |\n| `skillContext` | string or null | Discovered skill context (when `--plugin-root` is provided) |\n| `iteration` | number | The iteration number passed in |\n\n---\n\n## 7. CLI Error Handling\n\nAll CLI commands can fail. The harness must handle these failures gracefully\nrather than crashing or silently ignoring them. This section provides a unified\nerror handling strategy.\n\n### Error Categories\n\n| Category | Symptom | Recovery Strategy |\n|----------|---------|-------------------|\n| **Timeout** | CLI command exceeds expected duration | Kill the process, log the timeout, retry once with a longer timeout. If the retry also times out, APPROVE exit and log a diagnostic warning. |\n| **JSON parse error** | stdout is empty or contains non-JSON text (e.g., stack traces, warnings) | Check stderr for error details. Strip any non-JSON prefix from stdout (some environments prepend warnings). If still unparseable, treat as a command failure. 
|\n| **Lock conflict** | `run:iterate` or `task:post` fails because another process holds `run.lock` | Retry after 250ms, up to 40 retries (matching the SDK's internal retry behavior). If all retries fail, log the conflict and APPROVE exit. |\n| **Missing run directory** | `run:status` or `run:iterate` returns non-zero with ENOENT-style error | The run was deleted or never created. Mark the session inactive and approve/fail loudly according to harness policy. |\n| **Permission error** | EACCES or EPERM on file operations | Check file ownership and permissions. This usually indicates a misconfigured `BABYSITTER_RUNS_DIR`. |\n| **Non-zero exit, valid JSON** | CLI returns exit code != 0 but stdout contains valid JSON with an `error` field | Parse the JSON error object for structured diagnostics. The `error.code` field often contains a machine-readable error type. |\n\n### Unified Error Handler\n\n```\nfunction handleCLIError(commandName, shellResult):\n    // Step 1: Try to parse structured error from stdout\n    if shellResult.stdout != \"\":\n        try:\n            parsed = parseJson(shellResult.stdout)\n            if parsed.error:\n                log(\"CLI error in {commandName}: {parsed.error.message} (code: {parsed.error.code})\")\n                return { category: \"structured\", error: parsed.error }\n        catch parseError:\n            // stdout is not valid JSON -- fall through\n            pass\n\n    // Step 2: Check for known error patterns in stderr\n    stderr = shellResult.stderr or \"\"\n\n    if contains(stderr, \"ENOENT\") or contains(stderr, \"no such file\"):\n        return { category: \"missing_path\", message: stderr }\n\n    if contains(stderr, \"run.lock\") or contains(stderr, \"EBUSY\"):\n        return { category: \"lock_conflict\", message: stderr, retryable: true }\n\n    if contains(stderr, \"EACCES\") or contains(stderr, \"EPERM\"):\n        return { category: \"permission\", message: stderr }\n\n    if shellResult.timedOut:\n   
     return { category: \"timeout\", message: \"Command {commandName} timed out after {shellResult.timeoutMs}ms\" }\n\n    // Step 3: Generic failure\n    return {\n        category: \"unknown\",\n        exitCode: shellResult.exitCode,\n        message: stderr or \"Command {commandName} failed with exit code {shellResult.exitCode}\"\n    }\n```\n\n### Recommended Timeouts by Command\n\n| Command | Recommended Timeout | Notes |\n|---------|-------------------|-------|\n| `version --json` | 5s | Should be near-instant |\n| `session:init` | 5s | File creation only |\n| `session:associate` | 5s | File update only |\n| `run:create` | 10s | Creates directory structure and journal |\n| `run:assign-process` | 10s | Updates run.json and appends journal event under lock |\n| `run:iterate` | 120s | May execute process function; uses `BABYSITTER_TIMEOUT` env var |\n| `run:status` | 10s | Reads journal and derives state |\n| `task:list` | 10s | Reads task directories |\n| `task:post` | 15s | Writes result + appends journal event |\n| `session:check-iteration` | 5s | Reads and parses state file |\n| `session:iteration-message` | 10s | Reads run state, discovers skills |\n| `run:repair-journal` | 30s | Scans and repairs journal files |\n\n---\n\n## 8. Edge Cases\n\n> Note: For CLI command failures (timeouts, parse errors, lock conflicts), see\n> [Section 7: CLI Error Handling](#7-cli-error-handling).\n\n### Stale Session State File\n\nIf the harness crashes or the agent is forcefully terminated, a session state file\nmay be left behind. On the next session start, `session:init` will fail with\n`SESSION_EXISTS`. 
Handling:\n\n```\nfunction handleStaleSession(sessionId, stateDir):\n    stateFile = \"{stateDir}/{sessionId}.md\"\n    existing = parseSessionState(stateFile)\n\n    // If the run is completed or failed, clean up and re-init\n    if existing.runId != \"\":\n        statusResult = shell(\"babysitter run:status .a5c/runs/{existing.runId} --json\")\n        if statusResult.exitCode == 0:\n            runStatus = parseJson(statusResult.stdout)\n            if runStatus.state in [\"completed\", \"failed\"]:\n                deleteFile(stateFile)\n                return shell(\"babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json\")\n\n    // Otherwise, offer to resume\n    return { action: \"resume_or_cleanup\", existingRunId: existing.runId }\n```\n\n### Run Directory Missing or Corrupted\n\nIf the run directory is deleted or journal files are corrupted, `run:status` and\n`run:iterate` will return non-zero exit codes. The stop interceptor should APPROVE\nexit in this case (Guard 5 in the decision algorithm).\n\n### Concurrent Sessions on Same Run\n\nThe SDK uses file-based run locking (`run.lock` with PID). If two sessions try to\niterate the same run concurrently, one will fail to acquire the lock. The harness\nshould retry after a short delay (250ms, up to 40 retries) or report the conflict.\n\n### Effect Posted but Journal Not Updated\n\nIf the harness crashes between writing `result.json` and the CLI appending the\n`EFFECT_RESOLVED` journal event, the run may appear stuck. Use `run:repair-journal`\nto detect and fix such inconsistencies:\n\n```bash\nbabysitter run:repair-journal .a5c/runs/{runId} --json\n```\n\n### Zero Pending Tasks After Iterate\n\nIf `run:iterate` returns status=waiting but `task:list --pending` returns zero tasks,\nthis indicates all effects were resolved during the iterate call itself (e.g., via\nreplay). Simply call `run:iterate` again on the next iteration.\n\n---\n\n## 9. 
Testing the Integration\n\n### Smoke Test Checklist\n\nRun these tests in order. Each builds on the previous.\n\n#### Test 1: CLI Availability\n\n```bash\nbabysitter version --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"version\"` field\n\n#### Test 2: Session Initialization\n\n```bash\nbabysitter session:init \\\n  --session-id test-session-001 \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] State file created at `/tmp/babysitter-test/state/test-session-001.md`\n- [ ] File contains YAML frontmatter with `active: true`, `iteration: 1`, `run_id: \"\"`\n\n#### Test 3: Run Creation\n\n```bash\n# Create a minimal process file\ncat > /tmp/babysitter-test/process.js << 'EOF'\nexports.process = async function(inputs, ctx) {\n  const result = await ctx.task('greet', { name: inputs.name });\n  return { greeting: result };\n};\nEOF\n\n# Create inputs file\necho '{\"name\": \"World\"}' > /tmp/babysitter-test/inputs.json\n\n# Create the run\nbabysitter run:create \\\n  --process-id test-process \\\n  --entry /tmp/babysitter-test/process.js#process \\\n  --inputs /tmp/babysitter-test/inputs.json \\\n  --prompt \"Test run\" \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"runId\"` field\n- [ ] Directory `.a5c/runs/{runId}/` exists with `run.json` and `journal/`\n\n#### Test 4: Session Binding\n\n```bash\nbabysitter session:associate \\\n  --session-id test-session-001 \\\n  --run-id {runId} \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] State file now has `run_id: \"{runId}\"`\n\n#### Test 5: Run Iterate\n\n```bash\nbabysitter run:iterate .a5c/runs/{runId} --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"status\"` field\n\n#### Test 6: Task List\n\n```bash\nbabysitter task:list .a5c/runs/{runId} --pending --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"tasks\"` array\n\n#### Test 7: Iteration Guard\n\n```bash\nbabysitter 
session:check-iteration \\\n  --session-id test-session-001 \\\n  --state-dir /tmp/babysitter-test/state \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"shouldContinue\": true`\n\n#### Test 8: Iteration Message\n\n```bash\nbabysitter session:iteration-message \\\n  --iteration 2 \\\n  --run-id {runId} \\\n  --runs-dir .a5c/runs \\\n  --json\n```\n\n- [ ] Exit code is 0\n- [ ] Output contains `\"systemMessage\"` field\n- [ ] Output contains `\"iteration\": 2`\n\n### Common Failure Modes\n\n| Symptom | Likely Cause | Fix |\n|---------|-------------|-----|\n| `babysitter: command not found` | SDK not installed or not on PATH | Re-run installation (Section 2a) |\n| Stop hook always APPROVEs | No session state file, or `run_id` is empty | Check session:init and session:associate ran |\n| Infinite loop (never exits) | Completion proof not detected | Check transcript scanning for `<promise>` tags |\n| Exits after 1 iteration | Stop interceptor not wired correctly | Verify BLOCK decision re-injects context |\n| `Session already associated` | Re-entrant run on same session | Clean up old state file or complete old run |\n| Iterations very fast, exits early | Runaway detection triggers (avg <= 15s) | Agent is not doing meaningful work per iteration; check effect execution |\n| State file corrupt | Non-atomic write or concurrent access | Use atomic write protocol (temp + rename) |\n| `task:post` fails | Writing result.json directly instead of via CLI | Always use `babysitter task:post` command |\n| Run stuck in \"waiting\" | Effects executed but results not posted | Check task:post calls after each effect |\n\n### End-to-End Integration Test\n\nThe following pseudocode validates the complete loop:\n\n```\nfunction testEndToEnd():\n    sessionId = \"e2e-test-\" + randomId()\n    pluginRoot = \"/tmp/babysitter-e2e\"\n    stateDir = \"{pluginRoot}/skills/babysit/state\"\n    runsDir = \".a5c/runs\"\n\n    // 1. 
Init\n    ensureBabysitterCLI()\n    shell(\"babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json\")\n\n    // 2. Create and bind\n    result = shell(\"babysitter run:create --process-id test --entry ./process.js#process --inputs inputs.json --json\")\n    runId = parseJson(result.stdout).runId\n    shell(\"babysitter session:associate --session-id {sessionId} --run-id {runId} --state-dir {stateDir} --json\")\n\n    // 3. Iterate\n    iterResult = shell(\"babysitter run:iterate .a5c/runs/{runId} --json\")\n    assert parseJson(iterResult.stdout).status in [\"executed\", \"waiting\", \"completed\"]\n\n    // 4. Execute effects\n    listResult = shell(\"babysitter task:list .a5c/runs/{runId} --pending --json\")\n    tasks = parseJson(listResult.stdout).tasks\n    for task in tasks:\n        // Execute task, write output\n        shell(\"babysitter task:post .a5c/runs/{runId} {task.effectId} --status ok --value output.json --json\")\n\n    // 5. Check iteration guard\n    guardResult = shell(\"babysitter session:check-iteration --session-id {sessionId} --state-dir {stateDir} --json\")\n    assert parseJson(guardResult.stdout).shouldContinue == true\n\n    // 6. Get iteration message (correct params: --iteration, --run-id, --runs-dir)\n    msgResult = shell(\"babysitter session:iteration-message --iteration 2 --run-id {runId} --runs-dir {runsDir} --json\")\n    assert parseJson(msgResult.stdout).systemMessage is not null\n\n    // 7. Re-iterate until completed\n    iterResult = shell(\"babysitter run:iterate .a5c/runs/{runId} --json\")\n    if parseJson(iterResult.stdout).status == \"completed\":\n        proof = parseJson(iterResult.stdout).completionProof\n        assert proof is not null and proof is not \"\"\n\n    print(\"END-TO-END TEST PASSED\")\n```\n\n---\n\n## 10. 
Reference Implementation\n\nThe canonical reference implementation is the Claude Code harness adapter, documented at:\n\n**[docs/assimilation/harness/claude-code-integration.md](./claude-code-integration.md)**\n\nKey files in the reference implementation:\n\n| File | Role |\n|------|------|\n| `packages/sdk/src/harness/types.ts` | `HarnessAdapter` interface definition |\n| `packages/sdk/src/harness/claudeCode.ts` | Claude Code adapter (stop hook, session-start, binding) |\n| `packages/sdk/src/harness/nullAdapter.ts` | No-op fallback adapter (useful as a starting template) |\n| `packages/sdk/src/harness/registry.ts` | Adapter auto-detection and lookup |\n| `packages/sdk/src/session/` | Session state parsing, writing, and types |\n| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-stop.sh` | Generated Claude Code stop hook entry |\n| `artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-session-start.sh` | Generated Claude Code session-start hook entry |\n\n### Writing a New Harness Adapter\n\nTo add first-class SDK support for your harness, implement the `HarnessAdapter`\ninterface:\n\n```typescript\ninterface HarnessAdapter {\n  readonly name: string;\n  isActive(): boolean;\n  resolveSessionId(parsed: { sessionId?: string }): string | undefined;\n  resolveStateDir(args: { stateDir?: string; pluginRoot?: string }): string | undefined;\n  resolvePluginRoot(args: { pluginRoot?: string }): string | undefined;\n  bindSession(opts: SessionBindOptions): Promise<SessionBindResult>;\n  handleStopHook(args: HookHandlerArgs): Promise<number>;\n  handleSessionStartHook(args: HookHandlerArgs): Promise<number>;\n  findHookDispatcherPath(startCwd: string): string | null;\n}\n```\n\nRegister your adapter in `packages/sdk/src/harness/registry.ts` and it will be\nauto-detected when its `isActive()` method returns `true`.\n\nFor harnesses that cannot modify the SDK source, the entire integration can be\nbuilt externally by calling the babysitter CLI 
commands documented in this guide.\n\n### Full-Code Example: Minimal Node.js Harness\n\nThe following is a complete, runnable Node.js implementation (not pseudocode) of a\nminimal harness that drives a single babysitter run to completion. It covers session\ninitialization, run creation, the orchestration loop, effect execution for node tasks,\nand completion proof extraction.\n\n```javascript\n#!/usr/bin/env node\n// minimal-harness.js -- A complete minimal babysitter harness implementation.\n// Usage: node minimal-harness.js <process-file>#<export> <inputs.json> [prompt]\n\nconst { execSync } = require('child_process');\nconst { readFileSync, writeFileSync, mkdirSync, existsSync } = require('fs');\nconst { join } = require('path');\nconst crypto = require('crypto');\n\n// --- Configuration ---\nconst RUNS_DIR = '.a5c/runs';\nconst STATE_DIR = join(process.cwd(), '.harness-state');\nconst MAX_ITERATIONS = 65000;\nconst RUNAWAY_THRESHOLD_ITERATIONS = 5;\nconst RUNAWAY_THRESHOLD_SECONDS = 15;\nconst CLI_TIMEOUT_MS = 120_000;\n\n// --- Helpers ---\nfunction cli(command, timeoutMs = CLI_TIMEOUT_MS) {\n  try {\n    const stdout = execSync(`babysitter ${command}`, {\n      encoding: 'utf8',\n      timeout: timeoutMs,\n      stdio: ['pipe', 'pipe', 'pipe'],\n    });\n    return { exitCode: 0, stdout, stderr: '' };\n  } catch (err) {\n    return {\n      exitCode: err.status ?? 1,\n      stdout: err.stdout ?? '',\n      stderr: err.stderr ?? 
'',\n      timedOut: err.killed === true,\n    };\n  }\n}\n\nfunction cliJson(command, timeoutMs) {\n  const result = cli(`${command} --json`, timeoutMs);\n  if (result.exitCode !== 0) {\n    console.error(`CLI error (${command}): ${result.stderr}`);\n    return null;\n  }\n  try {\n    return JSON.parse(result.stdout);\n  } catch {\n    console.error(`JSON parse error for ${command}: ${result.stdout.slice(0, 200)}`);\n    return null;\n  }\n}\n\n// --- Main ---\nasync function main() {\n  const [,, entryPoint, inputsFile, prompt = 'Run process'] = process.argv;\n  if (!entryPoint || !inputsFile) {\n    console.error('Usage: node minimal-harness.js <entry>#<export> <inputs.json> [prompt]');\n    process.exit(1);\n  }\n\n  const sessionId = `harness-${crypto.randomUUID().slice(0, 8)}`;\n  mkdirSync(STATE_DIR, { recursive: true });\n\n  // Step 1: Verify CLI\n  const version = cliJson('version');\n  if (!version) { console.error('babysitter CLI not available'); process.exit(1); }\n  console.log(`Using babysitter SDK v${version.sdkVersion || version.version}`);\n\n  // Step 2: Session init\n  const initResult = cliJson(\n    `session:init --session-id ${sessionId} --state-dir ${STATE_DIR}`\n  );\n  if (!initResult) { console.error('Session init failed'); process.exit(1); }\n\n  // Step 3: Create run\n  const processId = entryPoint.split('#')[0].replace(/[^a-zA-Z0-9-_]/g, '-');\n  const createResult = cliJson(\n    `run:create --process-id ${processId} --entry ${entryPoint}` +\n    ` --inputs ${inputsFile} --prompt \"${prompt.replace(/\"/g, '\\\\\"')}\"`\n  );\n  if (!createResult) { console.error('Run creation failed'); process.exit(1); }\n  const { runId } = createResult;\n  const runDir = join(RUNS_DIR, runId);\n  console.log(`Created run: ${runId}`);\n\n  // Step 4: Bind session\n  cliJson(\n    `session:associate --session-id ${sessionId} --run-id ${runId} --state-dir ${STATE_DIR}`\n  );\n\n  // Step 5: Orchestration loop\n  const iterationTimes = [];\n  let 
iteration = 0;\n\n  while (iteration < MAX_ITERATIONS) {\n    iteration++;\n    const iterStart = Date.now();\n    console.log(`\\n--- Iteration ${iteration} ---`);\n\n    // 5a: Iterate\n    const iterData = cliJson(`run:iterate ${runDir}`, CLI_TIMEOUT_MS);\n    if (!iterData) { console.error('run:iterate failed'); break; }\n\n    if (iterData.status === 'completed') {\n      console.log(`Run completed. Proof: ${iterData.completionProof}`);\n      break;\n    }\n    if (iterData.status === 'failed') {\n      console.error('Run failed:', JSON.stringify(iterData, null, 2));\n      break;\n    }\n\n    // 5b: List and execute pending tasks\n    const listData = cliJson(`task:list ${runDir} --pending`);\n    // Fall through on an empty list instead of skipping ahead, so the 5c runaway check still runs.\n    const pendingTasks = (listData && listData.tasks) || [];\n    if (pendingTasks.length === 0) {\n      console.log('No pending tasks; re-iterating...');\n    }\n\n    for (const task of pendingTasks) {\n      const taskDir = join(runDir, 'tasks', task.effectId);\n      const taskDefPath = join(taskDir, 'task.json');\n      if (!existsSync(taskDefPath)) {\n        console.error(`task.json missing for ${task.effectId}`);\n        continue;\n      }\n      const taskDef = JSON.parse(readFileSync(taskDefPath, 'utf8'));\n      let result;\n\n      switch (task.kind) {\n        case 'node': {\n          // Execute the node task's script\n          try {\n            const mod = require(taskDef.args.scriptPath);\n            const fn = taskDef.args.exportName ? 
mod[taskDef.args.exportName] : mod.default || mod;\n            const output = await fn(taskDef.args.input);\n            result = { status: 'ok', value: output };\n          } catch (err) {\n            result = { status: 'error', value: { message: err.message, stack: err.stack } };\n          }\n          break;\n        }\n        case 'breakpoint':\n          // Minimal harness: auto-reject breakpoints (non-interactive)\n          result = { status: 'ok', value: { approved: false, reason: 'Non-interactive harness' } };\n          break;\n        case 'sleep': {\n          const until = taskDef.args.until || taskDef.args.sleepUntil;\n          const durationMs = taskDef.args.durationMs;\n          const target = until ? new Date(until).getTime() : (Date.now() + (durationMs || 0));\n          const remaining = target - Date.now();\n          if (remaining > 0 && remaining <= 60000) {\n            await new Promise(r => setTimeout(r, remaining));\n          }\n          result = { status: 'ok', value: { wokeAt: new Date().toISOString(), reason: 'waited' } };\n          break;\n        }\n        default:\n          result = { status: 'error', value: { message: `Unsupported task kind: ${task.kind}` } };\n      }\n\n      // Post result\n      const outputPath = join(taskDir, 'output.json');\n      writeFileSync(outputPath, JSON.stringify(result.value));\n      cli(`task:post ${runDir} ${task.effectId} --status ${result.status} --value ${outputPath} --json`);\n    }\n\n    // 5c: Runaway detection\n    const iterDuration = (Date.now() - iterStart) / 1000;\n    iterationTimes.push(iterDuration);\n    if (iteration >= RUNAWAY_THRESHOLD_ITERATIONS) {\n      const recent = iterationTimes.slice(-3);\n      const avg = recent.reduce((a, b) => a + b, 0) / recent.length;\n      if (avg <= RUNAWAY_THRESHOLD_SECONDS) {\n        console.error(`Runaway detected: avg ${avg.toFixed(1)}s <= ${RUNAWAY_THRESHOLD_SECONDS}s threshold`);\n        break;\n      }\n    }\n  }\n\n  
console.log(`\\nHarness finished after ${iteration} iterations.`);\n}\n\nmain().catch(err => { console.error(err); process.exit(1); });\n```\n\n---\n\n## Appendix: Complete CLI Command Reference\n\n| Command | Purpose | Section |\n|---------|---------|---------|\n| `babysitter version --json` | Verify CLI installation | 2a |\n| `babysitter session:init --session-id ID --state-dir DIR --json` | Create baseline session state | 2b |\n| `babysitter run:create --process-id PID --entry FILE --inputs FILE --json` | Create a new run | 2c |\n| `babysitter run:assign-process RUNDIR --entry FILE [--process-id PID] --json` | Assign process to bare run | 2c |\n| `babysitter session:associate --session-id ID --run-id RID --state-dir DIR --json` | Bind session to run | 2c |\n| `babysitter run:iterate RUNDIR --json` | Advance orchestration, discover effects | 2d, 2e |\n| `babysitter run:status RUNDIR --json` | Read run status and completion proof | 2d |\n| `babysitter session:iteration-message --iteration N [--run-id ID] [--runs-dir DIR] [--plugin-root DIR] --json` | Get context to re-inject after BLOCK | 2d |\n| `babysitter task:list RUNDIR --pending --json` | List pending effects | 2e |\n| `babysitter task:show RUNDIR EFFECTID --json` | Read task definition | 2e |\n| `babysitter task:post RUNDIR EFFECTID --status STATUS --value FILE --json` | Post effect result | 2f |\n| `babysitter session:check-iteration --session-id ID --state-dir DIR --json` | Check iteration guards | 2g |\n| `babysitter run:repair-journal RUNDIR --json` | Repair inconsistent journal | 7 |\n| `babysitter hook:run --hook-type TYPE --harness NAME --json` | Dispatch a lifecycle hook | 5 |\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-assimilation-harness",
      "to": "page:docs-assimilation-harness-generic-harness-guide",
      "kind": "contains_page"
    }
  ]
}