Page JSON
Babysitter Cli Surface Spec json
Inspect the normalized record payload exactly as the atlas UI reads it.
{
"id": "page:docs-reference-babysitter-cli-surface-spec",
"_kind": "Page",
"_file": "wiki/docs/reference/babysitter-cli-surface-spec.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/reference/babysitter_cli_surface_spec.md",
"sourceKind": "repo-docs",
"title": "Babysitter Cli Surface Spec",
"displayName": "Babysitter Cli Surface Spec",
"slug": "docs/reference/babysitter-cli-surface-spec",
"articlePath": "wiki/docs/reference/babysitter_cli_surface_spec.md",
"article": "\nBabysitter CLI Surface Spec (cli_tool)\n======================================\n\nScope & Intent\n--------------\n- Define the external CLI (`babysitter`) that ships with `@a5c-ai/babysitter-sdk` and gives humans or automation a thin, deterministic shell around run folders produced by the SDK.\n- Cover the commands already sketched in `sdk.md §12` and implemented in `packages/sdk/src/cli/*`: run lifecycle inspection, deterministic orchestration loops, task introspection/execution, and state-repair utilities.\n- Keep the interface consistent across macOS, Linux, and Windows shells, honoring `cli_tool` domain guardrails (stable flags/defaults, explicit config precedence, and no sensitive payloads echoed to stdout).\n\nBehavior\n--------\n1. **Global invocation**\n - Binary name `babysitter`. Subcommands follow `babysitter <area>:<verb>` (e.g., `run:continue`).\n - Supported top-level flags on every command: `--runs-dir <path>` (advanced override; default root is `~/.a5c/runs`, or `<repo>/.a5c/runs` when `BABYSITTER_RUNS_SCOPE=repo`), `--json`, `--dry-run` (commands that mutate state must honor it), and `--verbose` (when set, log filesystem paths and resolved options to stderr).\n - Exit codes: `0` for success, `1` for expected user errors (bad args, missing run), `>1` for unexpected crashes. `--json` never changes exit semantics.\n - All paths returned to the user are normalized to POSIX separators relative to `<runDir>` even on Windows; CLI accepts either slash style as input.\n\n2. **Run lifecycle management**\n - `run:create` writes `run.json`, optional `inputs.json`, and appends `RUN_CREATED` via the runtime API. Required flags: `--process-id`, `--entry`. Optional `--inputs`, `--run-id`, `--process-revision`, `--request`. When `--entry` is omitted, creates a bare run (`entrypoint.importPath = \"bare-run\"`) that must be assigned a process via `run:assign-process` before iteration.\n - `run:assign-process` attaches a process to an existing bare run. 
Required: `<runDir>` positional, `--entry`. Optional: `--process-id`, `--process-revision`, `--force`, `--dry-run`. Updates `run.json` under the run lock and appends `PROCESS_ASSIGNED` journal event. Rejects if the run already has a process unless `--force`.\n - `run:status` prints `[run:status] state=<created|waiting|completed|failed> last=<TYPE#SEQ ISO> pending[...]` plus one line per pending kind; JSON mirrors `{ state, lastEvent, pendingByKind }`. Works even if journal/state files are missing by treating them as empty.\n - `run:events` streams journal entries with `--limit`, `--reverse`, `--filter-type`, and `--json`. Missing run directory or unreadable event files emit a single error line and exit `1`.\n - `run:rebuild-state` (surface for `rebuildStateCache`) locks the run, replays the journal, writes `state/state.json`, and prints/returns the rebuild reason, event counts, and resulting `stateVersion`.\n\n3. **Orchestration control loops**\n - `run:continue` was removed; callers should loop `run:iterate`, execute pending effects externally, and commit via `task:post`.\n\n4. **Task introspection and execution**\n - `task:list` reads the effect index and prints `- <effectId> [<kind> <status>] <label?> (taskId=<taskId>)`. Flags: `--pending`, `--kind`. JSON payload is `{ tasks: TaskListEntry[] }` where every entry includes refs for task/result/stdout/stderr with POSIX paths.\n - `task:show` pretty-prints `task.json` and `result.json` (or `(not yet written)` if pending) and mirrors the list entry in JSON mode.\n - `task:post` commits externally produced results for any effect kind. It validates the effect is still `requested`, writes `tasks/<effectId>/result.json`, and appends `EFFECT_RESOLVED` via `commitEffectResult`. `--dry-run` previews the mutation without committing. JSON response includes `{ status, committed, stdoutRef, stderrRef, resultRef }`.\n - Manual breakpoint resolution stays manual: `task:list` highlights `kind=\"breakpoint\"`. 
Dedicated `breakpoint:resolve`/`sleep:list` commands are tracked separately and are not required to ship with this part.\n\n5. **Output and UX conventions**\n - Human text is intentionally terse (single-line headers with prefixed command ids) for easy parsing in CI logs.\n - `--json` outputs single JSON documents (no streams) so scripts can `jq` them. All timestamps are ISO8601 strings, numbers stay numeric.\n - Errors include the command prefix, the resolved `<runDir>`, and the underlying message (`[run:events] unable to read run metadata at ...`). `--verbose` adds stack traces.\n - Secrets from task definitions are never echoed: CLI logs file refs instead of dumping blobs/result payloads unless `--verbose` is paired with `--json` and `BABYSITTER_ALLOW_SECRET_LOGS=true`.\n\nAcceptance Criteria\n-------------------\n1. **Flag & path consistency** – Every command resolves runs through the central default path policy, honors `--runs-dir` when explicitly provided, validates required positional args, and prints actionable errors with non-zero exit codes when resolution fails. Tests cover Windows-style and POSIX-style inputs.\n2. **Deterministic JSON contracts** – `run:create`, `run:assign-process`, `run:status`, `run:events`, `run:iterate`, `task:list`, `task:show`, and `task:post` emit the schemas described above; snapshot tests guard against accidental drift.\n3. **Safe automation loops** – orchestration loops are owned by the caller (skill/hook/worker). The CLI provides deterministic primitives (`run:iterate`, `task:list`, `task:post`) and never embeds task-execution policy.\n4. **State repair tooling** – `run:rebuild-state` rebuilds derived state when `state/state.json` is missing or stale and reports the rebuild result in both human and JSON modes. Subsequent `run:status` reflects the rebuilt `stateVersion`.\n5. 
**Process integration** – CLI surfaces are thin wrappers over runtime APIs (`createRun`, `orchestrateIteration`, `commitEffectResult`, `rebuildStateCache`). Unit tests stub these APIs to ensure argument translation and error propagation are correct.\n6. **Documentation & help** – `babysitter --help` (or bare invocation, or wrong-syntax error) prints the **agent-facing** usage block (commands intended for skill/hook automation). `babysitter --help-human` prints the **human-facing** usage block (commands intended for direct interactive use, e.g. `harness:*`, `session:init`, `mcp:serve`, `compress-output`). README/sdk.md tables stay in sync with both surfaces.\n\nEdge Cases\n----------\n- Missing or deleted run directories: commands fail fast with `[command] unable to read run metadata` and exit `1`.\n- Empty journals: `run:status` reports `created` with `last=none` and `pending[total]=0`; `run:events --json` returns an empty array.\n- Task output blobs larger than 1 MiB: `task:list` and `task:show` print refs to blob files rather than dumping whole payloads; `task:post --json` points to `stdoutRef`, `stderrRef`, and `resultRef`.\n- Windows drive letters and UNC paths: `--runs-dir` and `<runDir>` may include drive prefixes; CLI resolves them but continues to emit POSIX-style refs in JSON/logs.\n- Legacy compatibility: when the active runs root is global, commands that read existing runs should also probe `<repo>/.a5c/runs` before reporting a missing run.\n\nNon-Goals\n---------\n- Implementing interactive TUIs, dashboards, or VS Code surfaces (handled elsewhere in Babysitter).\n- Remote/distributed task execution backends; CLI focuses on run iteration + result commits, not execution.\n- New intrinsic kinds or scheduler policies; CLI simply reflects what the runtime reports.\n- Packaging/distribution mechanics (npm publish, Homebrew formulas) and telemetry collection—tracked in separate operational docs.\n- Auto-resolving breakpoints, orchestrator tasks, or sleep gates 
in this part; those require explicit manual commands or future automation.\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": [
{
"from": "page:docs-reference",
"to": "page:docs-reference-babysitter-cli-surface-spec",
"kind": "contains_page"
}
]
}
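
The path rule in the `article` text above (paths returned to the user are normalized to POSIX separators relative to `<runDir>`, while either slash style is accepted as input) can be sketched as follows. This is a minimal illustration in Python, not the SDK's implementation: the helper name `to_posix_ref` is hypothetical, and the sketch assumes the target path lies under the run directory.

```python
from pathlib import PureWindowsPath


def to_posix_ref(run_dir: str, path: str) -> str:
    """Return `path` relative to `run_dir`, using forward slashes.

    Hypothetical helper for illustration only; it is not part of the
    babysitter SDK or CLI.
    """
    # PureWindowsPath treats both "\\" and "/" as separators and understands
    # drive letters and UNC prefixes, so it can parse either input style
    # without touching the filesystem.
    # Caveat: a POSIX filename that contains a literal backslash would be
    # split here; this sketch ignores that case.
    run = PureWindowsPath(run_dir)
    target = PureWindowsPath(path)

    # relative_to raises ValueError when `target` is not under `run`,
    # which a caller could map to an expected user error.
    return target.relative_to(run).as_posix()


if __name__ == "__main__":
    print(to_posix_ref("C:\\runs\\r1", "C:\\runs\\r1\\tasks\\e1\\result.json"))
    print(to_posix_ref("/home/u/.a5c/runs/r1", "/home/u/.a5c/runs/r1/state/state.json"))
```

Using the pure path classes keeps the sketch platform-independent: the same call behaves identically on macOS, Linux, and Windows because no real filesystem lookup is involved.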