{
"id": "page:docs-agent-mux-reference-11-process-lifecycle-and-platform",
"_kind": "Page",
"_file": "wiki/docs/agent-mux/reference/11-process-lifecycle-and-platform.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/agent-mux/reference/11-process-lifecycle-and-platform.md",
"sourceKind": "repo-docs",
"title": "Process Lifecycle, Safety, and Cross-Platform Support",
"displayName": "Process Lifecycle, Safety, and Cross-Platform Support",
"slug": "docs/agent-mux/reference/11-process-lifecycle-and-platform",
"articlePath": "wiki/docs/agent-mux/reference/11-process-lifecycle-and-platform.md",
"article": "\n# Process Lifecycle, Safety, and Cross-Platform Support\n\n**Specification v1.0** | `@a5c-ai/agent-mux`\n\n> **SCOPE EXTENSION:** hermes-agent (`@NousResearch/hermes-agent`) is included as a 10th supported agent per explicit project requirements from the project owner. It extends the original scope document's 9 built-in agents. All hermes-specific content in this spec is marked with this same scope extension note.\n\n---\n\n## 1. Overview\n\nThis specification is the authoritative reference for subprocess management, process safety guarantees, and cross-platform support in `@a5c-ai/agent-mux`. It consolidates and deepens the process-lifecycle material introduced in `03-run-handle-and-interaction.md` (sections 6–12), adds the full per-agent cross-platform compatibility matrix from scope §23, and specifies platform-specific path resolution, shell invocation, PTY backend selection, and resource cleanup in detail.\n\nAll ten built-in agents (claude, codex, gemini, copilot, cursor, opencode, pi, omp, openclaw, hermes) share the same process lifecycle contract. 
Differences in platform support, PTY requirements, and shell invocation are documented per-agent in the tables below.\n\n### 1.1 Cross-References\n\n| Type / Concept | Spec | Section |\n|---|---|---|\n| `RunHandle`, subprocess management | `03-run-handle-and-interaction.md` | 6 |\n| `ProcessTracker`, zombie prevention | `03-run-handle-and-interaction.md` | 6.4 |\n| `PlatformAdapter` interface (base) | `03-run-handle-and-interaction.md` | 8.3 |\n| PTY support, `node-pty` dependency | `03-run-handle-and-interaction.md` | 7 |\n| Run isolation, temp directories | `03-run-handle-and-interaction.md` | 9 |\n| Backpressure and buffer management | `03-run-handle-and-interaction.md` | 10 |\n| Concurrency safety | `03-run-handle-and-interaction.md` | 11 |\n| `RunOptions.gracePeriodMs` | `03-run-handle-and-interaction.md` | 6.2 (within signal handling prose) |\n| `SpawnArgs` type | `05-adapter-system.md` | 3.1 |\n| `AgentAdapter.buildSpawnArgs()` | `05-adapter-system.md` | 2 |\n| `AgentCapabilities.supportedPlatforms` | `06-capabilities-and-models.md` | 2 |\n| `AgentCapabilities.requiresPty` | `06-capabilities-and-models.md` | 2 |\n| `ConfigManager` file locking | `08-config-and-auth.md` | 13 |\n| Native config file locations | `08-config-and-auth.md` | 7 |\n| `ErrorCode` union | `01-core-types-and-client.md` | 3.1 |\n| `AgentMuxError` | `01-core-types-and-client.md` | 3.1 |\n| CLI signal handling | `10-cli-reference.md` | 20 |\n| `RunOptions` | `02-run-options-and-profiles.md` | 2 |\n\n---\n\n## 2. Subprocess Spawn Sequence\n\nWhen `mux.run()` is called, the stream engine executes the following spawn sequence. Each step is numbered for reference in error-handling sections. This sequence is a simplified summary; the authoritative step-by-step is in `03-run-handle-and-interaction.md` §6.1. 
The ordering below groups steps by concern for readability — the critical constraint is that Step 5 (ProcessTracker registration) must happen synchronously after spawn and before any `await`.\n\n```\nStep 1 Validate RunOptions against agent capabilities\n → CapabilityError on unsupported options\n\nStep 2 Create per-run temp directory\n → os.tmpdir()/agent-mux-<runId>/\n → Mode 0o700 (owner read/write/execute only)\n\nStep 3 Call adapter.buildSpawnArgs(resolvedOptions)\n → Produces SpawnArgs { command, args, env, cwd, shell, usePty }\n\nStep 4 Determine spawn mode (pipe vs. PTY)\n → If usePty && !nodePtyAvailable → throw PTY_NOT_AVAILABLE\n → If usePty → pty.spawn()\n → Else → child_process.spawn()\n\nStep 5 Register subprocess with ProcessTracker\n → Must happen synchronously after spawn, before any await\n\nStep 6 Wire stdio pipes / PTY streams to line parser\n → Line parser feeds adapter.parseEvent()\n → Parsed events enter the event buffer\n\nStep 7 Start timeout / inactivity timers\n → Per RunOptions.timeout and RunOptions.inactivityTimeout\n\nStep 8 Emit 'session_start' or 'session_resume' event\n → Run is now in 'running' state\n```\n\n### 2.1 Spawn Options by Mode\n\n#### Pipe Mode (default)\n\n```typescript\nimport { spawn } from 'child_process';\n\nconst child = spawn(spawnArgs.command, spawnArgs.args, {\n cwd: spawnArgs.cwd,\n env: { ...process.env, ...spawnArgs.env },\n stdio: ['pipe', 'pipe', 'pipe'],\n detached: process.platform !== 'win32', // Unix: new process group\n shell: spawnArgs.shell,\n windowsHide: true,\n});\n```\n\n**Unix:** `detached: true` creates a new process group. The process group ID equals the child PID. Signals sent to `-pid` reach the entire group.\n\n**Windows:** `detached: false` (the child shares the parent's console). 
The child is assigned to a Job Object for lifecycle management (see Section 3.3).\n\n#### PTY Mode\n\n```typescript\nimport * as pty from 'node-pty';\n\nconst child = pty.spawn(spawnArgs.command, spawnArgs.args, {\n name: 'xterm-256color',\n cols: 120,\n rows: 40,\n cwd: spawnArgs.cwd,\n env: { ...process.env, ...spawnArgs.env },\n});\n```\n\nPTY mode is used only when `spawnArgs.usePty` is `true` (see Section 6 for which agents require it).\n\n---\n\n## 3. Process Tracking and Zombie Prevention\n\n### 3.1 ProcessTracker Singleton\n\nThe `ProcessTracker` is a module-level singleton that maintains the set of all active subprocesses across all `RunHandle` instances. Its interface is defined in `03-run-handle-and-interaction.md` §6.4; this section specifies platform-specific implementation details.\n\n```typescript\ninterface ProcessTracker {\n /**\n * Register a spawned process for tracking.\n *\n * @param pid - Process ID of the spawned child.\n * @param groupId - Process group ID (Unix) or Job Object handle ID (Windows).\n * @param runId - The run ID that owns this process.\n * @param gracePeriodMs - Grace period for this process's two-phase shutdown.\n * Stored per-registration so killAll() uses the correct grace period for\n * each tracked process. Defaults to 5000ms if not provided.\n */\n register(pid: number, groupId: number, runId: string, gracePeriodMs?: number): void;\n\n unregister(pid: number): void;\n\n /**\n * Kill all tracked processes using the two-phase shutdown sequence.\n * Each process uses the gracePeriodMs stored at registration time.\n * See behavioral contract below.\n */\n killAll(): void;\n\n readonly activeCount: number;\n}\n```\n\n> **Note on interface divergence:** The `ProcessTracker` interface in `03-run-handle-and-interaction.md` §6.4 defines `register(pid, groupId, runId)` with 3 parameters. This spec extends it with an optional 4th parameter `gracePeriodMs`. Implementors must provide the 4-parameter signature. 
The authoritative complete interface is in §19 (Complete Type Reference) of this spec.\n\n**`killAll()` behavioral contract** (implements scope §22: \"On SIGTERM: SIGINT first, SIGKILL after grace period\"):\n\nThe grace period for each tracked process is stored at `register()` time, sourced from the run's resolved `RunOptions.gracePeriodMs` (see `03-run-handle-and-interaction.md` §6.2). This allows `killAll()` to use per-run grace periods without accepting parameters — important because `killAll()` is called from `process.on('exit')` and signal handlers where argument passing is impractical.\n\nWhen called from an **async-capable context** (e.g., `process.on('SIGTERM')`, `process.on('SIGINT')`):\n\n1. Send SIGINT (Unix) or `CTRL_C_EVENT` (Windows) to each tracked process group.\n2. Wait up to each process's registered grace period (default: 5000ms).\n3. Send SIGKILL (Unix) or `TerminateProcess` (Windows) to any process groups that have not exited.\n4. On Windows, additionally close each Job Object handle (defense-in-depth).\n\nWhen called from a **synchronous-only context** (`process.on('exit')`):\n\n1. Send SIGKILL (Unix) or close Job Object handles (Windows) immediately — the grace period cannot be honored because the event loop is shutting down.\n\n### 3.2 Unix Process Group Management\n\nOn Unix (macOS and Linux), each subprocess is spawned with `detached: true`, creating a new process group:\n\n- **Process group ID** equals the child PID: with `detached: true`, libuv calls `setsid()` in the child, which makes it the leader of a new session and process group.\n- **Signal delivery** uses `process.kill(-pid, signal)` — the negated PID targets the entire process group, including descendant processes (language servers, build tools, shell scripts) unless they have moved themselves into a different process group.\n- **Zombie reaping** is handled by Node.js's internal libuv loop, which calls `waitpid()` for each child. 
The `'exit'` event on the `ChildProcess` triggers `ProcessTracker.unregister()`.\n\n### 3.3 Windows Job Object Management\n\nOn Windows, each subprocess is assigned to a **Job Object** immediately after spawn:\n\n- Created with `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` — when the last handle to the job is closed (including on abrupt Node.js exit), the OS terminates all processes in the job.\n- This provides defense-in-depth: even if `process.on('exit')` handlers do not execute (e.g., `TerminateProcess` is called on the Node.js process itself), orphaned agent subprocesses are still cleaned up.\n- The Job Object handle is stored in the `ProcessTracker` alongside the PID and run ID.\n- `killAll()` on Windows closes all stored job handles, triggering OS-level cleanup.\n\n### 3.4 Node.js Exit Handlers\n\nThe `ProcessTracker` installs handlers on the following Node.js events (installed once, on first `register()` call):\n\n| Event | Action |\n|---|---|\n| `process.on('exit')` | Synchronous `killAll()`. Cannot start async work. |\n| `process.on('SIGTERM')` | `killAll()`, then `process.exit(1)`. |\n| `process.on('SIGINT')` | `killAll()`, then `process.exit(1)`. |\n| `process.on('uncaughtException')` | `killAll()`, then rethrow. |\n| `process.on('unhandledRejection')` | `killAll()`, then rethrow. |\n\n**Invariant:** `killAll()` must be unconditionally safe (never throws). If an individual process kill fails (e.g., process already exited, permission denied), the error is silently ignored and the tracker continues to the next process.\n\n### 3.5 Orphan Scenarios\n\n| Scenario | Unix | Windows |\n|---|---|---|\n| Normal Node.js exit | `process.on('exit')` → `killAll()` | Job Object auto-kill |\n| SIGTERM to Node.js | Handler runs `killAll()` | Windows has no SIGTERM: Node.js accepts a listener, but `process.kill(pid, 'SIGTERM')` terminates the process unconditionally, so the handler does not run; Job Object auto-kill |\n| SIGKILL to Node.js | **Orphans survive.** Re-parented to PID 1 (or the nearest subreaper on Linux). 
Cleanup: `kill -9 -<pgid>` | Job Object auto-kill (OS-level) |\n| Node.js crash (segfault) | Depends on signal handler; likely orphans | Job Object auto-kill |\n| `process.exit(0)` from code | `process.on('exit')` runs | `process.on('exit')` runs + Job Object |\n\n---\n\n## 4. Signal Handling\n\n### 4.1 Two-Phase Shutdown (abort)\n\nWhen `RunHandle.abort()` is called:\n\n```\nt=0ms Send graceful signal\n ├── Unix: SIGTERM to process group (kill(-pid, SIGTERM))\n └── Windows: GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT)\n Start grace period timer\n\nt=0..G Monitor for process exit\n If process exits → cleanup, resolve RunResult\n\nt=G ms Grace period expired, process still alive\n ├── Unix: SIGKILL to process group (kill(-pid, SIGKILL))\n └── Windows: TerminateProcess(handle, 1)\n\nt=G+100ms Final check — process guaranteed dead\n Cleanup temp dir, resolve RunResult\n```\n\n**Default grace period:** 5000ms (scope §22).\n\n**Per-run override:** `RunOptions.gracePeriodMs` (spec-level extension defined in `03-run-handle-and-interaction.md` §6.2). Also configurable at the global config level via `gracePeriodMs`.\n\n**Signal choice rationale (abort vs. killAll):** `abort()` sends SIGTERM (a graceful termination request), because the consumer is explicitly ending a single run and the agent should have a chance to clean up. `killAll()` sends SIGINT (the interrupt signal), because it implements scope §22's requirement (\"On SIGTERM: SIGINT first, SIGKILL after grace period\") — when the Node.js process itself receives SIGTERM (or SIGINT, or encounters a fatal error), it forwards SIGINT to child processes as the first phase of shutdown. 
The choice of SIGINT (not SIGTERM) for the forwarded signal intentionally differentiates the signal received by children from the signal received by the parent, making it possible for agents that trap both signals to distinguish between \"the mux process is shutting down\" (SIGINT) and \"this specific run is being aborted\" (SIGTERM).\n\n### 4.2 Interrupt (SIGINT)\n\n`RunHandle.interrupt()` sends a soft interrupt, allowing the agent to finish its current tool call:\n\n| Platform | Pipe mode | PTY mode |\n|---|---|---|\n| Unix | `process.kill(-pid, 'SIGINT')` | Write `\\x03` (Ctrl+C) to PTY input |\n| Windows | `GenerateConsoleCtrlEvent(CTRL_C_EVENT, pid)` | Write `\\x03` to PTY input |\n\n**Windows caveat:** `GenerateConsoleCtrlEvent` requires the subprocess to share a console with the parent. For console-detached processes, signal delivery may fail silently. All 10 built-in agents are spawned with `windowsHide: true` (console shared), so the shared-console requirement is met for built-in agents. In addition, the Win32 documentation states that `CTRL_C_EVENT` cannot be limited to a process group. With a nonzero group ID the call succeeds but the signal is not received by that group. With a group ID of 0 it reaches every process attached to the console, including the Node.js host. Targeted delivery of `CTRL_BREAK_EVENT` (§4.1) likewise requires the child to have been created with `CREATE_NEW_PROCESS_GROUP`.\n\n### 4.3 Pause / Resume\n\n| Platform | Pause | Resume |\n|---|---|---|\n| Unix | `process.kill(pid, 'SIGTSTP')` | `process.kill(pid, 'SIGCONT')` |\n| Windows | `SuspendThread()` on all process threads | `ResumeThread()` on all process threads |\n\n**Windows caveat:** Thread enumeration for pause/resume uses `NtQuerySystemInformation` or `CreateToolhelp32Snapshot`. Race conditions exist if the process creates new threads between enumeration and suspension. 
This is a known limitation; in practice, agent CLI processes rarely create threads during operation.\n\n### 4.4 Signal Summary Table\n\n| Operation | Unix Signal | Windows Equivalent | PTY Override |\n|---|---|---|---|\n| Interrupt | SIGINT | CTRL_C_EVENT | `\\x03` to PTY stdin |\n| Graceful terminate | SIGTERM | CTRL_BREAK_EVENT | `\\x03` then close PTY |\n| Force kill | SIGKILL | TerminateProcess | Close PTY handle |\n| Pause | SIGTSTP | SuspendThread | `\\x1a` to PTY stdin |\n| Resume | SIGCONT | ResumeThread | (automatic on data write) |\n\n---\n\n## 5. Cross-Platform Support Matrix\n\n### 5.1 Per-Agent Platform Support\n\nFrom scope §23, extended with hermes-agent:\n\n| Agent | macOS | Linux | Windows | Notes |\n|---|---|---|---|---|\n| claude | ✅ | ✅ | ✅ | |\n| codex | ✅ | ✅ | ✅ | |\n| gemini | ✅ | ✅ | ✅ | |\n| copilot | ✅ | ✅ | ✅ | |\n| cursor | ✅ | ✅ | ✅ | |\n| opencode | ✅ | ✅ | ✅ | |\n| pi | ✅ | ✅ | ✅ | |\n| omp | ✅ | ✅ | partial | See §5.2 |\n| openclaw | ✅ | ✅ | ✅ | Requires PTY (§6); on Windows, ConPTY (Win 10 1809+) is preferred, with a winpty fallback on older versions (§6.2) |\n| hermes | ✅ | ✅ | WSL2 only | See §5.3 |\n\n> **SCOPE EXTENSION:** hermes-agent platform support is WSL2-only on Windows, as the hermes CLI is a Python application that depends on Unix-specific system calls not available on native Windows.\n\n### 5.2 omp on Windows (Partial Support)\n\nThe omp agent has partial Windows support:\n\n- **Core run/prompt functionality:** Works.\n- **PTY-dependent features:** Not applicable (omp does not require PTY).\n- **Known limitations:** Some shell-dependent tool operations may behave differently under `cmd.exe` vs. bash.\n- **`supportedPlatforms`:** `['darwin', 'linux', 'win32']` — `'win32'` is included because the core agent does run on Windows.\n- **`AdapterRegistry.installed()` on Windows:** Returns `true` if the omp binary is found on PATH. 
The adapter does not block installation or detection on Windows.\n- **Runtime warning:** On Windows, the adapter emits a `debug` event with `level: 'warn'` during the spawn sequence: `'Agent \"omp\" has partial Windows support; some features may not work as expected.'` This warning does not prevent the run from proceeding.\n\n**Design rationale (omp vs. hermes):** omp includes `'win32'` in `supportedPlatforms` because the agent is functional on Windows for core operations — only some features are degraded. hermes excludes `'win32'` because the agent cannot run at all on native Windows (requires WSL2). The distinction is: partial support → include in platforms + warn; no support → exclude from platforms + throw `AGENT_NOT_INSTALLED`.\n\n### 5.3 hermes on Windows (WSL2 Only)\n\n> **SCOPE EXTENSION:** hermes-agent is a Python-based CLI (`pip install hermes-agent`) that requires Unix-specific system calls.\n\n- **Native Windows:** Not supported. `AdapterRegistry.installed()` returns `false` for hermes on native Windows (`process.platform === 'win32'` without WSL detection).\n- **WSL2:** Supported. The hermes adapter detects WSL by checking whether `/proc/version` contains `microsoft` (case-insensitive) or whether `WSL_DISTRO_NAME` is set in the environment. Both checks also match WSL1, so they identify WSL in general rather than WSL2 specifically.\n- **`supportedPlatforms`:** `['darwin', 'linux']` — the adapter does not list `'win32'`. On WSL2, `process.platform` reports `'linux'`, so the adapter is available.\n- **Error on native Windows:** If a consumer attempts `mux.run({ agent: 'hermes' })` on native Windows, the `AdapterRegistry.detect()` method returns `installed: false`, and `mux.run()` throws `AgentMuxError` with code `AGENT_NOT_INSTALLED` and a message suggesting WSL2 installation.\n\n### 5.4 Platform Detection\n\nPlatform detection occurs at two levels:\n\n1. **Module-level:** `PlatformAdapter` selection (see §8).\n2. 
**Adapter-level:** Each adapter's `capabilities.supportedPlatforms` is checked by `AdapterRegistry.installed()` and `detect()`:\n\n```typescript\n// Simplified detection logic\nfunction isPlatformSupported(adapter: AgentAdapter): boolean {\n const platforms = adapter.capabilities.supportedPlatforms;\n return platforms.includes(process.platform as NodeJS.Platform);\n}\n```\n\nFor hermes on WSL2, the platform is `'linux'` (not `'win32'`), so the standard check succeeds.\n\n---\n\n## 6. PTY Support\n\n### 6.1 Agents Requiring PTY\n\n| Agent | `requiresPty` | Reason |\n|---|---|---|\n| claude | `false` | Streams JSON to stdout |\n| codex | `false` | Streams JSON to stdout |\n| gemini | `false` | Streams JSON to stdout |\n| copilot | `false` | Structured output |\n| cursor | `false` | Structured output |\n| opencode | `false` | Structured output |\n| pi | `false` | Structured output |\n| omp | `false` | Structured output |\n| openclaw | `true` | Interactive TUI; uses terminal control sequences. On Windows, requires ConPTY (Windows 10 1809+); older Windows versions fall back to winpty with potential output buffering differences (see §6.2). |\n| hermes | `false` | Structured output via `--output-format jsonl` flag |\n\n> **SCOPE EXTENSION:** hermes-agent does not require PTY; it supports a `--output-format jsonl` flag for structured output.\n\n> **Cross-spec reconciliation note:** `06-capabilities-and-models.md` §12.5 lists `requiresPty=true` for cursor and §12.9 lists `requiresPty=false` for openclaw. These values are **swapped** relative to the authoritative sources: scope §22 explicitly names OpenClaw as requiring PTY (\"PTY support via node-pty for agents that require it (OpenClaw, some interactive modes)\"), and `03-run-handle-and-interaction.md` §7.1 confirms openclaw=true, cursor=false. 
The values in this spec (spec 11) and spec 03 are correct; spec 06 §12.5 and §12.9 require correction during the cross-spec consistency review.\n\n### 6.2 PTY Backend Selection\n\nThe `node-pty` library selects its backend based on the platform:\n\n| Platform | Backend | Minimum OS Version | Notes |\n|---|---|---|---|\n| macOS | `openpty(3)` | macOS 10.15+ | Native POSIX PTY allocation |\n| Linux | `openpty(3)` | Kernel 2.6+ | Native POSIX PTY allocation |\n| Windows | ConPTY | Windows 10 1809+ | Preferred; better VT sequence support |\n| Windows (legacy) | winpty | Windows 7+ | Fallback; output buffering differences |\n\n**ConPTY vs. winpty behavioral differences:**\n\n| Aspect | ConPTY | winpty |\n|---|---|---|\n| VT sequence fidelity | High (native Windows Terminal support) | Moderate (translation layer) |\n| Output buffering | Line-buffered by default | May buffer more aggressively |\n| Resize support | Native | Emulated |\n| Performance | Better | Slower due to translation |\n\n### 6.3 VT Escape Sequence Stripping\n\nPTY output contains VT escape sequences (cursor movement, colors, etc.) that must be stripped before line-based event parsing. The stream engine applies a stripping pass before feeding lines to `adapter.parseEvent()`:\n\n```typescript\n/**\n * Strip ANSI/VT escape sequences from PTY output.\n *\n * Handles:\n * - CSI sequences: ESC [ ... final_byte\n * - OSC sequences: ESC ] ... ST\n * - Simple escapes: ESC followed by a single byte\n * - C1 control codes: 0x80-0x9F\n *\n * Maintains internal state to handle sequences split across\n * read() chunk boundaries.\n */\ninterface VtStripper {\n /**\n * Process a chunk of PTY output. Returns the text with all\n * escape sequences removed.\n *\n * @param chunk - Raw PTY output bytes (may contain partial sequences)\n * @returns Clean text suitable for line-based parsing\n */\n strip(chunk: string): string;\n\n /**\n * Reset internal state. 
Called when the PTY stream ends.\n */\n reset(): void;\n}\n```\n\n**Partial sequence handling:** When a VT escape sequence is split across two `read()` chunks, the `VtStripper` buffers the incomplete sequence and concatenates it with the start of the next chunk before deciding whether to strip or pass through. This is critical for correctness: a stateless per-chunk regex cannot match a sequence split across chunks and would leak its fragments (e.g., a stray `[31m`) into the parsed lines.\n\n### 6.4 node-pty as Optional Peer Dependency\n\n```json\n{\n \"peerDependencies\": {\n \"node-pty\": \">=1.0.0\"\n },\n \"peerDependenciesMeta\": {\n \"node-pty\": { \"optional\": true }\n }\n}\n```\n\nIf `node-pty` is not installed and the selected agent requires PTY:\n\n```typescript\nthrow new AgentMuxError(\n 'PTY_NOT_AVAILABLE',\n `Agent \"${agent}\" requires PTY support but node-pty is not installed. ` +\n `Install it with: npm install node-pty`\n);\n```\n\n**Native module caveat:** `node-pty` requires platform-specific compilation via `node-gyp`. If the Node.js ABI version (`NODE_MODULE_VERSION`) changes after installation (e.g., `nvm use` to a different major version), the compiled bindings no longer load. The error manifests as a module load failure, which the stream engine catches and re-throws as `PTY_NOT_AVAILABLE` with an amended message suggesting reinstallation.\n\n### 6.5 PTY Resource Limits\n\nOn Unix systems, each PTY-mode spawn allocates a real OS PTY pair via `openpty(3)`. Systems have finite PTY limits:\n\n- **Linux:** Controlled by `/proc/sys/kernel/pty/max` (default: 4096).\n- **macOS:** Since macOS 10.7, PTYs are allocated via a `devfs`-backed mechanism. The limit is configurable via `sysctl kern.tty.ptmx_max` (511 by default on recent macOS releases).\n\nExceeding the PTY limit results in `ENXIO` or `EIO` from `openpty()`. The stream engine catches this and throws `AgentMuxError` with code `SPAWN_ERROR` and a message indicating PTY exhaustion.\n\n---\n\n## 7. 
Cross-Platform Path Normalization\n\n### 7.1 agent-mux Own Paths\n\n| Path Purpose | Resolution | Override |\n|---|---|---|\n| Global config dir | `os.homedir()/.agent-mux/` | `createClient({ configDir })` or `AGENT_MUX_CONFIG_DIR` env var |\n| Project config dir | `<projectRoot>/.agent-mux/` | `createClient({ projectConfigDir })` or `--project-dir` CLI flag |\n| Run temp dir | `os.tmpdir()/agent-mux-<runId>/` | Not overridable |\n| Run index | `<projectConfigDir>/run-index.jsonl` | Project-local (scope §4); falls back to global config dir if no project root is resolved |\n\n**`os.homedir()` resolution per platform:**\n\n| Platform | Typical value |\n|---|---|\n| macOS | `/Users/<username>` |\n| Linux | `/home/<username>` |\n| Windows | `C:\\Users\\<username>` (via `%USERPROFILE%`) |\n\n**`os.tmpdir()` resolution per platform:**\n\n| Platform | Typical value |\n|---|---|\n| macOS | `/var/folders/<hash>/T` (via `$TMPDIR`) |\n| Linux | `/tmp` (unless `$TMPDIR`, `$TMP`, or `$TEMP` is set) |\n| Windows | `C:\\Users\\<username>\\AppData\\Local\\Temp` (via `%TEMP%`, falling back to `%TMP%`) |\n\n### 7.2 Per-Agent Config Paths\n\nEach adapter resolves its agent's native config paths according to the agent's own conventions. The authoritative table of per-agent config file paths is in `08-config-and-auth.md` §7 (Native Config File Locations). 
This section summarizes the platform resolution rules relevant to process lifecycle.\n\n**Authoritative config paths** (from `08-config-and-auth.md` §7):\n\n| Agent | Global Config Path | Format |\n|---|---|---|\n| claude | `~/.claude/settings.json` | JSON |\n| codex | `~/.codex/config.json` | JSON |\n| gemini | `~/.config/gemini/settings.json` | JSON |\n| copilot | `~/.config/github-copilot/settings.json` | JSON |\n| cursor | `~/.cursor/settings.json` | JSON |\n| opencode | `~/.config/opencode/opencode.json` | JSON |\n| pi | `~/.pi/agent/settings.json` | JSON |\n| omp | `~/.omp/agent/settings.json` | JSON |\n| openclaw | `~/.openclaw/config.json` | JSON |\n| hermes | `~/.hermes/cli-config.yaml` | YAML |\n\n> **SCOPE EXTENSION:** hermes config path is `~/.hermes/cli-config.yaml` (YAML format, not JSON). See `08-config-and-auth.md` §7.2 for YAML handling details.\n\n**Platform resolution:** The `~` prefix in all paths resolves to `os.homedir()` (see §7.1). On Windows, `os.homedir()` resolves to `%USERPROFILE%` (typically `C:\\Users\\<username>`). Paths using `~/.config/` follow the XDG convention on Linux but use the same `~/.config/` path on macOS (not `~/Library/`). The per-agent config paths are **the same on all platforms** — they use home-relative paths, not platform-specific config directories. This is because the agent CLIs themselves use the same home-relative paths across platforms.\n\n**Note:** This table intentionally omits the \"Project Config Path\" column present in `08-config-and-auth.md` §7, as project-level config paths are not relevant to process lifecycle. 
See `08-config-and-auth.md` §7 for the complete table including project config paths, merge semantics, and format-specific notes.\n\n### 7.3 Path Separator Normalization\n\nAll paths exposed through agent-mux API surfaces are normalized to **forward slashes** regardless of platform:\n\n```typescript\n// Internal normalization utility. Backslashes are path separators only on Windows;\n// on Unix they are legal filename characters and are left untouched\n// (matches PlatformAdapter.normalizePath, §8.1).\nfunction normalizePath(p: string): string {\n return process.platform === 'win32' ? p.replace(/\\\\/g, '/') : p;\n}\n```\n\nThis normalization applies to:\n\n- `AgentEvent` fields containing file paths (`file_read.path`, `file_write.path`, etc.)\n- `RunResult` fields containing paths\n- `SessionManager` path fields\n- `ConfigManager` path fields\n- All API return values\n\n**Not normalized:** Arguments passed to `child_process.spawn()` and `pty.spawn()` — these use the OS-native format as expected by the agent CLI binary.\n\n### 7.4 Run ID Format\n\nRun IDs are **ULIDs** (Universally Unique Lexicographically Sortable Identifiers):\n\n- **Format:** 26-character string, e.g., `01ARYZ6S41TSV4RRFFQ69G5FAV`\n- **Character set:** Crockford Base32 (`0123456789ABCDEFGHJKMNPQRSTVWXYZ`)\n- **Properties:** Lexicographically sortable by creation time (millisecond precision), URL-safe, filesystem-safe on all platforms (no colons, slashes, or special characters)\n- **Generation:** Client-side via the `ulid` package. If `RunOptions.runId` is provided, it must match the ULID format (`/^[0-7][0-9ABCDEFGHJKMNPQRSTVWXYZ]{25}$/`; the first character is limited to `0` through `7` because a ULID encodes 128 bits in 26 Base32 characters); otherwise, `AgentMuxError` with code `VALIDATION_ERROR` is thrown.\n\n---\n\n## 8. Platform Abstraction Layer\n\n### 8.1 PlatformAdapter Interface\n\nPlatform-specific behavior is encapsulated behind the `PlatformAdapter` interface, selected at module load time. 
The base interface is defined in `03-run-handle-and-interaction.md` §8.3; this spec adds two utility methods for path and line-ending normalization:\n\n```typescript\n/**\n * Base methods (defined in 03-run-handle-and-interaction.md §8.3):\n * - sendInterrupt(pid): void\n * - sendTerminate(pid): void\n * - sendKill(pid): void\n * - suspendProcess(pid): void\n * - resumeProcess(pid): void\n * - createProcessGroup(pid): ProcessGroupHandle\n * - killProcessGroup(handle): void\n * - tempDir(runId): string\n * - shellCommand(): [cmd, args]\n *\n * Extended by this spec:\n */\ninterface PlatformAdapter {\n // ... all base methods from 03-run-handle-and-interaction.md §8.3 ...\n\n /**\n * Normalize a path for API surface output.\n * Converts backslashes to forward slashes on Windows; no-op on Unix.\n *\n * > **Spec-level addition:** Not in base PlatformAdapter from spec 03.\n * > Required by the path normalization contract (§7.3).\n */\n normalizePath(p: string): string;\n\n /**\n * Strip \\r from line endings (Windows CRLF → LF).\n * Returns the line unchanged on Unix.\n *\n * > **Spec-level addition:** Not in base PlatformAdapter from spec 03.\n * > Required for CRLF handling (§11.2).\n */\n normalizeLineEnding(line: string): string;\n}\n```\n\n> **Note on interface divergence:** The base `PlatformAdapter` interface in `03-run-handle-and-interaction.md` §8.3 defines 9 methods. This spec extends it with 2 additional methods (`normalizePath`, `normalizeLineEnding`). Implementors must provide all 11 methods. The authoritative complete interface is in §19 (Complete Type Reference) of this spec.\n\n### 8.2 Implementation Selection\n\n```typescript\nconst platform: PlatformAdapter =\n process.platform === 'win32'\n ? new WindowsPlatformAdapter()\n : new UnixPlatformAdapter();\n```\n\nThe selection is made once at module load time. 
It is not reconfigurable at runtime.\n\n### 8.3 ProcessGroupHandle\n\n```typescript\n/**\n * Opaque handle representing a process group.\n * - Unix: the process group ID (number, same as child PID).\n * - Windows: a Job Object handle (native handle wrapped in a class).\n */\ntype ProcessGroupHandle = UnixProcessGroup | WindowsJobObject;\n\ninterface UnixProcessGroup {\n readonly kind: 'unix';\n readonly pgid: number;\n}\n\ninterface WindowsJobObject {\n readonly kind: 'windows';\n readonly jobHandle: unknown; // Native handle, opaque to TypeScript\n close(): void;\n}\n```\n\n---\n\n## 9. Shell Invocation\n\n### 9.1 When Shell Mode Is Used\n\nShell mode (`SpawnArgs.shell: true`) is used when the adapter needs the system shell to resolve the command. Most built-in adapters do **not** use shell mode — they invoke the agent CLI binary directly.\n\n| Agent | Shell mode | Reason |\n|---|---|---|\n| claude | No | Direct binary: `claude` |\n| codex | No | Direct binary: `codex` |\n| gemini | No | Direct binary: `gemini` |\n| copilot | No | `cliCommand: 'copilot'`; actual spawn: `gh copilot ...` (`SpawnArgs.command = 'gh'`, `args = ['copilot', ...]`) |\n| cursor | No | Direct binary: `cursor` |\n| opencode | No | Direct binary: `opencode` |\n| pi | No | Direct binary: `pi` |\n| omp | No | Direct binary: `omp` |\n| openclaw | No | Direct binary: `openclaw` |\n| hermes | No | Direct binary: `hermes` |\n\nShell mode may be used by **plugin adapters** that register custom agents with non-standard invocation patterns.\n\n### 9.2 Shell Selection Per Platform\n\nWhen shell mode is required:\n\n| Platform | Shell command | Invocation |\n|---|---|---|\n| macOS | `/bin/sh` | `/bin/sh -c '<command>'` |\n| Linux | `/bin/sh` | `/bin/sh -c '<command>'` |\n| Windows | `cmd.exe` | `cmd.exe /c <command>` |\n\n**Design rationale:** The minimal POSIX shell (`/bin/sh`) is used on Unix to avoid profile-script side effects. 
On Debian/Ubuntu, `/bin/sh` is `dash` (not `bash`); adapters that construct shell commands must use POSIX sh syntax. If bash-specific features are needed, the adapter should explicitly use `/bin/bash -c`.\n\nOn Windows, `cmd.exe` is the default. If an adapter requires PowerShell, it should set `SpawnArgs.command` to `powershell.exe` with appropriate `-Command` arguments rather than using shell mode.\n\n### 9.3 Shell Injection Prevention\n\n**Critical security requirement:** Adapters must **never** interpolate user-supplied `RunOptions` fields (prompt text, file paths, environment variables) into shell command strings. All adapters should:\n\n1. Build the command and arguments as separate string array elements.\n2. Use shell mode only when strictly required (e.g., for PATH resolution).\n3. If shell mode is used, pass only adapter-controlled values in the command and args. With `shell: true`, `child_process.spawn` does **not** escape arguments: Node.js joins the command and args with spaces and hands the resulting string to the shell. User-supplied values must therefore reach the agent by another channel, such as stdin.\n\n---\n\n## 10. Run Isolation\n\n### 10.1 Temp Directory Lifecycle\n\n```\nmux.run() called\n  │\n  ├── Step 2: mkdir(os.tmpdir()/agent-mux-<runId>/, { mode: 0o700 })\n  │   Creates: stdin-buffer.txt, harness-state.json\n  │\n  ├── During run: adapter may write to temp dir\n  │   Optional: pty-log.txt (PTY + debug mode only)\n  │\n  └── Run terminates (any terminal state)\n      │\n      └── Cleanup: rm -rf temp dir (best-effort)\n          ├── Success: directory removed\n          └── Failure (Windows locked files): directory left in place (see §10.4)\n```\n\n### 10.2 Temp Directory Contents\n\n| File | Purpose | Created |\n|---|---|---|\n| `stdin-buffer.txt` | Buffered stdin for batch prompt injection | Always |\n| `harness-state.json` | Interaction queue, internal state | Always |\n| `pty-log.txt` | Raw PTY output for debugging | PTY mode + `debug: true` only |\n\n### 10.3 Temp Directory Security\n\nThe temp directory is created with mode `0o700` (owner-only access) to prevent:\n\n- Other users on shared systems from reading `harness-state.json`
(may contain prompt text).\n- Injection of data into `stdin-buffer.txt` by other processes.\n- Symlink attacks: `mkdtemp()` is used on Unix to create the directory atomically.\n\nOn Windows, `os.tmpdir()` resolves to the user's `%TEMP%` directory, which is typically accessible only to the user and administrators. The `0o700` mode is requested but ignored there (Node.js does not support `mode` for directory creation on Windows); access control comes from the NTFS ACLs that the directory inherits from `%TEMP%`.\n\n### 10.4 Cleanup Failures\n\nCleanup is best-effort. Known failure scenarios:\n\n| Scenario | Platform | Behavior |\n|---|---|---|\n| File locked by agent subprocess | Windows | `rmdir` fails; directory left in `%TEMP%` |\n| Permission denied | Unix | `rm -rf` fails; logged as `debug` warning |\n| Disk full (can't delete) | Any | Extremely rare; cleanup skipped |\n| Node.js killed before cleanup | Any | ProcessTracker kills the subprocess if the signal is catchable (not `SIGKILL`); temp dir left |\n\nAccumulated orphaned temp directories are the consumer's responsibility to clean up. A utility function is available:\n\n```typescript\n/**\n * Remove all orphaned agent-mux temp directories.\n *\n * Scans os.tmpdir() for directories matching 'agent-mux-*' that have\n * no corresponding running process. Safe to call while runs are active\n * (skips directories for active run IDs).\n *\n * @returns Number of directories removed.\n */\nfunction cleanupOrphanedTempDirs(): Promise<number>;\n```\n\n---\n\n## 11. Line Parsing and CRLF Handling\n\n### 11.1 Line Parser\n\nThe stream engine's line parser converts raw subprocess output into individual lines for `adapter.parseEvent()`:\n\n```typescript\ninterface LineParser {\n /**\n * Feed a chunk of raw output. Calls the handler for each\n * complete line found.\n *\n * @param chunk - Raw stdout/PTY output\n * @param handler - Called with each complete line (no trailing newline)\n */\n feed(chunk: string, handler: (line: string) => void): void;\n\n /**\n * Flush any remaining partial line.
Called when the subprocess exits.\n */\n flush(handler: (line: string) => void): void;\n}\n```\n\n### 11.2 CRLF Normalization\n\nOn Windows, subprocess stdout may use CRLF (`\\r\\n`) line endings. The line parser **always** strips trailing `\\r` before passing lines to the handler:\n\n```typescript\n// Inside LineParser.feed():\nconst line = rawLine.endsWith('\\r') ? rawLine.slice(0, -1) : rawLine;\nhandler(line);\n```\n\nStripping matters because a trailing `\\r` would otherwise reach `adapter.parseEvent()` as part of the line, breaking exact string comparisons and `$`-anchored regular expressions in adapters and leaking into plain-text output. `JSON.parse` itself tolerates it, since CR counts as JSON whitespace (`{\"type\": \"text_delta\"}\\r` parses successfully).\n\n### 11.3 PTY Output Pipeline\n\nFor PTY-mode runs, the pipeline has an additional stripping step:\n\n```\nPTY output → VtStripper.strip() → LineParser.feed() → adapter.parseEvent()\n```\n\nFor pipe-mode runs:\n\n```\nstdout → LineParser.feed() → adapter.parseEvent()\nstderr → (captured for error reporting, not parsed for events)\n```\n\n---\n\n## 12. Concurrency Model\n\n### 12.1 Independent State Per RunHandle\n\nEach `RunHandle` instance owns:\n\n| Resource | Isolation |\n|---|---|\n| Subprocess (PID) | Own PID, own stdio pipes or PTY |\n| Process group | Own process group (Unix) or Job Object (Windows) |\n| Event buffer | Per-instance, not shared between handles |\n| State machine | Per-instance `RunState` |\n| Interaction channel | Per-instance queue |\n| Timers | Per-instance timeout and inactivity timers |\n| Temp directory | Unique path per `runId` |\n\n### 12.2 Shared Resources\n\n| Resource | Sharing model | Synchronization mechanism |\n|---|---|---|\n| `ProcessTracker` | Singleton | Synchronous register/unregister (no async gaps) |\n| Agent config files | Read: point-in-time snapshots | No locking for reads |\n| Agent config files | Write: via `ConfigManager` | File-level advisory locking |\n| `run-index.jsonl` | Append-only per RunHandle | File-level advisory locking |\n| Session files | Read-only by agent-mux | No locking (agent-owned) |\n| `node-pty` instances | One per
PTY-mode run | No sharing needed |\n| `PlatformAdapter` | Singleton | Stateless (no synchronization needed) |\n| `AdapterRegistry` | Singleton | `installed()` cache with 30s TTL |\n\n### 12.3 File Locking Protocol\n\nFile-level advisory locking is used for all shared mutable files:\n\n```typescript\n/**\n * Acquire an advisory lock on the given file path.\n *\n * Uses platform-appropriate locking:\n * - Unix: flock(2) or fcntl(2) (depending on NFS requirements)\n * - Windows: LockFileEx with LOCKFILE_EXCLUSIVE_LOCK\n *\n * @param filePath - Path to the file to lock\n * @param timeoutMs - Maximum time to wait for the lock (default: 5000ms)\n * @throws AgentMuxError with code CONFIG_LOCK_ERROR if lock cannot be acquired\n */\nfunction acquireFileLock(filePath: string, timeoutMs?: number): Promise<FileLock>;\n\ninterface FileLock {\n release(): Promise<void>;\n}\n```\n\n**Advisory lock limitation:** Advisory locks are cooperative — they only prevent conflicts between processes that use the same locking protocol. External processes (e.g., a user editing config with a text editor) can bypass the lock. This is a known limitation.\n\n---\n\n## 13. Environment Variable Handling\n\n### 13.1 SpawnArgs.env Merge\n\nThe subprocess environment is constructed by merging:\n\n```typescript\nconst childEnv = {\n ...process.env, // Parent process environment\n ...spawnArgs.env, // Adapter-provided overrides (takes precedence)\n};\n```\n\n### 13.2 Sensitive Variable Inheritance\n\nThe parent process's environment variables are inherited by the subprocess. This includes potentially sensitive variables (API keys, tokens, credentials). Responsibilities are split as follows:\n\n1. **Setting required variables:** Each adapter adds agent-specific API key variables to `spawnArgs.env` based on `AuthManager` state.\n2.
**Not filtering parent env:** agent-mux does not strip or filter inherited variables, as agents may legitimately need access to `PATH`, `HOME`, `LANG`, `TERM`, and other system variables.\n\n### 13.3 Per-Agent Environment Variables\n\nKey agent-specific environment variables set by adapters:\n\n| Agent | Variable(s) | Purpose |\n|---|---|---|\n| claude | `ANTHROPIC_API_KEY` | API authentication |\n| codex | `OPENAI_API_KEY` | API authentication |\n| gemini | `GOOGLE_API_KEY` | API authentication |\n| copilot | `GITHUB_TOKEN` | API authentication |\n| cursor | `CURSOR_API_KEY` (fallback) | Primary auth via session token in `~/.cursor/`; API key as fallback |\n| opencode | `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` | Multi-provider API authentication |\n| pi | `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` | Multi-provider API authentication |\n| omp | `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` | Multi-provider API authentication |\n| openclaw | `OPENCLAW_API_KEY` | API authentication |\n| hermes | `OPENROUTER_API_KEY`, `NOUS_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GITHUB_TOKEN`, `GOOGLE_API_KEY` | Multi-provider API authentication |\n\n> **SCOPE EXTENSION:** hermes-agent supports the broadest set of auth environment variables among all supported agents, reflecting its multi-provider architecture.\n\n**Note:** Adapters set only the variables their agent requires. Some agents rely primarily on authentication mechanisms other than environment variables (e.g., OAuth tokens or config-file credentials, as with cursor's session token above). The full auth strategy per agent is documented in `08-config-and-auth.md` §10 (AuthMethod) and §14 (Auth Detection Strategies).\n\n**Cross-reference:** Full per-agent auth environment variable details are in `08-config-and-auth.md` §8 (Table 8.2).\n\n---\n\n## 14.
Backpressure and Buffer Management\n\nThis section provides the authoritative reference for event buffer backpressure, expanding on `03-run-handle-and-interaction.md` §10.\n\n### 14.1 Buffer Architecture\n\n```\nSubprocess stdout/PTY → Line Parser → adapter.parseEvent()\n  │\n  └── EventEmitter.emit()   (synchronous, always)\n      │\n      └── Event Buffer (ring)\n          ├── Iterator 1 (read cursor)\n          └── Iterator 2 (read cursor)\n```\n\n**Key ordering guarantee:** EventEmitter handlers fire **before** the event enters the buffer. This means:\n\n1. `on()` handlers always see every event (no drops).\n2. `on()` handlers see events before `for await` iterators.\n3. If an `on()` handler blocks synchronously, it delays all downstream processing.\n\n### 14.2 High-Water Mark Configuration\n\n| Configuration level | Property | Default |\n|---|---|---|\n| Client-level | `createClient({ eventBufferSize })` | 1000 |\n| Run-level | `RunOptions.eventBufferSize` | Inherits from client |\n\n> **Spec-level addition:** `RunOptions.eventBufferSize` and `AgentMuxClientOptions.eventBufferSize` are not present in scope §6 but are required to support configurable backpressure. They are typed as `number` (positive integer, minimum 100, maximum 100000).\n\n### 14.3 Fan-Out Model\n\nMultiple async iterators on the same `RunHandle` each get their own read cursor:\n\n- Events are retained in the buffer until **all** active iterators have consumed them.\n- If one iterator stalls, unconsumed events accumulate in the shared buffer, even if the other iterators keep up.\n- When the buffer exceeds the high-water mark, the eviction strategy is:\n  1. Evict events already consumed by all iterators.\n  2. If still over the high-water mark, drop the oldest unconsumed events.\n  3. Emit a `debug` event with `level: 'warn'` and message `'Event buffer overflow: N events dropped'` (as specified in `03-run-handle-and-interaction.md` §10.3).
This event is not subject to backpressure and is always delivered.\n\n> **Note:** The `RunHandle` iterator JSDoc in `03-run-handle-and-interaction.md` §2 informally refers to this as a \"`buffer_overflow` warning\". The authoritative event type is `debug` with `level: 'warn'`, as defined in §10.3 of that same spec.\n\n### 14.4 Post-Completion Iteration\n\nIterating over a `RunHandle` after the run has completed yields all buffered events (those still within the high-water mark), then immediately completes. Events dropped due to overflow during the run are permanently lost.\n\n---\n\n## 15. Security Considerations\n\n### 15.1 Process Isolation\n\n- Each run's subprocess executes in its own process group (Unix) or Job Object (Windows).\n- Each subprocess is given only its own stdio pipes, temp directory path, and internal state. Because all subprocesses run as the same OS user, the `0o700` temp-directory mode guards against other users, not against sibling runs.\n- The `ProcessTracker` ensures all subprocesses are terminated when Node.js exits, except after an uncatchable `SIGKILL` (see §18.1).\n\n### 15.2 Temp Directory Security\n\n- Created with mode `0o700` to prevent unauthorized access.\n- On Unix, `mkdtemp()` is used for atomic creation (prevents TOCTOU race conditions).\n- Contents (`harness-state.json`, `stdin-buffer.txt`) may contain sensitive prompt text and should not be world-readable.\n\n### 15.3 Shell Injection Prevention\n\n- Built-in adapters never use shell mode; they invoke agent CLIs directly.\n- Plugin adapters that require shell mode must keep user-supplied values out of the command and args entirely: with `shell: true`, `child_process.spawn` joins them with spaces and does **not** escape them.\n- Direct string interpolation into shell commands is explicitly prohibited.\n\n### 15.4 Environment Variable Leakage\n\n- Parent process environment is inherited by subprocesses.
Sensitive variables (API keys, tokens) flow to agent subprocesses.\n- agent-mux does not filter the parent environment because agents may legitimately need system variables.\n- Consumers with strict security requirements should use a minimal parent environment.\n\n### 15.5 File Locking Limitations\n\n- Advisory locking is cooperative; external processes can bypass it.\n- Config file corruption is possible if external tools write to agent config files while agent-mux holds a lock.\n\n### 15.6 Run ID Validation\n\n- `RunOptions.runId`, if provided, must match the ULID format (`/^[0-9ABCDEFGHJKMNPQRSTVWXYZ]{26}$/`).\n- This prevents path traversal attacks where a crafted run ID like `../../etc/passwd` could be used in temp directory paths.\n\n### 15.7 PTY Output Sanitization\n\n- PTY output may contain VT escape sequences that could be exploited for terminal injection if displayed raw.\n- The `VtStripper` removes all escape sequences before event parsing.\n- The `pty-log.txt` debug file contains raw (unsanitized) PTY output and should be treated as untrusted data.\n\n---\n\n## 16. Node.js Version Requirements\n\n| Requirement | Minimum Version | Rationale |\n|---|---|---|\n| Node.js | 20.9.0 | Stable Web Streams API, `structuredClone()`, improved `AbortSignal` support |\n| npm | 10.0.0 | npm workspaces for the monorepo package structure |\n| TypeScript (development) | 5.3 | `satisfies` operator, const type parameters |\n\nThe `engines` field in `package.json`:\n\n```json\n{\n \"engines\": {\n \"node\": \">=20.9.0\",\n \"npm\": \">=10.0.0\"\n }\n}\n```\n\n---\n\n## 17.
Error Reference\n\nProcess lifecycle errors and their codes:\n\n| Error condition | ErrorCode | Thrown by | Defined in |\n|---|---|---|---|\n| Agent CLI not found | `AGENT_NOT_FOUND` | `mux.run()` | `01-core-types-and-client.md` §3.1; scope §14 (AdapterRegistry) |\n| Agent not installed on platform | `AGENT_NOT_INSTALLED` | `AdapterRegistry.detect()` | `01-core-types-and-client.md` §3.1; scope §14 (AdapterRegistry) |\n| PTY required but node-pty missing | `PTY_NOT_AVAILABLE` | Stream engine, Step 4 | Spec-level addition (this spec + `03-run-handle-and-interaction.md` §7.3) |\n| Subprocess spawn failure | `SPAWN_ERROR` | Stream engine, Step 4 | `01-core-types-and-client.md` §3.1; scope §22 (process lifecycle) |\n| Run timeout exceeded | `TIMEOUT` | Stream engine, timer | `01-core-types-and-client.md` §3.1; scope §22 (process lifecycle) |\n| Config file lock acquisition failure | `CONFIG_LOCK_ERROR` | `ConfigManager` writes | `01-core-types-and-client.md` §3.1; scope §17 (ConfigManager) |\n| Invalid run ID format | `VALIDATION_ERROR` | `mux.run()`, Step 1 | `01-core-types-and-client.md` §3.1; scope §6 (RunOptions) |\n| Unsupported capability for agent | `CAPABILITY_ERROR` | `mux.run()`, Step 1 | `01-core-types-and-client.md` §3.1; scope §11 (capabilities) |\n\n> **Note:** `AGENT_NOT_INSTALLED` is defined in the canonical `ErrorCode` union in `01-core-types-and-client.md` §3.1. `PTY_NOT_AVAILABLE` is a **spec-level addition** not present in scope's `ErrorCode` list; it is referenced in `03-run-handle-and-interaction.md` §7.3 and defined here.\n\n---\n\n## 18. Behavioral Contracts\n\n### 18.1 Graceful Shutdown Guarantee\n\nWhen `mux.run()` returns a `RunHandle`, the following guarantee holds:\n\n> **If the Node.js process exits normally (via `process.exit()`, end of event loop, or SIGTERM/SIGINT), all active subprocesses will be terminated before the Node.js process exits.**\n\nThis guarantee does **not** hold for `SIGKILL` on Unix (uncatchable). 
On Windows, the Job Object provides this guarantee even for abrupt exits.\n\n### 18.2 Event Ordering Guarantee\n\nEvents from a single subprocess are delivered in the order they were parsed from stdout/PTY output. No reordering occurs in the line parser, event buffer, or fan-out system.\n\n### 18.3 Cleanup Ordering\n\nRun cleanup follows this sequence:\n\n1. Subprocess is confirmed terminated (exit event received or force-killed).\n2. `ProcessTracker.unregister(pid)` removes the process from tracking.\n3. Final events (`session_end`, terminal state event) are emitted.\n4. `RunResult` promise is resolved.\n5. Async iterators complete (`{ done: true }`).\n6. Temp directory is removed (best-effort).\n7. `run-index.jsonl` entry is appended (under file lock).\n\nSteps 3–5 are synchronous (within the same microtask). Step 6 is async and may fail. Step 7 is async with retry on lock contention.\n\n---\n\n## 19. Complete Type Reference\n\n```typescript\n// ── ProcessTracker ──────────────────────────────────────────────────────\n\ninterface ProcessTracker {\n register(pid: number, groupId: number, runId: string, gracePeriodMs?: number): void;\n unregister(pid: number): void;\n killAll(): void;\n readonly activeCount: number;\n}\n\n// ── PlatformAdapter (complete, extends base from spec 03 §8.3) ──────────\n\ninterface PlatformAdapter {\n // Base methods (from 03-run-handle-and-interaction.md §8.3):\n sendInterrupt(pid: number): void;\n sendTerminate(pid: number): void;\n sendKill(pid: number): void;\n suspendProcess(pid: number): void;\n resumeProcess(pid: number): void;\n createProcessGroup(pid: number): ProcessGroupHandle;\n killProcessGroup(handle: ProcessGroupHandle): void;\n tempDir(runId: string): string;\n shellCommand(): [cmd: string, args: string[]];\n // Extended by this spec (§8.1):\n normalizePath(p: string): string;\n normalizeLineEnding(line: string): string;\n}\n\n// ── ProcessGroupHandle ──────────────────────────────────────────────────\n\ntype 
ProcessGroupHandle = UnixProcessGroup | WindowsJobObject;\n\ninterface UnixProcessGroup {\n readonly kind: 'unix';\n readonly pgid: number;\n}\n\ninterface WindowsJobObject {\n readonly kind: 'windows';\n readonly jobHandle: unknown;\n close(): void;\n}\n\n// ── VtStripper ──────────────────────────────────────────────────────────\n\ninterface VtStripper {\n strip(chunk: string): string;\n reset(): void;\n}\n\n// ── LineParser ──────────────────────────────────────────────────────────\n\ninterface LineParser {\n feed(chunk: string, handler: (line: string) => void): void;\n flush(handler: (line: string) => void): void;\n}\n\n// ── FileLock ────────────────────────────────────────────────────────────\n\ninterface FileLock {\n release(): Promise<void>;\n}\n\n// ── Utility ─────────────────────────────────────────────────────────────\n\nfunction cleanupOrphanedTempDirs(): Promise<number>;\nfunction acquireFileLock(filePath: string, timeoutMs?: number): Promise<FileLock>;\n```\n\n---\n\n## 20. 
Spec-Level Additions\n\nThe following items are **spec-level additions** — details that are implied by but not explicitly stated in the scope document:\n\n| Addition | Section | Rationale |\n|---|---|---|\n| `PlatformAdapter.normalizePath()` | §8.1 | Extends base interface from spec 03; required for path normalization (§7.3) |\n| `PlatformAdapter.normalizeLineEnding()` | §8.1 | Extends base interface from spec 03; required for CRLF handling (§11.2) |\n| `VtStripper` interface | §6.3 | Required for PTY output parsing correctness |\n| `LineParser` interface | §11.1 | Required for the subprocess-to-event pipeline |\n| `cleanupOrphanedTempDirs()` utility | §10.4 | Addresses temp directory accumulation on Windows |\n| `RunOptions.eventBufferSize` | §14.2 | Per-run backpressure configuration (also referenced in spec 03 §10.1) |\n| `AgentMuxClientOptions.eventBufferSize` | §14.2 | Per-client backpressure configuration |\n| `PTY_NOT_AVAILABLE` error code | §17 | Referenced in spec 03 §7.3 but not in scope's ErrorCode list |\n| hermes-agent WSL2 detection | §5.3 | Platform-specific detection for hermes on Windows |\n| omp partial Windows support warning | §5.2 | Behavioral contract for partial platform support |\n| Run ID ULID validation | §7.4 | Path traversal prevention |\n| Temp directory mode 0o700 | §10.3 | Security hardening for shared systems |\n| `ProcessTracker.register()` `gracePeriodMs` param | §3.1 | Per-run grace period stored at registration for `killAll()` |\n\n**Note:** `AGENT_NOT_INSTALLED` is **not** a spec-level addition — it is part of the canonical `ErrorCode` union defined in `01-core-types-and-client.md` §3.1. `RunOptions.gracePeriodMs` is also not a spec-level addition from this spec — it is defined in `03-run-handle-and-interaction.md` §6.2.\n\n---\n\n## Implementation Status (2026-04-12)\n\n### Actual spawn model\n\nSpawning is implemented in `packages/core/src/spawn-runner.ts` via a single `node:child_process.spawn`. 
The pipeline per attempt:\n\n1. `adapter.buildSpawnArgs(options)` → abstract `SpawnArgs { command, args, env, cwd, stdin?, shell? }`.\n2. `buildInvocationCommand(options.invocation, spawnArgs, agent)` → concrete host `{ command, args, env, cwd, stdin?, shell }` (see `spawn-invocation.ts` and `docs/13-invocation-modes.md`).\n3. `child_process.spawn(cmd, args, { cwd, env, stdio: ['pipe','pipe','pipe'], detached, shell })`, where `detached` is `true` on Unix-like platforms so the child becomes a process-group leader.\n4. Line-buffer stdout/stderr, feed each line to `adapter.parseEvent(line, ctx)`, and emit the resulting `AgentEvent`s.\n5. Honour `retryPolicy`, `timeout` (overall), and `inactivityTimeout`. Retries re-enter step 1.\n\n### Kill strategy\n\n- **Unix**: `process.kill(-pid, sig)` sends the signal to the entire process group (SIGTERM, then SIGKILL after `gracePeriodMs`).\n- **Windows**: Node terminates the root child; for stubborn trees the runner falls back to `taskkill /PID <pid> /T /F`. A full Win32 Job Object implementation (per §3.3) is not yet wired — the current approach is pragmatic but can leak grandchildren in rare cases.\n\n### ProcessTracker\n\n`packages/core/src/process-tracker.ts` provides a registry for in-flight runs and a `killAll()` used by process exit handlers. Each run is registered at spawn time and unregistered on clean exit.\n\n### Invocation modes vs process tracking\n\nWhen `invocation.mode` is `docker`, `ssh`, or `k8s`, the `pid` tracked by `ProcessTracker` belongs to the *transport* process (`docker`, `ssh`, `kubectl`), not the harness.
Signal propagation to the containerised/remote harness is the transport's responsibility — Docker forwards SIGTERM to PID 1 in the container; `kubectl exec` / `kubectl run` forwards to the pod's process group.\n\n#### SSH signal propagation\n\nThe `ssh` invocation builder in `packages/core/src/spawn-invocation.ts` now:\n\n- Passes `-t` to allocate a pseudo-tty, so TERM/INT received by the local ssh client are delivered to the remote side.\n- Wraps the remote command in a POSIX-sh PID-forwarding trap:\n\n  ```sh\n  exec /bin/sh -c '<cd && env && cmd> & pid=$!; trap \"kill -TERM $pid\" TERM INT; wait $pid'\n  ```\n\n  The outer `exec` replaces the remote login shell so that it does not remain as an extra hop. The inner sh backgrounds the real command, installs a trap that forwards TERM/INT to the child's PID, then `wait`s. When the local spawn-runner sends SIGTERM (then SIGKILL after the grace window) to the ssh client, the signal is propagated to the remote harness process for a clean shutdown.\n\nThe wrapper appears exactly once per invocation and is covered by unit tests in `packages/core/tests/build-invocation-command.test.ts`.\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": []
}