Agentic AI Atlas


Page JSON

page:docs-harness-features-backlog-roadmap


Harness Features Roadmap json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/harness-features-backlog/roadmap.md
Cluster · wiki

Record JSON

{
  "id": "page:docs-harness-features-backlog-roadmap",
  "_kind": "Page",
  "_file": "wiki/docs/harness-features-backlog/roadmap.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/harness-features-backlog/roadmap.md",
    "sourceKind": "repo-docs",
    "title": "Harness Features Roadmap",
    "displayName": "Harness Features Roadmap",
    "slug": "docs/harness-features-backlog/roadmap",
    "articlePath": "wiki/docs/harness-features-backlog/roadmap.md",
    "article": "\n# Harness Features Roadmap\n\n147 gaps organized into 7 milestones. Each milestone has a goal, unlocks specific\ncapabilities, and respects dependency ordering. Gaps within a milestone can be\nworked in parallel unless noted.\n\n---\n\n## M0: Quick Wins and Foundations\n**Goal**: Ship small, no-prerequisite improvements that immediately improve tool\nparity and process validation. No architectural changes -- just better defaults.\n\n**Unlocks**: Tool feature parity for existing agentic tools, process parameter\nvalidation.\n\n| Gap | Title | Effort | Priority |\n|-----|-------|--------|----------|\n| GAP-TOOLS-035 | Grep Output Modes and Context Params | S | Medium |\n| GAP-TOOLS-033 | Runtime Configuration Tool | S | Low |\n| GAP-TOOLS-038 | Ask Tool Interaction Model Alignment | S | Low |\n| GAP-TOOLS-007 | JS/TS REPL Tool | S | Low |\n| GAP-PROC-004 | Process Parameter Schemas and Validation | S | Medium |\n\n**Estimated scope**: 5 gaps, all S effort. ~1 week.\n\n---\n\n## M1: Core Infrastructure\n**Goal**: Build the foundational systems that almost everything else depends on.\nPrompt strata, governance, session model, JSON API, capability routing, and\nstreaming capture. 
These are the load-bearing walls.\n\n**Unlocks**: Structured prompt composition, policy-based governance, programmatic\nrun management, session-run relationships, harness capability awareness, live\noutput from dispatched tasks.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-PROMPT-001 | Prompt Strata Model | L | Critical | -- |\n| GAP-SEC-001 | Governance Policy Layer | L | Critical | -- |\n| GAP-SESSION-001 | Session-to-Run One-to-Many | L | Critical | -- |\n| GAP-HADAPT-001 | Capability-Based Task Routing | L | Critical | -- |\n| GAP-SUBOBS-001 | Streaming Output Capture | L | Critical | -- |\n| GAP-JSON-001 | JSON API for Run Creation | L | Critical | -- |\n| GAP-JSON-002 | JSON Effect Dispatch Protocol | L | Critical | GAP-JSON-001 |\n| GAP-STATE-008 | Run Health Model | M | High | -- |\n| GAP-REMOTE-007 | Host Contract Layer | L | High | -- |\n| GAP-PAR-009 | Parallel Effect Execution Strategies | M | High | -- |\n| GAP-ROUTE-003 | Effect Result Caching and Dedup | M | Medium | -- |\n\n**Estimated scope**: 11 gaps (7 Critical, 3 High, 1 Medium). ~6-8 weeks.\n\n---\n\n## M2: Observability and Control\n**Goal**: See what's happening during orchestration and control it. Health\nmonitoring, cost tracking, effect cancellation, progress tracking, structured\nstatus views, and the embedded SDK dashboard foundation.\n\n**Unlocks**: Operators can monitor run health in real-time, track costs per\neffect, cancel runaway tasks, see structured status, and get progress updates\nfrom subagents. 
Breakpoint approval chains work.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-SUBOBS-002 | Subagent Progress Tracking | M | High | M1: SUBOBS-001 |\n| GAP-SUBOBS-003 | Per-Subagent Token and Cost Tracking | M | High | M1: SUBOBS-001 |\n| GAP-TOOLS-030 | Effect Cancellation | M | High | -- |\n| GAP-TOOLS-036 | Bash Background Execution | S | Medium | GAP-TOOLS-030 |\n| GAP-OBS-001 | Run Health Snapshot | M | High | M1: STATE-008 |\n| GAP-OBS-004 | Policy Decision Trail | M | High | M1: SEC-001 |\n| GAP-OBS-NEW-001 | Dashboard Webhook and Alert System | M | High | M1: STATE-008 |\n| GAP-UX-005 | Structured Orchestration Status View | M | High | M1: STATE-008 |\n| GAP-UX-006 | Pending Work Inspector | M | High | -- |\n| GAP-USER-006 | Real-Time Cost Tracking | M | High | GAP-SUBOBS-003, GAP-SESSION-004 |\n| GAP-SESSION-002 | Session State Persistence and History | M | High | M1: SESSION-001 |\n| GAP-SESSION-004 | Session-Level Cost and Budgets | M | High | M1: SESSION-001, GAP-SUBOBS-003 |\n| GAP-JSON-003 | JSON Breakpoint Interaction API | M | High | M1: JSON-001 |\n| GAP-JSON-004 | JSON Session Management API | M | High | M1: JSON-001 |\n| GAP-BRK-001 | Breakpoint Approval Chains | M | High | M1: SEC-001 |\n| GAP-SEC-003 | Permission Request and Denial Hooks | L | High | M1: SEC-001 |\n| GAP-SEC-005 | Approval Posture Model | M | High | M1: SEC-001, GAP-SEC-003 |\n| GAP-PROMPT-002 | Deterministic Capability Projection | M | High | M1: PROMPT-001 |\n| GAP-PROMPT-005 | Continuity Overlays for Resume | M | High | M1: PROMPT-001, M1: STATE-008 |\n| GAP-TOOLS-014 | Programmatic Task CRUD Beyond CLI | M | High | M1: JSON-001 |\n| GAP-TOOLS-018 | Structured Planning Phase | M | High | M0: PROC-004 |\n| GAP-PROMPT-008 | Coding Philosophy Prompt Section | S | High | M1: PROMPT-001 |\n| GAP-PROMPT-009 | Tool Preference and Usage Rules | S | High | M1: PROMPT-001 |\n| GAP-PROMPT-010 | Safety and Reversibility 
Prompt Framework | S | High | M1: PROMPT-001 |\n| GAP-PROMPT-011 | Output Efficiency Rules | S | Medium | M1: PROMPT-001 |\n| GAP-PROMPT-012 | Git Safety Protocol Prompt Section | S | Medium | M1: PROMPT-001 |\n\n**Estimated scope**: 26 gaps (mostly M effort). ~10-12 weeks.\n\n---\n\n## M3: Multi-Harness Orchestration\n**Goal**: Route effects to the right harness, run tasks in parallel across\nharnesses, isolate work in worktrees, compose processes, and support delegation\npolicies. This is where babysitter becomes a true multi-harness orchestrator.\n\n**Unlocks**: Tasks automatically routed to the best harness for the job. Parallel\nexecution across multiple harnesses. Git worktree isolation. Process chaining.\nModel selection per task. Fallback chains when a harness is unavailable.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-AGENT-001 | Sub-Harness Invocation with Isolation | XL | High | M1: HADAPT-001, M1: SUBOBS-001 |\n| GAP-PAR-001 | Concurrent Effect Execution | L | High | M1: PAR-009 |\n| GAP-PAR-002 | Async Effect Execution | L | High | GAP-PAR-001, M2: SUBOBS-002 |\n| GAP-PAR-003 | Multi-Harness Parallel Dispatch | XL | High | GAP-PAR-001, GAP-PAR-002, M1: HADAPT-001 |\n| GAP-TOOLS-017 | Git Worktree Isolation | L | High | GAP-PAR-001, GAP-AGENT-001 |\n| GAP-HADAPT-002 | Model Selection Per Task | M | High | M1: HADAPT-001 |\n| GAP-HADAPT-004 | Harness Fallback Chains | M | High | M1: HADAPT-001 |\n| GAP-AGENT-005 | Cross-Run Communication | L | High | GAP-AGENT-001 |\n| GAP-AGENT-006 | Cross-Run State Sharing | L | High | M1: SESSION-001 |\n| GAP-AGENT-008 | Harness Selection Policies | M | High | M1: HADAPT-001 |\n| GAP-ROUTE-001 | Smart Effect Routing Engine | XL | High | M1: HADAPT-001 |\n| GAP-PROC-001 | Process Chaining and Pipelines | M | High | M0: PROC-004 |\n| GAP-PROC-002 | Process Nesting and Sub-Process | L | High | GAP-AGENT-001 |\n| GAP-TOOLS-023 | Multi-Step Workflow 
Composition | L | High | GAP-PROC-001, GAP-PROC-002 |\n| GAP-PERF-001 | Prompt Caching (Ephemeral) | L | Critical | M1: PROMPT-001 |\n| GAP-PERF-002 | Session Compaction | XL | Critical | M1: PROMPT-001 |\n| GAP-PERF-005 | Cache-Aware Prompt Assembly | L | High | M1: PROMPT-001, GAP-PERF-001 |\n| GAP-PERF-008 | Structured Continuity State | L | High | M2: PROMPT-005 |\n| GAP-STATE-003 | Session State Persistence | L | High | M1: SESSION-001 |\n| GAP-STATE-001 | Long-Term Memory Extraction | L | High | M2: SESSION-002 |\n| GAP-USER-001 | Operator Command Layer | L | High | M2: UX-005 |\n| GAP-USER-012 | Plan Mode with Verification | M | High | M2: TOOLS-018 |\n| GAP-TOOLS-008 | Web Search Agentic Tool | M | Medium | M1: HADAPT-001 |\n\n**Estimated scope**: 23 gaps (2 Critical, 20 High, 1 Medium; includes 4 XL). ~12-16 weeks.\n\n---\n\n## M4: MCP and External Integration\n**Goal**: Connect babysitter to the outside world. MCP tool discovery and\ninvocation, channel messaging (Slack/Gmail/Calendar), remote sessions, event\ntriggers, and streaming protocols. Babysitter becomes a platform, not just a\nlocal orchestrator.\n\n**Unlocks**: MCP server tools callable from processes. Slack/Gmail/Calendar\nintegration via MCP channels. Breakpoint approval from Slack. Remote WebSocket\nsessions. Webhook-triggered runs. 
Daemon mode for always-on orchestration.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-TOOLS-025 | MCP Tool Discovery and Invocation | M | High | M1: HADAPT-001, GAP-REMOTE-006 |\n| GAP-REMOTE-006 | MCP Client Integration | L | Medium | M1: SEC-001 |\n| GAP-MCPC-001 | MCP Channel Inbound Messaging | L | High | GAP-TOOLS-025 |\n| GAP-MCPC-002 | MCP Channel Outbound Messaging | M | High | GAP-MCPC-001 |\n| GAP-MCPC-003 | Channel Permission Relay | L | High | GAP-MCPC-001, GAP-BRK-002 |\n| GAP-MCPC-004 | MCP Server Management UI | M | Medium | GAP-TOOLS-025 |\n| GAP-TOOLS-031 | MCP Resource Browsing and Reading | M | Medium | GAP-TOOLS-025 |\n| GAP-TOOLS-032 | MCP Authentication (OAuth) | L | Medium | GAP-TOOLS-025 |\n| GAP-TOOLS-034 | Dynamic Tool Discovery and Search | M | Medium | GAP-TOOLS-025 |\n| GAP-JSON-005 | JSON Event Stream (SSE/WebSocket) | L | High | M1: JSON-001, GAP-REMOTE-008 |\n| GAP-REMOTE-001 | Daemon Mode | XL | High | -- |\n| GAP-REMOTE-003 | Remote Sessions (WebSocket) | XL | High | M1: REMOTE-007, GAP-JSON-005 |\n| GAP-REMOTE-008 | Streaming Orchestration Protocol | L | Medium | M1: REMOTE-007 |\n| GAP-REMOTE-009 | Host-Mediated Interaction | L | Medium | M1: REMOTE-007 |\n| GAP-REMOTE-004 | Cron Triggers and Scheduling | L | Medium | -- |\n| GAP-TOOLS-020 | Scheduled Orchestration Triggers | L | Medium | GAP-REMOTE-001, GAP-REMOTE-004 |\n| GAP-TOOLS-021 | External Event Triggers | L | Medium | GAP-REMOTE-001, GAP-TOOLS-020 |\n| GAP-BRK-002 | Breakpoint Delegation to External Systems | L | High | M2: JSON-003, M2: OBS-NEW-001 |\n| GAP-SEC-002 | Trust Classes for Plugins | L | High | M1: SEC-001 |\n| GAP-SEC-006 | OAuth Integration | L | Medium | M1: SEC-001 |\n| GAP-TOOLS-028 | Sleep/Delay Effect Enhancement | S | Low | GAP-TOOLS-020 |\n\n**Estimated scope**: 21 gaps (includes 2 XL). 
~10-14 weeks.\n\n---\n\n## M5: Rich UI and Experience\n**Goal**: Build the Ink/React rendering foundation and all the UI components\nthat make orchestration a first-class visual experience. Structured diffs,\neffect trees, streaming panels, message rendering, embedded SDK dashboard\nwith drill-down.\n\n**Unlocks**: Rich terminal UI for orchestration. Visual effect trees.\nStructured diff rendering. Streaming output panels. Subagent drill-down\nin the embedded SDK dashboard. Operator mode selection.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-UX-001e | Progress and Status Line | S | High | GAP-UX-001 |\n| GAP-UX-001 | Ink/React Terminal Rendering Foundation | L | High | -- |\n| GAP-UX-001a | Effect Tree Visualization | M | High | GAP-UX-001 |\n| GAP-UX-001b | Structured Diff Rendering | M | Medium | GAP-UX-001 |\n| GAP-UX-001c | Permission and Breakpoint Approval UI | M | High | GAP-UX-001, M2: BRK-001 |\n| GAP-UX-001d | Message Type Rendering | L | Medium | GAP-UX-001 |\n| GAP-UX-001f | Streaming Output Panels | L | High | GAP-UX-001, M1: SUBOBS-001 |\n| GAP-SUBOBS-005 | Dashboard Subagent Drill-Down | L | Medium | M2: SUBOBS-002, M2: SUBOBS-003 |\n| GAP-OBS-002 | Phase Timeline Visualization | M | Medium | M2: OBS-001 |\n| GAP-OBS-003 | Prompt Plan Observability | M | Medium | M1: PROMPT-001 |\n| GAP-OBS-005 | Context Introspection | M | Medium | M2: SESSION-004 |\n| GAP-OBS-008 | Agent Progress Summarization | M | Medium | M2: OBS-001 |\n| GAP-OBS-NEW-002 | Dashboard API for External Dashboards | L | Medium | M1: JSON-001 |\n| GAP-UX-007 | Rich Breakpoint Interaction | M | Medium | M2: SEC-005 |\n| GAP-UX-008 | Resume Dashboard | M | Medium | M2: PROMPT-005 |\n| GAP-UX-009 | Failure Triage View | M | Medium | M2: OBS-001 |\n| GAP-UX-010 | Typed Effect Interaction Patterns | M | Medium | M2: JSON-003 |\n| GAP-UX-011 | Command Discoverability | M | Medium | -- |\n| GAP-UX-014 | Operator Mode Selection 
| M | Medium | M1: PROMPT-001 |\n| GAP-PERF-004 | Streaming Message Rendering | L | High | M1: SUBOBS-001 |\n| GAP-PERF-006 | Incremental Orchestration Streaming | L | Medium | M4: JSON-005 |\n| GAP-TOOLS-029 | Structured Output Tool | M | Medium | GAP-UX-001b, GAP-UX-001d |\n| GAP-TOOLS-037 | Fetch Content Processing | M | Low | -- |\n\n**Estimated scope**: 23 gaps. ~10-14 weeks.\n\n---\n\n## M6: Platform and Ecosystem\n**Goal**: CC plugin compatibility, marketplace protocol, auto-update, trust\nmodel, process versioning, memory systems, and remaining polish. Babysitter\nbecomes a full platform with an ecosystem.\n\n**Unlocks**: CC plugins run on babysitter. Marketplace browsing and install.\nPlugin trust and blocklist. Process versioning and migration. Long-term memory\nconsolidation. Session sharing. Run forking. Full audit export.\n\n| Gap | Title | Effort | Priority | Depends On |\n|-----|-------|--------|----------|------------|\n| GAP-ECO-001 | CC Plugin Compatibility Layer | XL | Critical | GAP-ECO-002, GAP-ECO-003 |\n| GAP-ECO-002 | CC Marketplace Protocol Support | L | High | -- |\n| GAP-ECO-003 | Plugin Trust and Blocklist | M | High | M1: SEC-001 |\n| GAP-ECO-004 | Plugin Auto-Update and Versioning | M | Medium | GAP-ECO-002 |\n| GAP-ECO-005 | Plugin Validation and Diagnostics | S | Medium | GAP-ECO-001 |\n| GAP-AGENT-003 | Process Orchestration with Effect Routing | XL | High | M3: AGENT-001, M3: ROUTE-001 |\n| GAP-AGENT-004 | Built-in Process Templates | L | Medium | M3: HADAPT-002 |\n| GAP-AGENT-007 | Delegation Policy Layer | L | Medium | M1: SEC-001, M1: HADAPT-001 |\n| GAP-HADAPT-003 | Cost-Based Routing Policies | L | High | M1: HADAPT-001, M2: SESSION-004 |\n| GAP-HADAPT-005 | Harness Health and Circuit Breaker | M | Medium | M1: HADAPT-001 |\n| GAP-SUBOBS-004 | Subagent Health and Timeout Monitoring | M | Medium | M1: SUBOBS-001, M2: SUBOBS-003 |\n| GAP-PROC-003 | Process Versioning and Migration | L | Medium | -- |\n| GAP-STATE-002 | Memory 
Consolidation | L | Medium | M3: STATE-001 |\n| GAP-STATE-006 | Session Rewind and History | L | Medium | M3: STATE-003 |\n| GAP-ROUTE-002 | Effect Priority and Scheduling | M | Medium | M1: PAR-009 |\n| GAP-PAR-005 | Parallel File Operations | M | Medium | M3: PAR-001 |\n| GAP-PAR-006 | Streaming Parallelism | M | Medium | M3: PAR-001 |\n| GAP-PAR-010 | Fork-Join Process Pattern | L | Medium | M3: PAR-003, M3: PROC-002 |\n| GAP-PERF-007 | Aggressive Parallelism | L | Medium | M3: PAR-001 |\n| GAP-RUN-001 | Run Comparison and Diffing | M | Medium | -- |\n| GAP-RUN-002 | Run Archival and Restore | M | Low | -- |\n| GAP-RUN-003 | Run Forking and Branching | L | Medium | GAP-STATE-006 |\n| GAP-SESSION-003 | Session Templates and Presets | M | Medium | M1: SESSION-001 |\n| GAP-SESSION-005 | Session Sharing and Collaboration | L | Low | M2: SESSION-002, M4: REMOTE-003 |\n| GAP-OBS-006 | Analytics and Feature Flags | L | Medium | GAP-ECO-004 |\n| GAP-OBS-007 | Audit Export | M | Medium | M2: OBS-004 |\n| GAP-USER-017 | Plugin Management Integration | M | High | GAP-ECO-001 |\n| GAP-SEC-004 | Sandbox Toggle | M | Medium | M1: SEC-001 |\n| GAP-SEC-007 | Privacy Settings | M | Medium | M1: SEC-001 |\n| GAP-PROMPT-003 | Runtime Personality Overlays | M | Medium | M1: PROMPT-001 |\n| GAP-PROMPT-004 | Prompt Inspection Tooling | M | Medium | M1: PROMPT-001 |\n| GAP-PROMPT-006 | Instructions Loaded Hook | M | Medium | M1: PROMPT-001 |\n| GAP-PROMPT-007 | Context Compression Families | L | Medium | M1: PROMPT-001 |\n| GAP-TOOLS-012 | LSP Integration | L | High | M3: ROUTE-001, M0: PROC-004 |\n| GAP-TOOLS-026 | Structured User Interaction from Effects | M | Medium | M2: JSON-003 |\n| GAP-TOOLS-027 | Skill Discovery from Process Definitions | M | Medium | M1: HADAPT-001 |\n| GAP-PROF-001 | Auto-Configure from User Profile | M | Medium | GAP-ECO-004 |\n| GAP-BRK-003 | Breakpoint Analytics and SLA Tracking | S | Low | M2: OBS-004 |\n\n**Estimated scope**: 38 gaps (includes 2 XL). 
~16-20 weeks.\n\n---\n\n## Milestone Summary\n\n| Milestone | Gaps | Goal | Cumulative |\n|-----------|------|------|------------|\n| **M0** Quick Wins | 5 | Tool parity polish + process validation | 5 |\n| **M1** Core Infrastructure | 11 | Foundational systems everything depends on | 16 |\n| **M2** Observability & Control | 26 | See what's happening, control it | 42 |\n| **M3** Multi-Harness Orchestration | 23 | Route, parallelize, compose across harnesses | 65 |\n| **M4** MCP & External Integration | 21 | Connect to outside world: MCP, channels, remote | 86 |\n| **M5** Rich UI & Experience | 23 | Visual orchestration experience | 109 |\n| **M6** Platform & Ecosystem | 38 | Full platform with plugin ecosystem | 147 |\n\n## Dependency Graph (Milestones)\n\n```\nM0 (Quick Wins) ──────────────────────────────────────────┐\n  │                                                        │\n  v                                                        │\nM1 (Core Infrastructure) ─────────────────────────────┐    │\n  │                                                    │    │\n  ├──> M2 (Observability & Control) ──────────┐        │    │\n  │                                            │        │    │\n  ├──> M3 (Multi-Harness Orchestration) <──────┤        │    │\n  │         │                                  │        │    │\n  │         ├──> M4 (MCP & External) <─────────┘        │    │\n  │         │                                           │    │\n  │         └──> M5 (Rich UI) <─────────────────────────┘    │\n  │                   │                                      │\n  └──> M6 (Platform & Ecosystem) <───────────────────────────┘\n```\n\nM3 and M4 can partially overlap (MCP client work can start while multi-harness\nis in progress). 
M5 can start its foundation (GAP-UX-001) any time after M1.\nM6 is the long tail -- work items can be pulled forward if priorities shift.\n\n## Critical Path\n\nThe fastest path to production-grade multi-harness orchestration:\n\n```\nM0 → M1 (PROMPT-001 + HADAPT-001 + SESSION-001 + JSON-001/002)\n   → M2 (SUBOBS-002/003 + TOOLS-030 + OBS-001)\n   → M3 (AGENT-001 + PAR-001/002/003 + PERF-001/002)\n```\n\nEverything else enhances this core. The critical blockers are:\n1. **GAP-PROMPT-001** (Prompt Strata) -- 19 gaps depend on it\n2. **GAP-HADAPT-001** (Capability Routing) -- 15 gaps depend on it\n3. **GAP-SESSION-001** (Session Model) -- 8 gaps depend on it\n4. **GAP-SEC-001** (Governance) -- 12 gaps depend on it\n5. **GAP-SUBOBS-001** (Streaming Capture) -- 7 gaps depend on it\n6. **GAP-JSON-001** (JSON API) -- 7 gaps depend on it\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-harness-features-backlog",
      "to": "page:docs-harness-features-backlog-roadmap",
      "kind": "contains_page"
    }
  ]
}
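
The per-milestone gap counts and the "Depends On" references in the record's `attributes.article` field can be checked mechanically. The following is a minimal sketch, not part of the atlas tooling: it parses the milestone tables out of the markdown, tallies gaps per milestone, counts direct dependents, and reports references whose milestone prefix disagrees with the table the gap is actually listed in. The helper names (`parse_gaps`, `check_refs`) and the inline `ARTICLE` excerpt are hypothetical. It assumes that an unprefixed `GAP-...` reference points at the same milestone and that an `Mn:` prefix names the owning milestone, which is the convention the tables appear to follow.

```python
import re
from collections import Counter

# Hypothetical excerpt standing in for record["attributes"]["article"].
ARTICLE = """
## M0: Quick Wins and Foundations

| Gap | Title | Effort | Priority |
|-----|-------|--------|----------|
| GAP-PROC-004 | Process Parameter Schemas and Validation | S | Medium |

## M1: Core Infrastructure

| Gap | Title | Effort | Priority | Depends On |
|-----|-------|--------|----------|------------|
| GAP-JSON-001 | JSON API for Run Creation | L | Critical | -- |
| GAP-JSON-002 | JSON Effect Dispatch Protocol | L | Critical | GAP-JSON-001 |

## M2: Observability and Control

| Gap | Title | Effort | Priority | Depends On |
|-----|-------|--------|----------|------------|
| GAP-JSON-003 | JSON Breakpoint Interaction API | M | High | M1: JSON-001 |
| GAP-TOOLS-018 | Structured Planning Phase | M | High | M0: PROC-004 |
"""

HEADING = re.compile(r"^## (M\d+):")
# Matches "GAP-JSON-001" (same milestone) or "M1: JSON-001" (cross-milestone).
DEP = re.compile(r"^(?:GAP-|(M\d+): )(.+)$")


def parse_gaps(article):
    """Return {gap_id: {"milestone", "priority", "deps"}} from the milestone tables."""
    gaps, milestone = {}, None
    for line in article.splitlines():
        heading = HEADING.match(line)
        if heading:
            milestone = heading.group(1)
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if milestone is None or not cells[0].startswith("GAP-"):
            continue  # skips header rows, separators, and prose
        deps = []
        if len(cells) > 4 and cells[4] != "--":
            for ref in cells[4].split(","):
                m = DEP.match(ref.strip())
                if m:
                    # (claimed milestone or None, normalized gap id)
                    deps.append((m.group(1), "GAP-" + m.group(2)))
        gaps[cells[0]] = {"milestone": milestone, "priority": cells[3], "deps": deps}
    return gaps


def check_refs(gaps):
    """List references that are unknown or carry the wrong milestone prefix."""
    problems = []
    for gap_id, info in gaps.items():
        for claimed, dep_id in info["deps"]:
            if dep_id not in gaps:
                problems.append(f"{gap_id}: unknown dependency {dep_id}")
                continue
            actual = gaps[dep_id]["milestone"]
            expected = claimed or info["milestone"]
            if actual != expected:
                problems.append(f"{gap_id}: {dep_id} is in {actual}, not {expected}")
    return problems


gaps = parse_gaps(ARTICLE)
per_milestone = Counter(info["milestone"] for info in gaps.values())
dependents = Counter(dep for info in gaps.values() for _, dep in info["deps"])

print(dict(per_milestone))        # {'M0': 1, 'M1': 2, 'M2': 2}
print(dependents.most_common(1))  # [('GAP-JSON-001', 2)]
print(check_refs(gaps))           # []
```

To run this against the real record, load the JSON with `json.load` and pass `record["attributes"]["article"]` to `parse_gaps` in place of the excerpt. The dependents tally only counts direct references in the tables, so it may be lower than totals that include transitive dependencies.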

~16-20 weeks.\n\n---\n\n## Milestone Summary\n\n| Milestone | Gaps | Goal | Cumulative |\n|-----------|------|------|------------|\n| **M0** Quick Wins | 5 | Tool parity polish + process validation | 5 |\n| **M1** Core Infrastructure | 11 | Foundational systems everything depends on | 16 |\n| **M2** Observability & Control | 26 | See what's happening, control it | 42 |\n| **M3** Multi-Harness Orchestration | 23 | Route, parallelize, compose across harnesses | 65 |\n| **M4** MCP & External Integration | 21 | Connect to outside world: MCP, channels, remote | 86 |\n| **M5** Rich UI & Experience | 23 | Visual orchestration experience | 109 |\n| **M6** Platform & Ecosystem | 38 | Full platform with plugin ecosystem | 147 |\n\n## Dependency Graph (Milestones)\n\n```\nM0 (Quick Wins) ──────────────────────────────────────────┐\n  │                                                        │\n  v                                                        │\nM1 (Core Infrastructure) ─────────────────────────────┐    │\n  │                                                    │    │\n  ├──> M2 (Observability & Control) ──────────┐        │    │\n  │                                            │        │    │\n  ├──> M3 (Multi-Harness Orchestration) <──────┤        │    │\n  │         │                                  │        │    │\n  │         ├──> M4 (MCP & External) <─────────┘        │    │\n  │         │                                           │    │\n  │         └──> M5 (Rich UI) <─────────────────────────┘    │\n  │                   │                                      │\n  └──> M6 (Platform & Ecosystem) <───────────────────────────┘\n```\n\nM3 and M4 can partially overlap (MCP client work can start while multi-harness\nis in progress). 
M5 can start its foundation (GAP-UX-001) any time after M1.\nM6 is the long tail -- work items can be pulled forward if priorities shift.\n\n## Critical Path\n\nThe fastest path to production-grade multi-harness orchestration:\n\n```\nM0 → M1 (PROMPT-001 + HADAPT-001 + SESSION-001 + JSON-001/002)\n   → M2 (SUBOBS-002/003 + TOOLS-030 + OBS-001)\n   → M3 (AGENT-001 + PAR-001/002/003 + PERF-001/002)\n```\n\nEverything else enhances this core. The critical blockers are:\n1. **GAP-PROMPT-001** (Prompt Strata) -- 19 gaps depend on it\n2. **GAP-HADAPT-001** (Capability Routing) -- 15 gaps depend on it\n3. **GAP-SESSION-001** (Session Model) -- 8 gaps depend on it\n4. **GAP-SEC-001** (Governance) -- 12 gaps depend on it\n5. **GAP-SUBOBS-001** (Streaming Capture) -- 7 gaps depend on it\n6. **GAP-JSON-001** (JSON API) -- 7 gaps depend on it\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-harness-features-backlog",
      "to": "page:docs-harness-features-backlog-roadmap",
      "kind": "contains_page"
    }
  ]
}
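
The roadmap in the `article` field carries two things that can be checked mechanically: the Cumulative column of the Milestone Summary, and the cross-milestone references in the "Depends On" cells (written as `M2: SESSION-002`, in contrast to same-milestone references such as `GAP-UX-001`). The following is a minimal sketch, not part of the atlas tooling. The function names, the regular expression, and the choice to re-add the `GAP-` prefix to shortened IDs are assumptions made for illustration; the counts and sample cells are copied from the tables in the record.

```python
import re
from itertools import accumulate

# Gap counts per milestone, copied from the Milestone Summary table.
GAP_COUNTS = {"M0": 5, "M1": 11, "M2": 26, "M3": 23, "M4": 21, "M5": 23, "M6": 38}

# A handful of "Depends On" cells copied from the M5 and M6 tables (not the full set).
SAMPLE_DEPENDS_ON = {
    "GAP-UX-001c": "GAP-UX-001, M2: BRK-001",
    "GAP-PERF-006": "M4: JSON-005",
    "GAP-SESSION-005": "M2: SESSION-002, M4: REMOTE-003",
    "GAP-TOOLS-012": "M3: ROUTE-001, M0: PROC-004",
    "GAP-UX-011": "--",
}

# Assumed shape of a cross-milestone reference: "M<n>: <ID>", where the ID is
# the gap identifier without its "GAP-" prefix (for example "OBS-NEW-002").
CROSS_REF = re.compile(r"\b(M\d+): ([A-Z]+(?:-[A-Z]+)*-\d+[a-z]?)")


def cumulative_totals(counts):
    """Return the running gap total after each milestone, in dict order."""
    return dict(zip(counts, accumulate(counts.values())))


def cross_milestone_refs(cell):
    """Extract (milestone, full gap id) pairs from one "Depends On" cell."""
    return [(milestone, f"GAP-{gap}") for milestone, gap in CROSS_REF.findall(cell)]


if __name__ == "__main__":
    print(cumulative_totals(GAP_COUNTS))
    for gap, cell in SAMPLE_DEPENDS_ON.items():
        print(gap, "->", cross_milestone_refs(cell))
```

Running the same extraction over every table row would give the set of earlier milestones each milestone actually relies on, which can then be compared against the arrows in the dependency graph.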