II.
Page JSON
Structured · livepage:docs-user-guide-features-two-loops-architecture
Two-Loops Architecture: Understanding Hybrid Agentic Systems
Inspect the normalized record payload exactly as the atlas UI reads it.
{
"id": "page:docs-user-guide-features-two-loops-architecture",
"_kind": "Page",
"_file": "wiki/docs/user-guide/features/two-loops-architecture.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/user-guide/features/two-loops-architecture.md",
"sourceKind": "repo-docs",
"title": "Two-Loops Architecture: Understanding Hybrid Agentic Systems",
"displayName": "Two-Loops Architecture: Understanding Hybrid Agentic Systems",
"slug": "docs/user-guide/features/two-loops-architecture",
"articlePath": "wiki/docs/user-guide/features/two-loops-architecture.md",
"article": "\n# Two-Loops Architecture: Understanding Hybrid Agentic Systems\n\n**Version:** 1.1\n**Last Updated:** 2026-01-26\n**Category:** Feature Guide\n\n---\n\n## TL;DR - What You Need to Know\n\n**Skip this section if you just want to USE babysitter.** This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.\n\n**The key insight:** Babysitter separates \"what must happen\" (deterministic rules) from \"how to do it\" (AI reasoning). This makes AI workflows reliable and debuggable.\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ LOOP 1: The Boss (Orchestrator) │\n│ - \"You must pass tests before deploying\" │\n│ - \"You have max 10 attempts\" │\n│ - \"Stop and ask for approval at this point\" │\n│ │\n│ LOOP 2: The Worker (AI Agent) │\n│ - \"Figure out how to make these tests pass\" │\n│ - \"Find and fix the bugs\" │\n│ - \"Write the code that solves the problem\" │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n**When to read this document:**\n- You're building custom processes\n- You want to understand guardrails and safety\n- You're debugging why a run behaves a certain way\n- You're an architect evaluating babysitter for your team\n\n**When to skip this document:**\n- You just want to run existing processes\n- You're following a tutorial\n- You're a beginner (start with [Quality Convergence](./quality-convergence.md) instead)\n\n---\n\n## Overview\n\nBabysitter implements a **Two-Loops Control Plane** architecture that combines:\n\n1. **Symbolic Orchestration** (Process Engine): Deterministic, code-defined control\n2. 
**Agentic Harness** (LLM Runtime): Adaptive, AI-powered work execution\n\nThis hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.\n\n### Why Two Loops?\n\n| Single-Loop AI | Two-Loops Hybrid |\n|----------------|------------------|\n| Unpredictable behavior | Bounded, testable autonomy |\n| Hard to debug | Journaled, replayable execution |\n| No safety guarantees | Enforced guardrails and gates |\n| \"It seems done\" | Evidence-driven completion |\n| Context degradation | Fresh context per task |\n\n---\n\n## The Core Building Blocks\n\n### A) Symbolic Orchestrator (Process Engine)\n\nThe orchestrator is the code-defined process that enforces:\n\n| Responsibility | Example |\n|----------------|---------|\n| **Ground truth state** | Run is in \"implementation\" phase |\n| **Progression rules** | Must pass tests before deployment |\n| **Invariants** | Never modify production directly |\n| **Budgets** | Max 10 iterations, 30 min timeout |\n| **Permissions** | Only write to `src/` directory |\n| **Quality gates** | Tests, lint, security must pass |\n| **Journaling** | Every event recorded for replay |\n| **Time travel** | Fork from any point, compare runs |\n\n**The orchestrator owns making execution dependable.**\n\n### B) Agent Harness (LLM Runtime)\n\nThe harness is not \"just an LLM call.\" Modern harnesses include:\n\n| Capability | Description |\n|------------|-------------|\n| Iterative planning | Plan → Execute → Replan |\n| Tool calling | Files, terminal, search, code execution |\n| Command execution | Parse results, handle errors |\n| Incremental fixes | Iterate until checks pass |\n| Structured artifacts | Plans, diffs, summaries |\n| Multi-step reasoning | With constraints |\n| Sub-agents | Delegation inside the harness |\n\n**The harness owns solving fuzzy parts and adapting to feedback.**\n\n### C) Symbolic Logic Surfaces (Shared Capabilities)\n\nSymbolic logic appears in 
**multiple places**, all consistent:\n\n1. **Inside orchestrator** (stage transitions, invariants, gates, budgets)\n2. **As symbolic tools** callable by the harness (policy checks, gate evaluation)\n3. **As symbolic tasks** callable by orchestration (validators, analyzers)\n\n```javascript\n// Symbolic logic as orchestrator rule (using loop for retry)\nfor (let iteration = 0; iteration < maxIterations; iteration++) {\n const impl = await ctx.task(implementTask, { feature });\n const testResults = await ctx.task(runTestsTask, { impl });\n\n if (testResults.passed) break; // Success - exit loop\n // Loop continues with feedback from failed tests\n}\n\n// Symbolic logic as tool callable by harness\nconst allowed = await ctx.task(policyCheckTask, {\n action: 'modifyFile',\n path: '/etc/config.json'\n});\n\n// Symbolic logic as validation task\nconst gateResult = await ctx.task(securityGateTask, {\n files: impl.filesModified\n});\n```\n\n---\n\n## The Two Loops in Detail\n\n### Loop 1: Orchestration Loop (Symbolic)\n\nA process stepper that progresses a run through explicit stages.\n\n**Typical Cycle:**\n\n```\n1. Reconstruct \"what is true\" from the journal\n2. Determine what stage the run is in\n3. Check gates/constraints/budgets\n4. Choose the next allowed transition\n5. Emit the next effect (or wait)\n6. Record results back into the journal\n```\n\n**This loop is about:** control, safety, repeatability, traceability.\n\n### Loop 2: Agentic Loop (Harness)\n\nA tool-using reasoning loop that iterates until reaching a local objective.\n\n**Typical Cycle:**\n\n```\n1. Read current objective + constraints\n2. Decide what evidence is needed\n3. Call tools, inspect results\n4. Update plan or actions\n5. 
Produce an output (patch, plan, answer, report)\n```\n\n**This loop is about:** solving the task when information is incomplete.\n\n---\n\n## What Goes Where?\n\nThe design challenge is deciding **which execution decisions are deterministic/symbolic** and **which are adaptive/agentic**.\n\n### Put in Symbolic Logic When...\n\nThese decisions must be **stable, enforceable, and auditable**:\n\n| Decision Type | Examples |\n|---------------|----------|\n| **Safety/permissions** | What actions are allowed |\n| **Budgets/limits** | Time, cost, tool call limits |\n| **State transitions** | What stage you're in |\n| **Concurrency rules** | What can run in parallel |\n| **Retry/timeout policy** | What happens on failure |\n| **Idempotency** | Avoid double execution |\n| **Quality gates** | What proof is required |\n| **Compliance/audit** | Logging requirements |\n\n### Put in Agent Harness When...\n\nThese decisions benefit from **flexible reasoning**:\n\n| Decision Type | Examples |\n|---------------|----------|\n| **Ambiguous instructions** | \"Make it better\" |\n| **Uncertain approach** | Multiple valid solutions |\n| **Search/discovery** | Find relevant files |\n| **Drafting** | Code, docs, analyses |\n| **Debugging** | Iterate against tool results |\n| **Summarizing** | Compress evidence |\n| **Proposing** | Candidate solutions |\n\n### The Mixed Zone\n\nMany tasks are mixed. 
The pattern is:\n- **Symbolic logic defines the envelope** (constraints + gates + budgets)\n- **Harness explores inside that envelope** (implements, debugs, refines)\n- **Both can invoke symbolic rules** (nothing is guesswork)\n\n```javascript\n// Mixed: Harness works, orchestrator validates (loop-based retry)\nlet securityPassed = false;\nlet lastSecurityResult = null;\nfor (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {\n const impl = await ctx.task(implementTask, {\n feature,\n constraints: {\n allowedPaths: ['src/**'],\n forbiddenPatterns: ['eval(', 'exec('],\n maxFilesModified: 10\n },\n // Pass previous feedback on retry iterations\n feedback: iteration > 0 ? lastSecurityResult.recommendations : null\n });\n\n // Orchestrator enforces gate\n const securityResult = await ctx.task(securityGateTask, { impl });\n securityPassed = securityResult.passed;\n lastSecurityResult = securityResult;\n}\n```\n\n---\n\n## The Four Guardrail Layers\n\nGuardrails are a **layered approach**, not a single feature.\n\n### Layer A: Capability Guardrails (What's Possible)\n\nDefine what tools and actions exist.\n\n```javascript\nconst capabilityConfig = {\n allowedTools: ['read', 'write', 'shell', 'search'],\n pathRestrictions: ['src/**', 'tests/**'],\n networkAccess: 'none',\n permissions: 'read-write',\n destructiveActions: 'require-confirmation'\n};\n```\n\n### Layer B: Budget Guardrails (How Far)\n\nPrevent runaway execution.\n\n```javascript\nconst budgetConfig = {\n maxToolCalls: 100,\n maxWallClockMinutes: 30,\n maxTokenSpend: 50000,\n maxIterations: 10,\n rateLimits: { apiCalls: '10/minute' }\n};\n```\n\n### Layer C: Policy Guardrails (What's Allowed)\n\nRules that define acceptable behavior.\n\n```javascript\nconst policyConfig = {\n rules: [\n 'never exfiltrate secrets',\n 'never modify production directly',\n 'always run tests before merge',\n 'security scans required for dependencies'\n ]\n};\n```\n\n### Layer D: Behavioral Guardrails (How Decisions Are 
Made)\n\nStructural consistency in outputs.\n\n```javascript\nconst behavioralConfig = {\n requireStructuredOutputs: true,\n requireEvidenceCitations: true,\n requireUncertaintyDeclaration: true,\n outputSchemas: { /* JSON schemas */ }\n};\n```\n\n---\n\n## Quality Gates: Turning Agentic Work into Reliable Outcomes\n\nQuality gates convert \"it seems done\" into \"it is done.\"\n\n### The Evidence-Driven Pattern\n\nEach phase must end with:\n\n| Component | Description |\n|-----------|-------------|\n| **Artifact** | The work product (patch, doc, config, report) |\n| **Evidence** | Proof it meets requirements (logs, test output, checks) |\n\n**If you don't have evidence, you don't have completion.**\n\n### Common Gated Steps\n\n| Gate Type | What It Validates |\n|-----------|-------------------|\n| Unit tests | Individual functions work |\n| Integration tests | Components work together |\n| System tests | End-to-end behavior |\n| Acceptance tests | User requirements met |\n| Lint/formatting | Code style compliance |\n| Type checking | Type safety |\n| Static analysis | Potential bugs |\n| Security scans | Vulnerabilities |\n| Reproducibility | Clean run in fresh env |\n| Diff review | No forbidden file changes |\n| Performance | Meets thresholds |\n\n### Where Gates Live (Consistent Everywhere)\n\n```javascript\n// In orchestrator: loop-based retry for gate failures\nlet gateResults = { passed: false };\nfor (let i = 0; i < maxIterations && !gateResults.passed; i++) {\n const impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });\n gateResults = await ctx.task(runGatesTask, { impl });\n}\n\n// As symbolic tool: harness pre-checks during work\nconst gateResult = await checkGate(impl);\nif (!gateResult.passed) {\n // Harness can immediately attempt repair\n await repairIssues(gateResult.failures);\n}\n\n// As symbolic task: verify evidence objectively\nconst evidence = await ctx.task(gateValidatorTask, { impl });\n```\n\n### Human Approval 
Gates\n\nFor high-impact steps, include explicit checkpoints:\n\n```javascript\n// Plan approval before execution\nawait ctx.breakpoint({\n question: 'Review the plan. Approve to proceed with implementation?',\n title: 'Plan Approval',\n context: { /* ... */ }\n});\n\n// Diff approval before merge\nawait ctx.breakpoint({\n question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,\n title: 'Merge Approval'\n});\n\n// Deployment approval\nawait ctx.breakpoint({\n question: 'Quality: 92/100. Deploy to production?',\n title: 'Production Deployment'\n});\n```\n\n---\n\n## The Journal: Making Execution Testable\n\nA journaled control plane turns agentic behavior into something you can:\n\n| Capability | Value |\n|------------|-------|\n| **Replay** | Debug by re-running |\n| **Inspect** | See exactly what happened |\n| **Diff** | Compare across forks |\n| **Audit** | Compliance evidence |\n| **Analyze** | Failure pattern detection |\n\n### What's Journaled\n\n| Event Type | Example |\n|------------|---------|\n| **Inputs/signals** | Initial requirements |\n| **Stage transitions** | \"planning\" → \"implementation\" |\n| **Requested actions** | `writeFile('/src/auth.ts', ...)` |\n| **Results** | Action succeeded, 42 lines written |\n| **Artifacts** | `plan.md`, `implementation.patch` |\n| **Evidence** | Test results, gate outcomes |\n| **Gate outcomes** | Security: PASS, Tests: PASS |\n| **Approvals** | User approved at breakpoint |\n\n---\n\n## Prompt Quality is Determinism Engineering\n\nIn a two-loop system, prompts are **configuration for the harness**.\n\n### Why Prompt Quality Matters\n\nBetter prompts reduce:\n- Output variance\n- Tool misuse\n- Hidden assumptions\n- Inconsistent formatting\n- Unpredictable branching\n\nBetter prompts improve:\n- Repeatability\n- Debuggability\n- Fork comparisons\n- Safe automation\n\n### The Real Goal: Structural Consistency\n\nYou don't need identical wording. 
You need consistent:\n- Decision formats\n- Priorities\n- Stop/ask conditions\n- Evidence standards\n\n### Prompt Versioning\n\nTreat harness prompts like engineering surfaces:\n\n```javascript\nconst promptVersion = '2.1.0';\n\nconst implementerPrompt = {\n version: promptVersion,\n role: 'senior software engineer',\n task: 'Implement feature according to specification',\n constraints: [\n 'Follow existing code patterns',\n 'Write tests for all public functions',\n 'Document complex logic',\n 'Ask for clarification if requirements are ambiguous'\n ],\n outputFormat: {\n type: 'object',\n required: ['filesModified', 'summary', 'confidence']\n }\n};\n```\n\n---\n\n## Common Failure Modes and Fixes\n\n### 1. Everything is Agentic\n\n**Symptom:** Unpredictable behavior, hard to debug, inconsistent safety.\n\n**Fix:** Move gates, budgets, and invariants into symbolic orchestration.\n\n### 2. Everything is Symbolic\n\n**Symptom:** Brittle workflows, poor adaptation, high maintenance.\n\n**Fix:** Delegate fuzzy decisions and exploration to the harness.\n\n### 3. Hidden State\n\n**Symptom:** The harness \"remembers\" things the system never logged.\n\n**Fix:** Journal what matters; the system's truth must be reconstructible.\n\n### 4. Wide Tool Surface\n\n**Symptom:** Tool confusion, increased risk, unpredictable results.\n\n**Fix:** Keep tools small, stable, and well-described.\n\n### 5. No Explicit Evidence Requirements\n\n**Symptom:** \"Done\" claims without proof.\n\n**Fix:** Define completion as artifact + evidence, enforced by gates.\n\n---\n\n## The Doctrine\n\nIf you define only a few principles, make them these:\n\n1. **The orchestrator owns** run progression, journaling, and phase boundaries\n2. **Symbolic logic owns** constraints, permissions, budgets, and gates\n3. **The harness owns** adaptive work inside constraints\n4. **Guardrails are enforced** by symbolic checks, not informal intentions\n5. **Quality is evidence-driven**, not assertion-driven\n6. 
**Prompts are versioned** control surfaces for harness behavior\n7. **The journal is the source** of truth for replay, audit, and forking\n\n---\n\n## Getting Started\n\nIf you're building from scratch:\n\n1. **Define phases** (a small symbolic process)\n2. **Define effects/tools** available in each phase\n3. **Add budgets and permissions**\n4. **Decide quality gates per phase**\n5. **Add a harness** that can do real work\n6. **Journal everything** needed for replay and audit\n7. **Add fork + time travel** as first-class operations\n\n**If you do only one thing:** make completion require evidence.\n\n---\n\n## Process Library Examples\n\n### Spec-Driven Development\n\n`methodologies/spec-driven-development.js`\n\nImplements the full two-loops pattern:\n- **Symbolic:** Constitution validation, plan-constitution alignment, consistency analysis\n- **Agentic:** Specification writing, planning, implementation\n- **Gates:** Every phase has approval breakpoints\n\n### V-Model\n\n`methodologies/v-model.js`\n\nHeavy on symbolic verification:\n- **Four test levels** designed before implementation\n- **Traceability matrix** ensures complete coverage\n- **Safety levels** adjust rigor\n\n### GSD Iterative Convergence\n\n`gsd/iterative-convergence.js`\n\nFeedback-driven quality loop:\n- **Implement → Score → Feedback → Repeat**\n- **Breakpoints** at quality thresholds\n- **Plateau detection** for early exit\n\n---\n\n## Related Documentation\n\n- [Quality Convergence](./quality-convergence.md) - Five quality gate types and 90-score pattern\n- [Best Practices](./best-practices.md) - Workflow design and guardrail patterns\n- [Process Definitions](./process-definitions.md) - Creating your own processes\n- [Journal System](./journal-system.md) - Event sourcing and replay\n- [Breakpoints](./breakpoints.md) - Human-in-the-loop approval\n\n---\n\n## Summary\n\nThe Two-Loops architecture enables bounded, testable autonomy:\n\n- **Orchestration Loop** provides control, safety, and 
traceability\n- **Agentic Loop** provides capability, adaptation, and problem-solving\n- **Quality Gates** turn \"seems done\" into \"is done\" with evidence\n- **Guardrails** enforce rules at capability, budget, policy, and behavioral levels\n- **Journaling** makes everything replayable and auditable\n\nWhen done well, you get **autonomy that is bounded, testable, and steadily improvable**.\n\n---\n\n## SDK API Quick Reference\n\nThe complete list of SDK intrinsics (functions available on `ctx`):\n\n| Function | Purpose | Example |\n|----------|---------|---------|\n| `ctx.task(taskDef, args)` | Execute a task | `await ctx.task(buildTask, { target: 'dist' })` |\n| `ctx.breakpoint(opts)` | Pause for human approval | `await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' })` |\n| `ctx.parallel.all([...])` | Run tasks in parallel | `await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)])` |\n| `ctx.parallel.map(arr, fn)` | Map over array in parallel | `await ctx.parallel.map(files, f => ctx.task(lint, { file: f }))` |\n| `ctx.sleepUntil(iso8601)` | Pause until a specific time | `await ctx.sleepUntil('2026-01-27T10:00:00Z')` |\n| `ctx.log(msg, data?)` | Log message to journal | `ctx.log('Quality score', { score: 85 })` |\n| `ctx.now()` | Get current time (deterministic) | `const ts = ctx.now().getTime()` |\n| `ctx.runId` | Current run identifier | `const id = ctx.runId` |\n\n**Important:** There is NO `ctx.retry()`. Use loops for retry logic:\n\n```javascript\n// Correct: Loop-based retry\nlet passed = false;\nlet feedback = null;\nfor (let i = 0; i < maxIterations && !passed; i++) {\n const result = await ctx.task(implementTask, { feedback });\n passed = result.testsPass;\n feedback = result.errors;\n}\n```\n\n---\n\n## What To Do Next\n\nBased on your role, here's your next step:\n\n| If you are... 
| Do this next |\n|---------------|--------------|\n| **Beginner** | Read [Quality Convergence](./quality-convergence.md) for the core iteration pattern |\n| **Building processes** | Study [Best Practices](./best-practices.md) for workflow design |\n| **Debugging a run** | Check [Journal System](./journal-system.md) to understand event sourcing |\n| **Adding approvals** | See [Breakpoints](./breakpoints.md) for human-in-the-loop patterns |\n| **Evaluating for team** | Review the Four Guardrail Layers section above |\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": [
{
"from": "page:docs-user-guide-features",
"to": "page:docs-user-guide-features-two-loops-architecture",
"kind": "contains_page"
}
]
}
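A minimal sketch of how a consumer might resolve this record's `incomingEdges` (for example, to find the parent page behind the `contains_page` edge). The field names (`id`, `_kind`, `attributes`, `incomingEdges`) mirror the payload above, but the `records` sample is a trimmed, hypothetical stand-in and the helper functions are assumptions, not part of the atlas API.

```javascript
// Trimmed, hypothetical sample of Page records in the shape shown above.
const records = [
  {
    id: "page:docs-user-guide-features-two-loops-architecture",
    _kind: "Page",
    attributes: { slug: "docs/user-guide/features/two-loops-architecture" },
    incomingEdges: [
      { from: "page:docs-user-guide-features", kind: "contains_page" }
    ]
  }
];

// Index records by id so edge endpoints can be resolved.
const byId = new Map(records.map(r => [r.id, r]));

// Resolve the parent page via the first contains_page edge, if any.
function parentOf(record) {
  const edge = (record.incomingEdges || []).find(e => e.kind === "contains_page");
  return edge ? edge.from : null;
}

console.log(parentOf(records[0])); // "page:docs-user-guide-features"
```

Walking edges through an id index like this keeps the record itself flat, which matches how the payload stores only endpoint ids (`from`/`to`) rather than nested records.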