Two-Loops Architecture: Understanding Hybrid Agentic Systems
**Version:** 1.1 **Last Updated:** 2026-01-26 **Category:** Feature Guide
---
TL;DR - What You Need to Know
**Skip this section if you just want to USE babysitter.** This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.
**The key insight:** Babysitter separates "what must happen" (deterministic rules) from "how to do it" (AI reasoning). This makes AI workflows reliable and debuggable.
```
┌────────────────────────────────────────────────┐
│ LOOP 1: The Boss (Orchestrator)                │
│   - "You must pass tests before deploying"     │
│   - "You have max 10 attempts"                 │
│   - "Stop and ask for approval at this point"  │
│                                                │
│ LOOP 2: The Worker (AI Agent)                  │
│   - "Figure out how to make these tests pass"  │
│   - "Find and fix the bugs"                    │
│   - "Write the code that solves the problem"   │
└────────────────────────────────────────────────┘
```

**When to read this document:**
- You're building custom processes
- You want to understand guardrails and safety
- You're debugging why a run behaves a certain way
- You're an architect evaluating babysitter for your team
**When to skip this document:**
- You just want to run existing processes
- You're following a tutorial
- You're a beginner (start with Quality Convergence instead)
---
Overview
Babysitter implements a **Two-Loops Control Plane** architecture that combines:
1. **Symbolic Orchestration** (Process Engine): Deterministic, code-defined control
2. **Agentic Harness** (LLM Runtime): Adaptive, AI-powered work execution
This hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.
Why Two Loops?
| Single-Loop AI | Two-Loops Hybrid |
|---|---|
| Unpredictable behavior | Bounded, testable autonomy |
| Hard to debug | Journaled, replayable execution |
| No safety guarantees | Enforced guardrails and gates |
| "It seems done" | Evidence-driven completion |
| Context degradation | Fresh context per task |
---
The Core Building Blocks
A) Symbolic Orchestrator (Process Engine)
The orchestrator is the code-defined process that enforces:
| Responsibility | Example |
|---|---|
| **Ground truth state** | Run is in "implementation" phase |
| **Progression rules** | Must pass tests before deployment |
| **Invariants** | Never modify production directly |
| **Budgets** | Max 10 iterations, 30 min timeout |
| **Permissions** | Only write to src/ directory |
| **Quality gates** | Tests, lint, security must pass |
| **Journaling** | Every event recorded for replay |
| **Time travel** | Fork from any point, compare runs |
**The orchestrator owns making execution dependable.**
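As a concrete sketch, several of these responsibilities can live together in one phase function. The sketch below uses only the documented ctx intrinsics; implementTask, runGatesTask, and the phase signature are illustrative placeholders, not a fixed API.
```javascript
// Sketch: a phase that enforces a budget and a quality gate.
// implementTask and runGatesTask are placeholder task definitions.
const MAX_ITERATIONS = 10; // budget: hard cap on attempts

async function implementationPhase(ctx, feature) {
  let gates = { passed: false, failures: [] };
  for (let i = 0; i < MAX_ITERATIONS && !gates.passed; i++) {
    ctx.log('Implementation attempt', { iteration: i + 1 }); // journaled
    const impl = await ctx.task(implementTask, { feature, feedback: gates.failures });
    gates = await ctx.task(runGatesTask, { impl }); // quality gate (evidence)
  }
  if (!gates.passed) {
    // Budget exhausted: stop and ask a human instead of pressing on.
    await ctx.breakpoint({
      question: 'Gates still failing after the attempt budget. Continue anyway?',
      title: 'Budget Exhausted'
    });
  }
  return gates;
}
```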
B) Agent Harness (LLM Runtime)
The harness is not "just an LLM call." Modern harnesses include:
| Capability | Description |
|---|---|
| Iterative planning | Plan → Execute → Replan |
| Tool calling | Files, terminal, search, code execution |
| Command execution | Parse results, handle errors |
| Incremental fixes | Iterate until checks pass |
| Structured artifacts | Plans, diffs, summaries |
| Multi-step reasoning | With constraints |
| Sub-agents | Delegation inside the harness |
**The harness owns solving fuzzy parts and adapting to feedback.**
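From the orchestration side, you hand the harness a bounded objective rather than an open-ended prompt. A minimal sketch, assuming an illustrative agent-backed task (implementAgentTask) and payload shape:
```javascript
// Sketch: a bounded objective handed to the harness.
// implementAgentTask and the payload shape are illustrative, not a fixed schema.
const result = await ctx.task(implementAgentTask, {
  objective: 'Make the failing auth tests pass',
  constraints: {
    allowedPaths: ['src/**', 'tests/**'],
    maxFilesModified: 10
  },
  expectedOutputs: ['patch', 'summary', 'confidence'] // structured artifacts
});
ctx.log('Harness confidence', { confidence: result.confidence });
```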
C) Symbolic Logic Surfaces (Shared Capabilities)
Symbolic logic appears in **multiple places**, all consistent:
1. **Inside the orchestrator** (stage transitions, invariants, gates, budgets)
2. **As symbolic tools** callable by the harness (policy checks, gate evaluation)
3. **As symbolic tasks** callable by orchestration (validators, analyzers)
```javascript
// Symbolic logic as an orchestrator rule (using a loop for retry)
let impl;
for (let iteration = 0; iteration < maxIterations; iteration++) {
  impl = await ctx.task(implementTask, { feature });
  const testResults = await ctx.task(runTestsTask, { impl });
  if (testResults.passed) break; // Success - exit loop
  // Loop continues with feedback from the failed tests
}

// Symbolic logic as a tool callable by the harness
const allowed = await ctx.task(policyCheckTask, {
  action: 'modifyFile',
  path: '/etc/config.json'
});

// Symbolic logic as a validation task
const gateResult = await ctx.task(securityGateTask, {
  files: impl.filesModified
});
```

---
The Two Loops in Detail
Loop 1: Orchestration Loop (Symbolic)
A process stepper that progresses a run through explicit stages.
**Typical Cycle:**
1. Reconstruct "what is true" from the journal
2. Determine what stage the run is in
3. Check gates/constraints/budgets
4. Choose the next allowed transition
5. Emit the next effect (or wait)
6. Record results back into the journal

**This loop is about:** control, safety, repeatability, traceability.
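Conceptually, the stepper can be pictured as the sketch below. This is illustrative pseudocode of the cycle above, not the engine's actual source; every helper name is hypothetical.
```javascript
// Conceptual pseudocode of the stepper cycle; every helper is hypothetical.
async function step(journal) {
  const state = reconstructState(journal);            // 1. what is true
  const stage = state.currentStage;                   // 2. current stage
  if (!budgetsOk(state) || !gatesOk(state)) {         // 3. check constraints
    return recordEvent(journal, { type: 'halted', stage });
  }
  const transition = nextAllowedTransition(state);    // 4. next allowed move
  const result = await emitEffect(transition);        // 5. emit effect (or wait)
  return recordEvent(journal, { type: 'result', result }); // 6. back to journal
}
```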
Loop 2: Agentic Loop (Harness)
A tool-using reasoning loop that iterates until reaching a local objective.
**Typical Cycle:**
1. Read current objective + constraints
2. Decide what evidence is needed
3. Call tools, inspect results
4. Update plan or actions
5. Produce an output (patch, plan, answer, report)

**This loop is about:** solving the task when information is incomplete.
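The harness side can be pictured the same way. Again, this is conceptual pseudocode; the helper names are hypothetical and real harness internals are richer.
```javascript
// Conceptual pseudocode of the harness loop; every helper is hypothetical.
async function agentLoop(objective, constraints, tools) {
  let plan = await draftPlan(objective, constraints);  // 1-2. objective, evidence
  while (!plan.done) {
    const observation = await callTool(tools, plan.nextAction); // 3. call tools
    plan = await revisePlan(plan, observation);                 // 4. update plan
  }
  return plan.output; // 5. patch, plan, answer, or report
}
```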
---
What Goes Where?
The design challenge is deciding **which execution decisions are deterministic/symbolic** and **which are adaptive/agentic**.
Put in Symbolic Logic When...
These decisions must be **stable, enforceable, and auditable**:
| Decision Type | Examples |
|---|---|
| **Safety/permissions** | What actions are allowed |
| **Budgets/limits** | Time, cost, tool call limits |
| **State transitions** | What stage you're in |
| **Concurrency rules** | What can run in parallel |
| **Retry/timeout policy** | What happens on failure |
| **Idempotency** | Avoid double execution |
| **Quality gates** | What proof is required |
| **Compliance/audit** | Logging requirements |
Put in Agent Harness When...
These decisions benefit from **flexible reasoning**:
| Decision Type | Examples |
|---|---|
| **Ambiguous instructions** | "Make it better" |
| **Uncertain approach** | Multiple valid solutions |
| **Search/discovery** | Find relevant files |
| **Drafting** | Code, docs, analyses |
| **Debugging** | Iterate against tool results |
| **Summarizing** | Compress evidence |
| **Proposing** | Candidate solutions |
The Mixed Zone
Many tasks are mixed. The pattern is:
- **Symbolic logic defines the envelope** (constraints + gates + budgets)
- **Harness explores inside that envelope** (implements, debugs, refines)
- **Both can invoke symbolic rules** (nothing is guesswork)
```javascript
// Mixed: the harness works, the orchestrator validates (loop-based retry)
let securityPassed = false;
let lastSecurityResult = null;
for (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {
  const impl = await ctx.task(implementTask, {
    feature,
    constraints: {
      allowedPaths: ['src/**'],
      forbiddenPatterns: ['eval(', 'exec('],
      maxFilesModified: 10
    },
    // Pass the previous gate's feedback on retry iterations
    feedback: iteration > 0 ? lastSecurityResult.recommendations : null
  });
  // Orchestrator enforces the gate
  const securityResult = await ctx.task(securityGateTask, { impl });
  securityPassed = securityResult.passed;
  lastSecurityResult = securityResult;
}
```

---
The Four Guardrail Layers
Guardrails are a **layered approach**, not a single feature.
Layer A: Capability Guardrails (What's Possible)
Define what tools and actions exist.
```javascript
const capabilityConfig = {
  allowedTools: ['read', 'write', 'shell', 'search'],
  pathRestrictions: ['src/**', 'tests/**'],
  networkAccess: 'none',
  permissions: 'read-write',
  destructiveActions: 'require-confirmation'
};
```

Layer B: Budget Guardrails (How Far)
Prevent runaway execution.
```javascript
const budgetConfig = {
  maxToolCalls: 100,
  maxWallClockMinutes: 30,
  maxTokenSpend: 50000,
  maxIterations: 10,
  rateLimits: { apiCalls: '10/minute' }
};
```

Layer C: Policy Guardrails (What's Allowed)
Rules that define acceptable behavior.
```javascript
const policyConfig = {
  rules: [
    'never exfiltrate secrets',
    'never modify production directly',
    'always run tests before merge',
    'security scans required for dependencies'
  ]
};
```

Layer D: Behavioral Guardrails (How Decisions Are Made)
Structural consistency in outputs.
```javascript
const behavioralConfig = {
  requireStructuredOutputs: true,
  requireEvidenceCitations: true,
  requireUncertaintyDeclaration: true,
  outputSchemas: { /* JSON schemas */ }
};
```
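The four layers are designed to compose. A minimal sketch, assuming a hypothetical startRun entry point and config field names:
```javascript
// Sketch: composing the four layers into one run configuration.
// startRun and these field names are assumptions for illustration.
const run = await startRun(myProcess, {
  capabilities: capabilityConfig, // Layer A: what's possible
  budgets: budgetConfig,          // Layer B: how far
  policies: policyConfig,         // Layer C: what's allowed
  behavior: behavioralConfig      // Layer D: how decisions are made
});
```

---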
Quality Gates: Turning Agentic Work into Reliable Outcomes
Quality gates convert "it seems done" into "it is done."
The Evidence-Driven Pattern
Each phase must end with:
| Component | Description |
|---|---|
| **Artifact** | The work product (patch, doc, config, report) |
| **Evidence** | Proof it meets requirements (logs, test output, checks) |
**If you don't have evidence, you don't have completion.**
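In process code, this pattern means pairing each artifact with a validation result before declaring the phase done. A sketch, where validateEvidenceTask and the result fields are illustrative:
```javascript
// Sketch: completion = artifact + evidence.
// validateEvidenceTask and the result fields are illustrative.
const impl = await ctx.task(implementTask, { feature });
const evidence = await ctx.task(validateEvidenceTask, {
  artifact: impl.patch,
  checks: ['unit-tests', 'lint', 'security-scan']
});
if (!evidence.allPassed) {
  // No evidence, no completion.
  throw new Error(`Phase incomplete: missing proof for ${evidence.failed.join(', ')}`);
}
ctx.log('Phase complete', { evidence: evidence.summary });
```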
Common Gated Steps
| Gate Type | What It Validates |
|---|---|
| Unit tests | Individual functions work |
| Integration tests | Components work together |
| System tests | End-to-end behavior |
| Acceptance tests | User requirements met |
| Lint/formatting | Code style compliance |
| Type checking | Type safety |
| Static analysis | Potential bugs |
| Security scans | Vulnerabilities |
| Reproducibility | Clean run in fresh env |
| Diff review | No forbidden file changes |
| Performance | Meets thresholds |
Where Gates Live (Consistent Everywhere)
```javascript
// In the orchestrator: loop-based retry for gate failures
let impl;
let gateResults = { passed: false };
for (let i = 0; i < maxIterations && !gateResults.passed; i++) {
  impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });
  gateResults = await ctx.task(runGatesTask, { impl });
}

// As a symbolic tool: the harness pre-checks during work
const gateResult = await checkGate(impl);
if (!gateResult.passed) {
  // The harness can immediately attempt repair
  await repairIssues(gateResult.failures);
}

// As a symbolic task: verify evidence objectively
const evidence = await ctx.task(gateValidatorTask, { impl });
```

Human Approval Gates
For high-impact steps, include explicit checkpoints:
```javascript
// Plan approval before execution
await ctx.breakpoint({
  question: 'Review the plan. Approve to proceed with implementation?',
  title: 'Plan Approval',
  context: { /* ... */ }
});

// Diff approval before merge
await ctx.breakpoint({
  question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,
  title: 'Merge Approval'
});

// Deployment approval
await ctx.breakpoint({
  question: 'Quality: 92/100. Deploy to production?',
  title: 'Production Deployment'
});
```

---
The Journal: Making Execution Testable
A journaled control plane turns agentic behavior into something you can:
| Capability | Value |
|---|---|
| **Replay** | Debug by re-running |
| **Inspect** | See exactly what happened |
| **Diff** | Compare across forks |
| **Audit** | Compliance evidence |
| **Analyze** | Failure pattern detection |
What's Journaled
| Event Type | Example |
|---|---|
| **Inputs/signals** | Initial requirements |
| **Stage transitions** | "planning" → "implementation" |
| **Requested actions** | writeFile('/src/auth.ts', ...) |
| **Results** | Action succeeded, 42 lines written |
| **Artifacts** | plan.md, implementation.patch |
| **Evidence** | Test results, gate outcomes |
| **Gate outcomes** | Security: PASS, Tests: PASS |
| **Approvals** | User approved at breakpoint |
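Task effects, results, and approvals are recorded by the engine, and ctx.log lets a process add its own entries. A short sketch (the field names are illustrative):
```javascript
// Sketch: adding custom journal entries from process code.
// Task effects and breakpoint approvals are journaled by the engine;
// ctx.log supplements them with custom entries.
ctx.log('Stage transition', { from: 'planning', to: 'implementation' });
const testResults = await ctx.task(runTestsTask, { impl }); // journaled effect
ctx.log('Gate outcome', { gate: 'tests', passed: testResults.passed });
```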
---
Prompt Quality is Determinism Engineering
In a two-loop system, prompts are **configuration for the harness**.
Why Prompt Quality Matters
Better prompts reduce:
- Output variance
- Tool misuse
- Hidden assumptions
- Inconsistent formatting
- Unpredictable branching
Better prompts improve:
- Repeatability
- Debuggability
- Fork comparisons
- Safe automation
The Real Goal: Structural Consistency
You don't need identical wording. You need consistent:
- Decision formats
- Priorities
- Stop/ask conditions
- Evidence standards
Prompt Versioning
Treat harness prompts like engineering surfaces:
```javascript
const promptVersion = '2.1.0';

const implementerPrompt = {
  version: promptVersion,
  role: 'senior software engineer',
  task: 'Implement feature according to specification',
  constraints: [
    'Follow existing code patterns',
    'Write tests for all public functions',
    'Document complex logic',
    'Ask for clarification if requirements are ambiguous'
  ],
  outputFormat: {
    type: 'object',
    required: ['filesModified', 'summary', 'confidence']
  }
};
```

---
Common Failure Modes and Fixes
1. Everything is Agentic
**Symptom:** Unpredictable behavior, hard to debug, inconsistent safety.
**Fix:** Move gates, budgets, and invariants into symbolic orchestration.
2. Everything is Symbolic
**Symptom:** Brittle workflows, poor adaptation, high maintenance.
**Fix:** Delegate fuzzy decisions and exploration to the harness.
3. Hidden State
**Symptom:** The harness "remembers" things the system never logged.
**Fix:** Journal what matters; the system's truth must be reconstructible.
4. Wide Tool Surface
**Symptom:** Tool confusion, increased risk, unpredictable results.
**Fix:** Keep tools small, stable, and well-described.
5. No Explicit Evidence Requirements
**Symptom:** "Done" claims without proof.
**Fix:** Define completion as artifact + evidence, enforced by gates.
---
The Doctrine
If you define only a few principles, make them these:
1. **The orchestrator owns** run progression, journaling, and phase boundaries
2. **Symbolic logic owns** constraints, permissions, budgets, and gates
3. **The harness owns** adaptive work inside constraints
4. **Guardrails are enforced** by symbolic checks, not informal intentions
5. **Quality is evidence-driven**, not assertion-driven
6. **Prompts are versioned** control surfaces for harness behavior
7. **The journal is the source** of truth for replay, audit, and forking
---
Getting Started
If you're building from scratch:
1. **Define phases** (a small symbolic process)
2. **Define effects/tools** available in each phase
3. **Add budgets and permissions**
4. **Decide quality gates per phase**
5. **Add a harness** that can do real work
6. **Journal everything** needed for replay and audit
7. **Add fork + time travel** as first-class operations
**If you do only one thing:** make completion require evidence.
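A minimal sketch tying the steps above together, assuming an illustrative process signature and placeholder tasks (planTask, implementTask, runGatesTask):
```javascript
// Minimal process skeleton following the steps above.
// The process signature and task definitions are illustrative placeholders.
export async function miniProcess(ctx, { feature }) {
  // Phase 1: plan (agentic), then a human approval gate
  const plan = await ctx.task(planTask, { feature });
  await ctx.breakpoint({ question: 'Approve this plan?', title: 'Plan Approval' });

  // Phase 2: implement inside a budget; completion requires gate evidence
  let gates = { passed: false, failures: [] };
  for (let i = 0; i < 5 && !gates.passed; i++) {
    const impl = await ctx.task(implementTask, { plan, feedback: gates.failures });
    gates = await ctx.task(runGatesTask, { impl });
  }

  ctx.log('Run finished', { passed: gates.passed });
  return gates;
}
```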
---
Process Library Examples
Spec-Driven Development
methodologies/spec-driven-development.js
Implements the full two-loops pattern:
- **Symbolic:** Constitution validation, plan-constitution alignment, consistency analysis
- **Agentic:** Specification writing, planning, implementation
- **Gates:** Every phase has approval breakpoints
V-Model
methodologies/v-model.js
Heavy on symbolic verification:
- **Four test levels** designed before implementation
- **Traceability matrix** ensures complete coverage
- **Safety levels** adjust rigor
GSD Iterative Convergence
gsd/iterative-convergence.js
Feedback-driven quality loop:
- **Implement → Score → Feedback → Repeat**
- **Breakpoints** at quality thresholds
- **Plateau detection** for early exit
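The core of that loop, sketched with illustrative tasks (implementTask, scoreTask) and thresholds:
```javascript
// Sketch of the implement → score → feedback loop with plateau detection.
const maxIterations = 10; // illustrative budget
let lastScore = -1;
let feedback = null;
for (let i = 0; i < maxIterations; i++) {
  const impl = await ctx.task(implementTask, { feedback });
  const review = await ctx.task(scoreTask, { impl });
  ctx.log('Quality score', { iteration: i + 1, score: review.score });
  if (review.score >= 90) break;        // quality threshold reached
  if (review.score <= lastScore) break; // plateau detected: exit early
  lastScore = review.score;
  feedback = review.feedback;
}
```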
---
Related Documentation
- Quality Convergence - Five quality gate types and 90-score pattern
- Best Practices - Workflow design and guardrail patterns
- Process Definitions - Creating your own processes
- Journal System - Event sourcing and replay
- Breakpoints - Human-in-the-loop approval
---
Summary
The Two-Loops architecture enables bounded, testable autonomy:
- **Orchestration Loop** provides control, safety, and traceability
- **Agentic Loop** provides capability, adaptation, and problem-solving
- **Quality Gates** turn "seems done" into "is done" with evidence
- **Guardrails** enforce rules at capability, budget, policy, and behavioral levels
- **Journaling** makes everything replayable and auditable
When done well, you get **autonomy that is bounded, testable, and steadily improvable**.
---
SDK API Quick Reference
The complete list of SDK intrinsics (functions available on ctx):
| Function | Purpose | Example |
|---|---|---|
| ctx.task(taskDef, args) | Execute a task | await ctx.task(buildTask, { target: 'dist' }) |
| ctx.breakpoint(opts) | Pause for human approval | await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' }) |
| ctx.parallel.all([...]) | Run tasks in parallel | await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)]) |
| ctx.parallel.map(arr, fn) | Map over an array in parallel | await ctx.parallel.map(files, f => ctx.task(lint, { file: f })) |
| ctx.sleepUntil(iso8601) | Pause until a specific time | await ctx.sleepUntil('2026-01-27T10:00:00Z') |
| ctx.log(msg, data?) | Log a message to the journal | ctx.log('Quality score', { score: 85 }) |
| ctx.now() | Get the current time (deterministic) | const ts = ctx.now().getTime() |
| ctx.runId | Current run identifier | const id = ctx.runId |
**Important:** There is NO ctx.retry(). Use loops for retry logic:
```javascript
// Correct: loop-based retry
let passed = false;
let feedback = null;
for (let i = 0; i < maxIterations && !passed; i++) {
  const result = await ctx.task(implementTask, { feedback });
  passed = result.testsPass;
  feedback = result.errors;
}
```

---
What To Do Next
Based on your role, here's your next step:
| If you are... | Do this next |
|---|---|
| **Beginner** | Read Quality Convergence for the core iteration pattern |
| **Building processes** | Study Best Practices for workflow design |
| **Debugging a run** | Check Journal System to understand event sourcing |
| **Adding approvals** | See Breakpoints for human-in-the-loop patterns |
| **Evaluating for team** | Review the Four Guardrail Layers section above |