Two-Loops Architecture: Understanding Hybrid Agentic Systems
**Version:** 1.1 **Last Updated:** 2026-01-26 **Category:** Feature Guide
---
TL;DR - What You Need to Know
**Skip this section if you just want to USE babysitter.** This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.
**The key insight:** Babysitter separates "what must happen" (deterministic rules) from "how to do it" (AI reasoning). This makes AI workflows reliable and debuggable.
```
┌────────────────────────────────────────────────┐
│ LOOP 1: The Boss (Orchestrator)                │
│   - "You must pass tests before deploying"     │
│   - "You have max 10 attempts"                 │
│   - "Stop and ask for approval at this point"  │
│                                                │
│ LOOP 2: The Worker (AI Agent)                  │
│   - "Figure out how to make these tests pass"  │
│   - "Find and fix the bugs"                    │
│   - "Write the code that solves the problem"   │
└────────────────────────────────────────────────┘
```

**When to read this document:**
- You're building custom processes
- You want to understand guardrails and safety
- You're debugging why a run behaves a certain way
- You're an architect evaluating babysitter for your team
**When to skip this document:**
- You just want to run existing processes
- You're following a tutorial
- You're a beginner (start with Quality Convergence instead)
---
Overview
Babysitter implements a **Two-Loops Control Plane** architecture that combines:
1. **Symbolic Orchestration** (Process Engine): Deterministic, code-defined control
2. **Agentic Harness** (LLM Runtime): Adaptive, AI-powered work execution
This hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.
Why Two Loops?
| Single-Loop AI | Two-Loops Hybrid |
|---|---|
| Unpredictable behavior | Bounded, testable autonomy |
| Hard to debug | Journaled, replayable execution |
| No safety guarantees | Enforced guardrails and gates |
| "It seems done" | Evidence-driven completion |
| Context degradation | Fresh context per task |
---
The Core Building Blocks
A) Symbolic Orchestrator (Process Engine)
The orchestrator is the code-defined process that enforces:
| Responsibility | Example |
|---|---|
| **Ground truth state** | Run is in "implementation" phase |
| **Progression rules** | Must pass tests before deployment |
| **Invariants** | Never modify production directly |
| **Budgets** | Max 10 iterations, 30 min timeout |
| **Permissions** | Only write to src/ directory |
| **Quality gates** | Tests, lint, security must pass |
| **Journaling** | Every event recorded for replay |
| **Time travel** | Fork from any point, compare runs |
**The orchestrator owns making execution dependable.**
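As a concrete sketch, several of these responsibilities can live together in one phase function. The sketch below uses only the documented ctx intrinsics; implementTask, runGatesTask, and the phase signature are illustrative placeholders, not a fixed API.
```javascript
// Sketch: a phase that enforces a budget and a quality gate.
// implementTask and runGatesTask are placeholder task definitions.
const MAX_ITERATIONS = 10; // budget: hard cap on attempts

async function implementationPhase(ctx, feature) {
  let gates = { passed: false, failures: [] };
  for (let i = 0; i < MAX_ITERATIONS && !gates.passed; i++) {
    ctx.log('Implementation attempt', { iteration: i + 1 }); // journaled
    const impl = await ctx.task(implementTask, { feature, feedback: gates.failures });
    gates = await ctx.task(runGatesTask, { impl }); // quality gate (evidence)
  }
  if (!gates.passed) {
    // Budget exhausted: stop and ask a human instead of pressing on.
    await ctx.breakpoint({
      question: 'Gates still failing after the attempt budget. Continue anyway?',
      title: 'Budget Exhausted'
    });
  }
  return gates;
}
```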
B) Agent Harness (LLM Runtime)
The harness is not "just an LLM call." Modern harnesses include:
| Capability | Description |
|---|---|
| Iterative planning | Plan → Execute → Replan |
| Tool calling | Files, terminal, search, code execution |
| Command execution | Parse results, handle errors |
| Incremental fixes | Iterate until checks pass |
| Structured artifacts | Plans, diffs, summaries |
| Multi-step reasoning | With constraints |
| Sub-agents | Delegation inside the harness |
**The harness owns solving fuzzy parts and adapting to feedback.**
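From the orchestration side, you hand the harness a bounded objective rather than an open-ended prompt. A minimal sketch, assuming an illustrative agent-backed task (implementAgentTask) and payload shape:
```javascript
// Sketch: a bounded objective handed to the harness.
// implementAgentTask and the payload shape are illustrative, not a fixed schema.
const result = await ctx.task(implementAgentTask, {
  objective: 'Make the failing auth tests pass',
  constraints: {
    allowedPaths: ['src/**', 'tests/**'],
    maxFilesModified: 10
  },
  expectedOutputs: ['patch', 'summary', 'confidence'] // structured artifacts
});
ctx.log('Harness confidence', { confidence: result.confidence });
```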
C) Symbolic Logic Surfaces (Shared Capabilities)
Symbolic logic appears in **multiple places**, all consistent:
1. **Inside the orchestrator** (stage transitions, invariants, gates, budgets)
2. **As symbolic tools** callable by the harness (policy checks, gate evaluation)
3. **As symbolic tasks** callable by orchestration (validators, analyzers)
```javascript
// Symbolic logic as an orchestrator rule (using a loop for retry)
let impl;
for (let iteration = 0; iteration < maxIterations; iteration++) {
  impl = await ctx.task(implementTask, { feature });
  const testResults = await ctx.task(runTestsTask, { impl });
  if (testResults.passed) break; // Success - exit loop
  // Loop continues with feedback from the failed tests
}

// Symbolic logic as a tool callable by the harness
const allowed = await ctx.task(policyCheckTask, {
  action: 'modifyFile',
  path: '/etc/config.json'
});

// Symbolic logic as a validation task
const gateResult = await ctx.task(securityGateTask, {
  files: impl.filesModified
});
```

---
The Two Loops in Detail
Loop 1: Orchestration Loop (Symbolic)
A process stepper that progresses a run through explicit stages.
**Typical Cycle:**
1. Reconstruct "what is true" from the journal
2. Determine what stage the run is in
3. Check gates/constraints/budgets
4. Choose the next allowed transition
5. Emit the next effect (or wait)
6. Record results back into the journal

**This loop is about:** control, safety, repeatability, traceability.
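Conceptually, the stepper can be pictured as the sketch below. This is illustrative pseudocode of the cycle above, not the engine's actual source; every helper name is hypothetical.
```javascript
// Conceptual pseudocode of the stepper cycle; every helper is hypothetical.
async function step(journal) {
  const state = reconstructState(journal);            // 1. what is true
  const stage = state.currentStage;                   // 2. current stage
  if (!budgetsOk(state) || !gatesOk(state)) {         // 3. check constraints
    return recordEvent(journal, { type: 'halted', stage });
  }
  const transition = nextAllowedTransition(state);    // 4. next allowed move
  const result = await emitEffect(transition);        // 5. emit effect (or wait)
  return recordEvent(journal, { type: 'result', result }); // 6. back to journal
}
```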
Loop 2: Agentic Loop (Harness)
A tool-using reasoning loop that iterates until reaching a local objective.
**Typical Cycle:**
1. Read current objective + constraints
2. Decide what evidence is needed
3. Call tools, inspect results
4. Update plan or actions
5. Produce an output (patch, plan, answer, report)

**This loop is about:** solving the task when information is incomplete.
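The harness side can be pictured the same way. Again, this is conceptual pseudocode; the helper names are hypothetical and real harness internals are richer.
```javascript
// Conceptual pseudocode of the harness loop; every helper is hypothetical.
async function agentLoop(objective, constraints, tools) {
  let plan = await draftPlan(objective, constraints);  // 1-2. objective, evidence
  while (!plan.done) {
    const observation = await callTool(tools, plan.nextAction); // 3. call tools
    plan = await revisePlan(plan, observation);                 // 4. update plan
  }
  return plan.output; // 5. patch, plan, answer, or report
}
```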
---
What Goes Where?
The design challenge is deciding **which execution decisions are deterministic/symbolic** and **which are adaptive/agentic**.
Put in Symbolic Logic When...
These decisions must be **stable, enforceable, and auditable**:
| Decision Type | Examples |
|---|---|
| **Safety/permissions** | What actions are allowed |
| **Budgets/limits** | Time, cost, tool call limits |
| **State transitions** | What stage you're in |
| **Concurrency rules** | What can run in parallel |
| **Retry/timeout policy** | What happens on failure |
| **Idempotency** | Avoid double execution |
| **Quality gates** | What proof is required |
| **Compliance/audit** | Logging requirements |
Put in Agent Harness When...
These decisions benefit from **flexible reasoning**:
| Decision Type | Examples |
|---|---|
| **Ambiguous instructions** | "Make it better" |
| **Uncertain approach** | Multiple valid solutions |
| **Search/discovery** | Find relevant files |
| **Drafting** | Code, docs, analyses |
| **Debugging** | Iterate against tool results |
| **Summarizing** | Compress evidence |
| **Proposing** | Candidate solutions |
The Mixed Zone
Many tasks are mixed. The pattern is:
- **Symbolic logic defines the envelope** (constraints + gates + budgets)
- **Harness explores inside that envelope** (implements, debugs, refines)
- **Both can invoke symbolic rules** (nothing is guesswork)
```javascript
// Mixed: the harness works, the orchestrator validates (loop-based retry)
let securityPassed = false;
let lastSecurityResult = null;
for (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {
  const impl = await ctx.task(implementTask, {
    feature,
    constraints: {
      allowedPaths: ['src/**'],
      forbiddenPatterns: ['eval(', 'exec('],
      maxFilesModified: 10
    },
    // Pass the previous gate's feedback on retry iterations
    feedback: iteration > 0 ? lastSecurityResult.recommendations : null
  });
  // Orchestrator enforces the gate
  const securityResult = await ctx.task(securityGateTask, { impl });
  securityPassed = securityResult.passed;
  lastSecurityResult = securityResult;
}
```

---
The Four Guardrail Layers
Guardrails are a **layered approach**, not a single feature.
Layer A: Capability Guardrails (What's Possible)
Define what tools and actions exist.
```javascript
const capabilityConfig = {
  allowedTools: ['read', 'write', 'shell', 'search'],
  pathRestrictions: ['src/**', 'tests/**'],
  networkAccess: 'none',
  permissions: 'read-write',
  destructiveActions: 'require-confirmation'
};
```

Layer B: Budget Guardrails (How Far)
Prevent runaway execution.
```javascript
const budgetConfig = {
  maxToolCalls: 100,
  maxWallClockMinutes: 30,
  maxTokenSpend: 50000,
  maxIterations: 10,
  rateLimits: { apiCalls: '10/minute' }
};
```

Layer C: Policy Guardrails (What's Allowed)
Rules that define acceptable behavior.
```javascript
const policyConfig = {
  rules: [
    'never exfiltrate secrets',
    'never modify production directly',
    'always run tests before merge',
    'security scans required for dependencies'
  ]
};
```

Layer D: Behavioral Guardrails (How Decisions Are Made)
Structural consistency in outputs.
```javascript
const behavioralConfig = {
  requireStructuredOutputs: true,
  requireEvidenceCitations: true,
  requireUncertaintyDeclaration: true,
  outputSchemas: { /* JSON schemas */ }
};
```
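The four layers are designed to compose. A minimal sketch, assuming a hypothetical startRun entry point and config field names:
```javascript
// Sketch: composing the four layers into one run configuration.
// startRun and these field names are assumptions for illustration.
const run = await startRun(myProcess, {
  capabilities: capabilityConfig, // Layer A: what's possible
  budgets: budgetConfig,          // Layer B: how far
  policies: policyConfig,         // Layer C: what's allowed
  behavior: behavioralConfig      // Layer D: how decisions are made
});
```

---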
Quality Gates: Turning Agentic Work into Reliable Outcomes
Quality gates convert "it seems done" into "it is done."
The Evidence-Driven Pattern
Each phase must end with:
| Component | Description |
|---|---|
| **Artifact** | The work product (patch, doc, config, report) |
| **Evidence** | Proof it meets requirements (logs, test output, checks) |
**If you don't have evidence, you don't have completion.**
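In process code, this pattern means pairing each artifact with a validation result before declaring the phase done. A sketch, where validateEvidenceTask and the result fields are illustrative:
```javascript
// Sketch: completion = artifact + evidence.
// validateEvidenceTask and the result fields are illustrative.
const impl = await ctx.task(implementTask, { feature });
const evidence = await ctx.task(validateEvidenceTask, {
  artifact: impl.patch,
  checks: ['unit-tests', 'lint', 'security-scan']
});
if (!evidence.allPassed) {
  // No evidence, no completion.
  throw new Error(`Phase incomplete: missing proof for ${evidence.failed.join(', ')}`);
}
ctx.log('Phase complete', { evidence: evidence.summary });
```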
Common Gated Steps
| Gate Type | What It Validates |
|---|---|
| Unit tests | Individual functions work |
| Integration tests | Components work together |
| System tests | End-to-end behavior |
| Acceptance tests | User requirements met |
| Lint/formatting | Code style compliance |
| Type checking | Type safety |
| Static analysis | Potential bugs |
| Security scans | Vulnerabilities |
| Reproducibility | Clean run in fresh env |
| Diff review | No forbidden file changes |
| Performance | Meets thresholds |
Where Gates Live (Consistent Everywhere)
```javascript
// In the orchestrator: loop-based retry for gate failures
let impl;
let gateResults = { passed: false };
for (let i = 0; i < maxIterations && !gateResults.passed; i++) {
  impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });
  gateResults = await ctx.task(runGatesTask, { impl });
}

// As a symbolic tool: the harness pre-checks during work
const gateResult = await checkGate(impl);
if (!gateResult.passed) {
  // The harness can immediately attempt repair
  await repairIssues(gateResult.failures);
}

// As a symbolic task: verify evidence objectively
const evidence = await ctx.task(gateValidatorTask, { impl });
```

Human Approval Gates
For high-impact steps, include explicit checkpoints:
```javascript
// Plan approval before execution
await ctx.breakpoint({
  question: 'Review the plan. Approve to proceed with implementation?',
  title: 'Plan Approval',
  context: { /* ... */ }
});

// Diff approval before merge
await ctx.breakpoint({
  question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,
  title: 'Merge Approval'
});

// Deployment approval
await ctx.breakpoint({
  question: 'Quality: 92/100. Deploy to production?',
  title: 'Production Deployment'
});
```

---
The Journal: Making Execution Testable
A journaled control plane turns agentic behavior into something you can:
| Capability | Value |
|---|---|
| **Replay** | Debug by re-running |
| **Inspect** | See exactly what happened |
| **Diff** | Compare across forks |
| **Audit** | Compliance evidence |
| **Analyze** | Failure pattern detection |
What's Journaled
| Event Type | Example |
|---|---|
| **Inputs/signals** | Initial requirements |
| **Stage transitions** | "planning" → "implementation" |
| **Requested actions** | writeFile('/src/auth.ts', ...) |
| **Results** | Action succeeded, 42 lines written |
| **Artifacts** | plan.md, implementation.patch |
| **Evidence** | Test results, gate outcomes |
| **Gate outcomes** | Security: PASS, Tests: PASS |
| **Approvals** | User approved at breakpoint |
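Task effects, results, and approvals are recorded by the engine, and ctx.log lets a process add its own entries. A short sketch (the field names are illustrative):
```javascript
// Sketch: adding custom journal entries from process code.
// Task effects and breakpoint approvals are journaled by the engine;
// ctx.log supplements them with custom entries.
ctx.log('Stage transition', { from: 'planning', to: 'implementation' });
const testResults = await ctx.task(runTestsTask, { impl }); // journaled effect
ctx.log('Gate outcome', { gate: 'tests', passed: testResults.passed });
```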
---
Prompt Quality is Determinism Engineering
In a two-loop system, prompts are **configuration for the harness**.
Why Prompt Quality Matters
Better prompts reduce:
- Output variance
- Tool misuse
- Hidden assumptions
- Inconsistent formatting
- Unpredictable branching
Better prompts improve:
- Repeatability
- Debuggability
- Fork comparisons
- Safe automation
The Real Goal: Structural Consistency
You don't need identical wording. You need consistent:
- Decision formats
- Priorities
- Stop/ask conditions
- Evidence standards
Prompt Versioning
Treat harness prompts like engineering surfaces:
```javascript
const promptVersion = '2.1.0';

const implementerPrompt = {
  version: promptVersion,
  role: 'senior software engineer',
  task: 'Implement feature according to specification',
  constraints: [
    'Follow existing code patterns',
    'Write tests for all public functions',
    'Document complex logic',
    'Ask for clarification if requirements are ambiguous'
  ],
  outputFormat: {
    type: 'object',
    required: ['filesModified', 'summary', 'confidence']
  }
};
```

---
Common Failure Modes and Fixes
1. Everything is Agentic
**Symptom:** Unpredictable behavior, hard to debug, inconsistent safety.
**Fix:** Move gates, budgets, and invariants into symbolic orchestration.
2. Everything is Symbolic
**Symptom:** Brittle workflows, poor adaptation, high maintenance.
**Fix:** Delegate fuzzy decisions and exploration to the harness.
3. Hidden State
**Symptom:** The harness "remembers" things the system never logged.
**Fix:** Journal what matters; the system's truth must be reconstructible.
4. Wide Tool Surface
**Symptom:** Tool confusion, increased risk, unpredictable results.
**Fix:** Keep tools small, stable, and well-described.
5. No Explicit Evidence Requirements
**Symptom:** "Done" claims without proof.
**Fix:** Define completion as artifact + evidence, enforced by gates.
---
The Doctrine
If you define only a few principles, make them these:
1. **The orchestrator owns** run progression, journaling, and phase boundaries
2. **Symbolic logic owns** constraints, permissions, budgets, and gates
3. **The harness owns** adaptive work inside constraints
4. **Guardrails are enforced** by symbolic checks, not informal intentions
5. **Quality is evidence-driven**, not assertion-driven
6. **Prompts are versioned** control surfaces for harness behavior
7. **The journal is the source** of truth for replay, audit, and forking
---
Getting Started
If you're building from scratch:
1. **Define phases** (a small symbolic process)
2. **Define effects/tools** available in each phase
3. **Add budgets and permissions**
4. **Decide quality gates per phase**
5. **Add a harness** that can do real work
6. **Journal everything** needed for replay and audit
7. **Add fork + time travel** as first-class operations
**If you do only one thing:** make completion require evidence.
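A minimal sketch tying the steps above together, assuming an illustrative process signature and placeholder tasks (planTask, implementTask, runGatesTask):
```javascript
// Minimal process skeleton following the steps above.
// The process signature and task definitions are illustrative placeholders.
export async function miniProcess(ctx, { feature }) {
  // Phase 1: plan (agentic), then a human approval gate
  const plan = await ctx.task(planTask, { feature });
  await ctx.breakpoint({ question: 'Approve this plan?', title: 'Plan Approval' });

  // Phase 2: implement inside a budget; completion requires gate evidence
  let gates = { passed: false, failures: [] };
  for (let i = 0; i < 5 && !gates.passed; i++) {
    const impl = await ctx.task(implementTask, { plan, feedback: gates.failures });
    gates = await ctx.task(runGatesTask, { impl });
  }

  ctx.log('Run finished', { passed: gates.passed });
  return gates;
}
```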
---
Process Library Examples
Spec-Driven Development
methodologies/spec-driven-development.js
Implements the full two-loops pattern:
- **Symbolic:** Constitution validation, plan-constitution alignment, consistency analysis
- **Agentic:** Specification writing, planning, implementation
- **Gates:** Every phase has approval breakpoints
V-Model
methodologies/v-model.js
Heavy on symbolic verification:
- **Four test levels** designed before implementation
- **Traceability matrix** ensures complete coverage
- **Safety levels** adjust rigor
GSD Iterative Convergence
gsd/iterative-convergence.js
Feedback-driven quality loop:
- **Implement → Score → Feedback → Repeat**
- **Breakpoints** at quality thresholds
- **Plateau detection** for early exit
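The core of that loop, sketched with illustrative tasks (implementTask, scoreTask) and thresholds:
```javascript
// Sketch of the implement → score → feedback loop with plateau detection.
const maxIterations = 10; // illustrative budget
let lastScore = -1;
let feedback = null;
for (let i = 0; i < maxIterations; i++) {
  const impl = await ctx.task(implementTask, { feedback });
  const review = await ctx.task(scoreTask, { impl });
  ctx.log('Quality score', { iteration: i + 1, score: review.score });
  if (review.score >= 90) break;        // quality threshold reached
  if (review.score <= lastScore) break; // plateau detected: exit early
  lastScore = review.score;
  feedback = review.feedback;
}
```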
---
Related Documentation
- Quality Convergence - Five quality gate types and 90-score pattern
- Best Practices - Workflow design and guardrail patterns
- Process Definitions - Creating your own processes
- Journal System - Event sourcing and replay
- Breakpoints - Human-in-the-loop approval
---
Summary
The Two-Loops architecture enables bounded, testable autonomy:
- **Orchestration Loop** provides control, safety, and traceability
- **Agentic Loop** provides capability, adaptation, and problem-solving
- **Quality Gates** turn "seems done" into "is done" with evidence
- **Guardrails** enforce rules at capability, budget, policy, and behavioral levels
- **Journaling** makes everything replayable and auditable
When done well, you get **autonomy that is bounded, testable, and steadily improvable**.
---
SDK API Quick Reference
The complete list of SDK intrinsics (functions available on ctx):
| Function | Purpose | Example |
|---|---|---|
| ctx.task(taskDef, args) | Execute a task | await ctx.task(buildTask, { target: 'dist' }) |
| ctx.breakpoint(opts) | Pause for human approval | await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' }) |
| ctx.parallel.all([...]) | Run tasks in parallel | await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)]) |
| ctx.parallel.map(arr, fn) | Map over an array in parallel | await ctx.parallel.map(files, f => ctx.task(lint, { file: f })) |
| ctx.sleepUntil(iso8601) | Pause until a specific time | await ctx.sleepUntil('2026-01-27T10:00:00Z') |
| ctx.log(msg, data?) | Log a message to the journal | ctx.log('Quality score', { score: 85 }) |
| ctx.now() | Get the current time (deterministic) | const ts = ctx.now().getTime() |
| ctx.runId | Current run identifier | const id = ctx.runId |
**Important:** There is NO ctx.retry(). Use loops for retry logic:
```javascript
// Correct: loop-based retry
let passed = false;
let feedback = null;
for (let i = 0; i < maxIterations && !passed; i++) {
  const result = await ctx.task(implementTask, { feedback });
  passed = result.testsPass;
  feedback = result.errors;
}
```

---
What To Do Next
Based on your role, here's your next step:
| If you are... | Do this next |
|---|---|
| **Beginner** | Read Quality Convergence for the core iteration pattern |
| **Building processes** | Study Best Practices for workflow design |
| **Debugging a run** | Check Journal System to understand event sourcing |
| **Adding approvals** | See Breakpoints for human-in-the-loop patterns |
| **Evaluating for team** | Review the Four Guardrail Layers section above |