Quality Convergence: Iterative Improvement Until Targets Met

**Version:** 2.1 **Last Updated:** 2026-01-26 **Category:** Feature Guide

---

Quick Summary (Read This First)

**Quality Convergence = "Keep trying until it's good enough"**

Instead of:

Code
AI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x

Babysitter does:

Code
AI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!

What You'll Learn in This Document

| Section | What It Covers | Read If You Want To... |
|---|---|---|
| Five Quality Gates | Types of checks (tests, lint, security, etc.) | Understand what gets checked |
| 90-Score Pattern | How to reliably hit high quality | Build production-ready workflows |
| Process Examples | Real code from the library | See working implementations |
| Step-by-Step | How to build your own | Create custom quality loops |

A Simple Example

Here's what quality convergence looks like in practice:

Code
Iteration 1:
  - AI writes login feature
  - Tests run: 3/10 passing (30%)
  - AI sees: "Missing password validation, no error handling"

Iteration 2:
  - AI fixes based on feedback
  - Tests run: 7/10 passing (70%)
  - AI sees: "Edge case for empty email not handled"

Iteration 3:
  - AI fixes edge cases
  - Tests run: 10/10 passing (100%)
  - Quality target met! ✓

Output: Working login feature with all tests passing

**Key insight**: The AI doesn't just try once - it learns from each failure and improves.

Understanding Quality Scores

**Quality scores are multi-dimensional, not a single number.** Instead of a simple pass/fail, each iteration produces feedback across several dimensions, which tells the next iteration where to focus.

A typical quality score includes:

| Dimension | What It Measures | Example |
|---|---|---|
| **Tests** | Pass rate and coverage | 92% tests passing, 85% coverage |
| **Code Quality** | Lint errors, complexity | 0 lint errors, complexity < 10 |
| **Security** | Vulnerabilities, secrets | 0 critical issues |
| **Performance** | Response time, bundle size | p95 < 500ms |
| **Type Safety** | Type errors, null safety | 0 type errors |

The Power of Custom Dimensions

**You define what quality means for your project.** The dimensions above are just examples - you can:

1. **Define your own 5 dimensions** that matter most for your domain
2. **Ask Babysitter to suggest dimensions** appropriate for your specific task
3. **Weight dimensions differently** based on project phase or criticality

For example, a data pipeline might use completely different dimensions:

| Dimension | Weight | Threshold |
|---|---|---|
| **Data Accuracy** | 30% | > 99.9% |
| **Processing Speed** | 25% | < 5 min/GB |
| **Schema Validation** | 20% | 100% valid |
| **Idempotency** | 15% | All operations idempotent |
| **Error Recovery** | 10% | Auto-recovery < 30s |

This flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.

**For detailed scoring formulas and weight configurations, see Best Practices - Custom Scoring Strategies.**
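As a worked example of how weighted dimensions combine into one number, here is a minimal sketch using the data pipeline weights from the table above. The per-dimension scores and the `weightedScore` helper are hypothetical and chosen only for illustration; they are not taken from the Babysitter library.

```javascript
// Weights from the data pipeline table above (they sum to 1.0).
const pipelineWeights = {
  dataAccuracy: 0.30,
  processingSpeed: 0.25,
  schemaValidation: 0.20,
  idempotency: 0.15,
  errorRecovery: 0.10
};

// Hypothetical per-dimension scores on a 0-100 scale.
const pipelineScores = {
  dataAccuracy: 95,
  processingSpeed: 80,
  schemaValidation: 100,
  idempotency: 90,
  errorRecovery: 70
};

// Weighted sum: each dimension contributes score * weight.
function weightedScore(scores, weights) {
  return Object.entries(weights).reduce(
    (total, [dimension, weight]) => total + scores[dimension] * weight,
    0
  );
}

const overall = weightedScore(pipelineScores, pipelineWeights);
console.log(overall.toFixed(1)); // 28.5 + 20 + 20 + 13.5 + 7 = 89.0
```

With these numbers the overall score lands just under a target of 90, and the largest remaining gap per unit of weight is in error recovery and processing speed.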

---

Overview

Quality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.

The Core Principle: Evidence-Driven Completion

From the Two-Loops Control Plane architecture, the fundamental principle is:

**If you don't have evidence, you don't have completion.**

*If you do only one thing: make completion require evidence.* — This single principle transforms "it seems done" into "it is done."

Every phase must end with:

  • **Artifact**: The work product (patch, doc, config, report)
  • **Evidence**: Proof that it meets requirements (logs, test output, checks)
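The artifact-plus-evidence rule can be expressed as a simple completion check. This is a minimal sketch; the `{ artifact, evidence }` result shape and the `isPhaseComplete` helper are assumptions made for illustration, not part of the Babysitter SDK.

```javascript
// A phase result counts as complete only if it carries both an artifact
// and at least one piece of evidence. The result shape is hypothetical.
function isPhaseComplete(phaseResult) {
  const hasArtifact =
    typeof phaseResult.artifact === 'string' && phaseResult.artifact.length > 0;
  const hasEvidence =
    Array.isArray(phaseResult.evidence) && phaseResult.evidence.length > 0;
  return hasArtifact && hasEvidence;
}

const withEvidence = {
  artifact: 'artifacts/login.patch',
  evidence: ['test-output.log: 10/10 passing']
};
const withoutEvidence = { artifact: 'artifacts/login.patch', evidence: [] };

console.log(isPhaseComplete(withEvidence));    // true
console.log(isPhaseComplete(withoutEvidence)); // false
```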

Why Use Quality Convergence

  • **Consistent Quality**: Hold outputs to a minimum quality threshold, and report clearly when the iteration limit is reached before the threshold is met
  • **Automated Refinement**: Let the system iterate without manual intervention
  • **Measurable Results**: Track quality scores across iterations
  • **Predictable Outcomes**: Set clear targets and iteration limits
  • **TDD Integration**: Combine with test-driven development for robust code
  • **Evidence-Based Completion**: Every iteration produces verifiable proof of quality

---

The Five Quality Gate Categories

Quality gates are not a single check. They form a **layered validation system** that ensures completeness from multiple perspectives. For robust quality convergence, use **4-5 gate types simultaneously**.
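A minimal sketch of the layered idea: each gate reports its own verdict, and the layer as a whole passes only when every gate passes. The gate names and the `evaluateGates` helper are hypothetical placeholders.

```javascript
// Each gate reports a boolean verdict; the layer passes only if all pass.
// Returning the failed gate names gives the next iteration something to act on.
function evaluateGates(gateResults) {
  const failed = Object.entries(gateResults)
    .filter(([, passed]) => !passed)
    .map(([name]) => name);
  return { passed: failed.length === 0, failed };
}

const verdict = evaluateGates({
  functionalTests: true,
  codeQuality: true,
  staticAnalysis: false,
  security: true,
  performance: true
});

console.log(verdict); // { passed: false, failed: [ 'staticAnalysis' ] }
```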

Gate Type 1: Functional Tests (Unit/Integration/System/Acceptance)

Verifies the code behaves correctly across all levels.

javascript
// From: methodologies/v-model.js (V-Model process)
const testResults = await ctx.task(executeTestsTask, {
  implementation,
  unitTestDesigns,      // Validates module design
  integrationTestDesign, // Validates architecture
  systemTestDesign,      // Validates system design
  acceptanceTestDesign   // Validates requirements
});

const allTestsPassed =
  testResults.unitTests.passed &&
  testResults.integrationTests.passed &&
  testResults.systemTests.passed &&
  testResults.acceptanceTests.passed;

**Gate Criteria:**

| Test Level | What It Validates | Typical Pass Threshold |
|---|---|---|
| Unit Tests | Individual functions/classes | 90-100% pass rate |
| Integration Tests | Module interactions | 95-100% pass rate |
| System Tests | End-to-end behavior | 90-100% pass rate |
| Acceptance Tests | User requirements | 100% for critical |
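The V-Model snippet above reads a `passed` flag per test level. One way such a flag could be derived from raw counts is sketched below, using the low end of each threshold range from the table. The `{ passed, total }` input shape and the `testLevelsPassed` helper are assumptions, not the library's actual result format.

```javascript
// Minimum pass rates per test level, taken from the low end of the table above.
const minPassRate = {
  unit: 0.90,
  integration: 0.95,
  system: 0.90,
  acceptance: 1.0
};

// `counts` maps each level to { passed, total }; this shape is an assumption.
function testLevelsPassed(counts) {
  return Object.fromEntries(
    Object.entries(minPassRate).map(([level, threshold]) => {
      const { passed, total } = counts[level];
      const rate = total === 0 ? 0 : passed / total; // no tests run means no evidence
      return [level, rate >= threshold];
    })
  );
}

const levels = testLevelsPassed({
  unit: { passed: 46, total: 50 },        // 92%
  integration: { passed: 18, total: 20 }, // 90%
  system: { passed: 9, total: 10 },       // 90%
  acceptance: { passed: 5, total: 5 }     // 100%
});

console.log(levels);
// { unit: true, integration: false, system: true, acceptance: true }
```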

Gate Type 2: Code Quality (Lint/Format/Complexity)

Ensures code follows style guidelines and maintainability standards.

javascript
// Parallel code quality checks
const [lint, format, complexity] = await ctx.parallel.all([
  () => ctx.task(lintTask, { files: impl.filesModified }),
  () => ctx.task(formatCheckTask, { files: impl.filesModified }),
  () => ctx.task(complexityTask, { files: impl.filesModified })
]);

const codeQualityGatePassed =
  lint.errorCount === 0 &&
  format.violations === 0 &&
  complexity.maxCyclomaticComplexity < 10;

**Gate Criteria:**

| Check | Tool Examples | Typical Threshold |
|---|---|---|
| Lint Errors | ESLint, Pylint | 0 errors |
| Formatting | Prettier, Black | 0 violations |
| Cyclomatic Complexity | SonarQube, Radon | < 10 per function |
| Code Duplication | jscpd, CPD | < 3% duplication |

Gate Type 3: Type Safety and Static Analysis

Catches bugs at compile/analysis time without running the code.

javascript
// From: gsd/iterative-convergence enhanced pattern
const [typeCheck, staticAnalysis] = await ctx.parallel.all([
  () => ctx.task(typeCheckTask, { files: impl.filesModified }),
  () => ctx.task(staticAnalysisTask, { files: impl.filesModified })
]);

const staticGatePassed =
  typeCheck.errors.length === 0 &&
  staticAnalysis.criticalIssues === 0 &&
  staticAnalysis.highIssues === 0;

**Gate Criteria:**

| Check | What It Catches | Typical Threshold |
|---|---|---|
| Type Checking | Type mismatches, null errors | 0 type errors |
| Static Analysis | Potential bugs, code smells | 0 critical/high issues |
| Dead Code | Unreachable statements | 0 dead code blocks |
| Null Safety | Potential null dereferences | 0 null warnings |

Gate Type 4: Security Scanning

Identifies vulnerabilities, secrets, and security anti-patterns.

javascript
// Security gate from methodologies/spec-driven-development.js
const security = await ctx.task(securityTask, {
  files: impl.filesModified,
  scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'
});

const securityGatePassed =
  security.criticalVulnerabilities === 0 &&
  security.highVulnerabilities === 0 &&
  security.secretsDetected === 0 &&
  security.dependencyVulnerabilities.critical === 0;

**Gate Criteria:**

| Check | What It Scans | Typical Threshold |
|---|---|---|
| SAST (Static) | SQL injection, XSS, etc. | 0 critical/high |
| Secrets Detection | API keys, passwords | 0 secrets |
| Dependency Scan | Known CVEs in packages | 0 critical CVEs |
| OWASP Top 10 | Common web vulnerabilities | 0 violations |

Gate Type 5: Performance and Resource Thresholds

Ensures the implementation meets non-functional requirements.

javascript
// Performance gate for production readiness
const performance = await ctx.task(performanceCheckTask, {
  implementation: impl,
  thresholds: {
    loadTimeMs: 1500,      // First Contentful Paint
    bundleSizeKb: 200,     // Gzipped bundle
    apiResponseP95Ms: 500, // 95th percentile
    memoryUsageMb: 512     // Peak memory
  }
});

const performanceGatePassed =
  performance.fcp <= 1500 &&
  performance.bundleSize <= 200 &&
  performance.apiP95 <= 500 &&
  performance.peakMemory <= 512;

**Gate Criteria:**

| Metric | Typical Target | Domain |
|---|---|---|
| FCP (First Contentful Paint) | < 1.5s | Frontend |
| Bundle Size | < 200KB gzipped | Frontend |
| API p95 Response | < 500ms | Backend |
| Memory Usage | < 512MB | Server |
| CPU Utilization | < 70% average | Server |
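The performance gate example repeats each limit twice, once in `thresholds` and once in the comparison. A possible alternative is to compare measurements against a single thresholds object, sketched below. It assumes the measurements use the same keys as the thresholds, which differs from the `fcp`/`bundleSize` field names shown earlier; `findViolations` is a hypothetical helper.

```javascript
// Single source of truth for limits (values from the gate example above).
const thresholds = {
  loadTimeMs: 1500,
  bundleSizeKb: 200,
  apiResponseP95Ms: 500,
  memoryUsageMb: 512
};

// Returns the metrics that exceed their limit. A missing measurement
// counts as a violation so gaps in evidence cannot pass silently.
function findViolations(measured, limits) {
  return Object.entries(limits)
    .filter(([metric, limit]) => !(measured[metric] <= limit))
    .map(([metric, limit]) => ({ metric, limit, actual: measured[metric] }));
}

const violations = findViolations(
  { loadTimeMs: 1320, bundleSizeKb: 240, apiResponseP95Ms: 410, memoryUsageMb: 300 },
  thresholds
);

console.log(violations);
// [ { metric: 'bundleSizeKb', limit: 200, actual: 240 } ]
```

The gate passes when the returned list is empty, and the list itself can be handed to the next iteration as feedback.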

---

The 90-Score Quality Convergence Pattern

To reliably achieve scores of **90+**, implement a **multi-gate weighted scoring system** with iterative feedback.

Step 1: Define Weighted Scoring Dimensions

javascript
// Recommended weights for high-quality convergence
const QUALITY_WEIGHTS = {
  // For production features
  production: {
    tests: 0.25,           // Test coverage and pass rate
    implementation: 0.25,   // Code correctness
    codeQuality: 0.15,      // Lint, complexity, formatting
    security: 0.20,         // Vulnerability scanning
    performance: 0.15       // Non-functional requirements
  },

  // For security-critical systems
  securityCritical: {
    tests: 0.20,
    implementation: 0.20,
    codeQuality: 0.10,
    security: 0.35,         // Higher weight for security
    performance: 0.15
  },

  // For performance-critical systems
  performanceCritical: {
    tests: 0.20,
    implementation: 0.20,
    codeQuality: 0.10,
    security: 0.15,
    performance: 0.35       // Higher weight for performance
  }
};
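Because the overall score is a weighted sum on a 0-100 scale, each weight profile should sum to 1. A small sanity check along these lines can catch a mistyped profile before a run starts; `validateWeights` is a hypothetical helper, not an SDK function.

```javascript
// Checks that a weight profile is usable: every weight is a non-negative
// number and the weights sum to 1 (within a floating-point tolerance).
function validateWeights(weights, tolerance = 1e-9) {
  const values = Object.values(weights);
  const allValid = values.every(w => typeof w === 'number' && w >= 0);
  const sum = values.reduce((total, w) => total + w, 0);
  return allValid && Math.abs(sum - 1) <= tolerance;
}

const production = {
  tests: 0.25, implementation: 0.25, codeQuality: 0.15,
  security: 0.20, performance: 0.15
};
const broken = { tests: 0.5, implementation: 0.3, security: 0.3 }; // sums to 1.1

console.log(validateWeights(production)); // true
console.log(validateWeights(broken));     // false
```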

Step 2: Implement the Multi-Gate Convergence Loop

javascript
/**
 * Multi-gate quality convergence targeting 90+ scores
 * References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js
 */
export async function process(inputs, ctx) {
  const {
    feature,
    targetQuality = 90,      // Target score
    maxIterations = 10,      // Allow more iterations for high targets
    minImprovement = 2,      // Minimum improvement per iteration
    plateauThreshold = 3,    // Iterations without improvement
    weights = QUALITY_WEIGHTS.production
  } = inputs;

  let iteration = 0;
  let quality = 0;
  const iterationHistory = [];

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);

    // ===== ACT: Implement with feedback from previous iteration =====
    const previousFeedback = iteration > 1
      ? iterationHistory[iteration - 2].recommendations
      : null;

    const impl = await ctx.task(implementTask, {
      feature,
      iteration,
      previousFeedback,
      focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities
    });

    // ===== VALIDATE: Run all five quality gates in parallel =====
    const [tests, codeQuality, staticAnalysis, security, performance] =
      await ctx.parallel.all([
        () => ctx.task(testGateTask, { impl }),
        () => ctx.task(codeQualityGateTask, { impl }),
        () => ctx.task(staticAnalysisGateTask, { impl }),
        () => ctx.task(securityGateTask, { impl }),
        () => ctx.task(performanceGateTask, { impl })
      ]);

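    // ===== SCORE: Calculate weighted quality score =====
    // Note: staticAnalysis has no entry in QUALITY_WEIGHTS, so it is recorded in
    // iterationHistory but does not contribute to the weighted score.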
    const scores = {
      tests: tests.score,
      implementation: calculateImplementationScore(impl, tests),
      codeQuality: codeQuality.score,
      security: security.score,
      performance: performance.score
    };

    quality = Object.entries(weights).reduce(
      (total, [dimension, weight]) => total + (scores[dimension] * weight),
      0
    );

    // ===== ANALYZE: Generate prioritized recommendations =====
    const recommendations = generateRecommendations(scores, weights, targetQuality);

    iterationHistory.push({
      iteration,
      quality,
      scores,
      recommendations,
      gates: { tests, codeQuality, staticAnalysis, security, performance }
    });

    ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +
            `Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +
            `Security: ${scores.security} | Perf: ${scores.performance}`);

    // ===== EARLY EXIT: Detect plateau =====
    if (iteration >= plateauThreshold) {
      const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);
      const improvement = Math.max(...recent) - Math.min(...recent);
      if (improvement < minImprovement) {
        ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);
        break;
      }
    }

    // ===== BREAKPOINT: At key thresholds =====
    const converged = quality >= targetQuality;
    if (!converged && quality >= 80 && iteration > 1) {
      await ctx.breakpoint({
        question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,
        title: `Iteration ${iteration} Checkpoint`,
        context: {
          runId: ctx.runId,
          files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]
        }
      });
    }
  }

  // ===== FINAL VALIDATION =====
  const converged = quality >= targetQuality;

  return {
    success: converged,
    quality,
    targetQuality,
    iterations: iteration,
    iterationHistory,
    // Optional chaining guards the case where the loop never ran (maxIterations = 0)
    finalGates: iterationHistory[iterationHistory.length - 1]?.gates ?? null,
    metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }
  };
}

function generateRecommendations(scores, weights, target) {
  // Calculate gap for each dimension
  const gaps = Object.entries(scores).map(([dim, score]) => ({
    dimension: dim,
    score,
    weight: weights[dim],
    weightedGap: (100 - score) * weights[dim],
    priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority
  }));

  // Sort by priority (highest impact improvements first)
  return gaps
    .sort((a, b) => b.priority - a.priority)
    .map(g => `Improve ${g.dimension}: currently ${g.score}, ` +
              `contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);
}
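The loop above calls `calculateImplementationScore` without defining it. The following is a hypothetical stand-in, not the library's implementation: it assumes the test gate reports `passed`/`total` counts and the implementation reports `requirementsMet`/`requirementsTotal`, and it blends the two with an arbitrary 70/30 split.

```javascript
// Hypothetical stand-in for calculateImplementationScore.
// All field names and the 70/30 blend are assumptions for illustration.
function calculateImplementationScore(impl, tests) {
  const passRate = tests.total > 0 ? tests.passed / tests.total : 0;
  const coverage = impl.requirementsTotal > 0
    ? impl.requirementsMet / impl.requirementsTotal
    : 0;
  // Blend test evidence with requirement coverage on a 0-100 scale.
  const score = 100 * (0.7 * passRate + 0.3 * coverage);
  return Math.round(score * 10) / 10; // one decimal place
}

const implScore = calculateImplementationScore(
  { requirementsMet: 4, requirementsTotal: 5 },
  { passed: 9, total: 10 }
);

console.log(implScore); // 87
```

Whatever formula is used, keeping it a pure function of gate output makes the score reproducible between iterations.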

Step 3: Progressive Target Strategy

For challenging targets (90+), use progressive escalation:

javascript
// Progressive targets that increase as iterations proceed
const progressiveTargets = [
  { iteration: 1, target: 70 },   // First: basic functionality
  { iteration: 3, target: 80 },   // Mid: solid implementation
  { iteration: 5, target: 85 },   // Late: polish and edge cases
  { iteration: 7, target: 90 }    // Final: production ready
];

function getCurrentTarget(iteration, finalTarget) {
  const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
  const progressiveTarget = applicable[applicable.length - 1]?.target || 70;
  return Math.min(progressiveTarget, finalTarget);
}
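To see what schedule the progressive targets produce, the sketch below repeats the same table and lookup and prints the target for each iteration of an 8-iteration run. It uses `??` in place of `||` for the fallback, which behaves identically for these values.

```javascript
const progressiveTargets = [
  { iteration: 1, target: 70 },
  { iteration: 3, target: 80 },
  { iteration: 5, target: 85 },
  { iteration: 7, target: 90 }
];

// Latest milestone reached so far, capped at the final target.
function getCurrentTarget(iteration, finalTarget) {
  const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
  const progressiveTarget = applicable[applicable.length - 1]?.target ?? 70;
  return Math.min(progressiveTarget, finalTarget);
}

// Target schedule for an 8-iteration run aiming at 90.
const schedule = [];
for (let iteration = 1; iteration <= 8; iteration++) {
  schedule.push(getCurrentTarget(iteration, 90));
}
console.log(schedule); // [ 70, 70, 80, 80, 85, 85, 90, 90 ]

// A lower final target caps the schedule.
console.log(getCurrentTarget(7, 85)); // 85
```

One possible use inside the convergence loop is to compare each iteration's score against `getCurrentTarget(iteration, targetQuality)` when deciding whether to raise a checkpoint, while still exiting the loop only on the final target.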

---

Real-World Process Examples

Example 1: V-Model with Four Test Levels

The V-Model process (methodologies/v-model.js) implements comprehensive quality gates:

Code
/babysitter:call use the V-Model methodology to build a user authentication system with high safety level

Or with more detail:

Code
/babysitter:call implement user authentication using V-Model with traceability and thorough testing

**Quality Gates in V-Model:**

1. Requirements → Acceptance Tests (validates user needs)
2. System Design → System Tests (validates architecture)
3. Module Design → Integration Tests (validates interfaces)
4. Implementation → Unit Tests (validates code)
5. Traceability Matrix (validates coverage)

Example 2: Spec-Kit with Constitution Validation

The Spec-Kit process (methodologies/spec-driven-development.js) adds governance gates:

Code
/babysitter:call use spec-driven development to build PCI-compliant payment processing

Or:

Code
/babysitter:call build a payment flow using the spec-driven methodology with governance validation

**Quality Gates in Spec-Kit:**

1. Constitution Validation (governance principles)
2. Specification Review (requirements completeness)
3. Plan-Constitution Alignment (architecture compliance)
4. Task Consistency Analysis (cross-artifact validation)
5. Implementation Checklists ("unit tests for English")
6. User Story Validation (final acceptance)

Example 3: GSD Iterative Convergence

The GSD process (gsd/iterative-convergence.js) implements feedback-driven convergence:

Code
/babysitter:call build a shopping cart checkout flow with 90% quality target

Or:

Code
/babysitter:call implement checkout flow using iterative convergence with max 8 iterations

**Quality Gates in GSD:**

1. Implementation scoring
2. Test execution
3. Quality assessment with recommendations
4. Iterative feedback loop

---

Use Cases and Scenarios

Scenario 1: TDD Feature Development

Build a feature with test-driven development, iterating until test coverage and quality targets are met.

javascript
export async function process(inputs, ctx) {
  const { feature, targetQuality = 85, maxIterations = 5 } = inputs;

  let iteration = 0;
  let quality = 0;

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);

    // Write tests first
    const tests = await ctx.task(writeTestsTask, { feature, iteration });

    // Implement code to pass tests
    const impl = await ctx.task(implementTask, { tests, feature });

    // Run quality checks
    const [coverage, lint, security] = await ctx.parallel.all([
      () => ctx.task(coverageTask, {}),
      () => ctx.task(lintTask, {}),
      () => ctx.task(securityTask, {})
    ]);

    // Agent scores quality
    const score = await ctx.task(agentScoringTask, {
      tests, impl, coverage, lint, security
    });

    quality = score.overall;
    ctx.log(`Quality score: ${quality}/${targetQuality}`);
  }

  return { converged: quality >= targetQuality, iterations: iteration, quality };
}

Scenario 2: Code Quality Improvement

Iteratively improve existing code until it meets quality standards.

javascript
export async function process(inputs, ctx) {
  const { files, targetScore = 90, maxIterations = 10 } = inputs;

  let iteration = 0;
  let currentScore = 0;

  // Initial assessment
  currentScore = await ctx.task(assessQualityTask, { files });
  ctx.log(`Initial quality score: ${currentScore}`);

  while (iteration < maxIterations && currentScore < targetScore) {
    iteration++;

    // Identify improvements
    const improvements = await ctx.task(identifyImprovementsTask, {
      files,
      currentScore,
      targetScore
    });

    // Apply improvements
    await ctx.task(applyImprovementsTask, { improvements });

    // Re-assess
    currentScore = await ctx.task(assessQualityTask, { files });
    ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);
  }

  return { achieved: currentScore >= targetScore, finalScore: currentScore };
}

Scenario 3: Documentation Generation

Generate documentation and refine until it meets completeness standards.

javascript
export async function process(inputs, ctx) {
  const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;

  let iteration = 0;
  let completeness = 0;

  while (iteration < maxIterations && completeness < targetCompleteness) {
    iteration++;

    // Generate or improve documentation
    await ctx.task(generateDocsTask, { codebase, iteration });

    // Assess completeness
    const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });
    completeness = assessment.completenessScore;

    ctx.log(`Documentation completeness: ${completeness}%`);
  }

  return { complete: completeness >= targetCompleteness, completeness };
}

---

Step-by-Step Instructions

Step 1: Define Quality Targets

Determine what quality means for your use case.

**Common quality metrics:**

  • Test coverage percentage (e.g., 85%)
  • Lint error count (e.g., 0 errors)
  • Security vulnerability count (e.g., 0 critical)
  • Overall quality score (e.g., 90/100)

Step 2: Set Iteration Limits

Prevent infinite loops by setting a maximum number of iterations.

javascript
const { targetQuality = 85, maxIterations = 5 } = inputs;

**Recommendations:**

  • Simple improvements: 3-5 iterations
  • Complex refactoring: 5-10 iterations
  • Large features: 10-15 iterations

Step 3: Implement the Convergence Loop

Create a loop that continues until the target is met or iterations are exhausted.

javascript
let iteration = 0;
let quality = 0;

while (iteration < maxIterations && quality < targetQuality) {
  iteration++;

  // Perform work
  // ...

  // Measure quality
  quality = await measureQuality();

  ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);
}

Step 4: Implement Quality Scoring

Create a task that evaluates quality based on your criteria.

javascript
export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
  kind: 'agent',
  title: 'Score implementation quality',
  agent: {
    name: 'quality-assessor',
    prompt: {
      role: 'senior quality assurance engineer',
      task: 'Analyze implementation quality and provide a score from 0-100',
      context: {
        tests: args.tests,
        implementation: args.implementation,
        coverage: args.coverage,
        lint: args.lint,
        security: args.security
      },
      instructions: [
        'Review test quality (weight: 25%)',
        'Review implementation quality (weight: 30%)',
        'Review code metrics (weight: 20%)',
        'Review security (weight: 15%)',
        'Review alignment with requirements (weight: 10%)',
        'Provide recommendations for improvement'
      ]
    }
  }
}));

Step 5: Add Feedback to Subsequent Iterations

Pass quality feedback to the next iteration to guide improvements.

javascript
const iterationResults = [];

while (iteration < maxIterations && quality < targetQuality) {
  iteration++;

  const previousFeedback = iteration > 1
    ? iterationResults[iteration - 2].recommendations
    : null;

  const impl = await ctx.task(implementTask, {
    feature,
    previousFeedback  // Guide improvements based on previous scoring
  });

  const score = await ctx.task(agentScoringTask, { impl });

  iterationResults.push({
    iteration,
    quality: score.overall,
    recommendations: score.recommendations
  });

  quality = score.overall;
}

---

Configuration Options

Quality Target Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetQuality | number | 85 | Target quality score (0-100) |
| maxIterations | number | 5 | Maximum number of iterations before stopping |
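A process can apply these defaults and reject out-of-range values before entering the loop. The sketch below shows one way to do that; `normalizeConvergenceInputs` is a hypothetical helper, and the validation rules are assumptions based on the ranges in the table.

```javascript
// Applies the documented defaults and rejects out-of-range values.
function normalizeConvergenceInputs(inputs = {}) {
  const { targetQuality = 85, maxIterations = 5 } = inputs;
  if (!Number.isFinite(targetQuality) || targetQuality < 0 || targetQuality > 100) {
    throw new RangeError('targetQuality must be a number from 0 to 100');
  }
  if (!Number.isInteger(maxIterations) || maxIterations < 1) {
    throw new RangeError('maxIterations must be a positive integer');
  }
  return { targetQuality, maxIterations };
}

const normalized = normalizeConvergenceInputs({ targetQuality: 90 });
console.log(normalized); // { targetQuality: 90, maxIterations: 5 }
```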

Scoring Weights Configuration

Customize how different aspects contribute to the overall score.

javascript
const scoringWeights = {
  tests: 0.25,          // 25% weight for test quality
  implementation: 0.30,  // 30% weight for implementation quality
  codeQuality: 0.20,     // 20% weight for code metrics
  security: 0.15,        // 15% weight for security
  alignment: 0.10        // 10% weight for requirements alignment
};

Early Exit Conditions

Configure conditions that stop iteration early.

javascript
// Stop if quality plateaus (less than 1 point gained across the last 3 iterations)
if (qualityHistory.length >= 3) {
  const lastThree = qualityHistory.slice(-3);
  const improvement = lastThree[2] - lastThree[0];
  if (improvement < 1) {
    ctx.log('Quality plateaued, stopping early');
    break;
  }
}
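The same check can be pulled into a reusable function with a configurable window and minimum improvement. This is a minimal sketch; `hasPlateaued` is a hypothetical helper name, and it compares the last score in the window with the first, as the snippet above does.

```javascript
// Returns true when the last `window` scores improved by less than
// `minImprovement` from the first to the last of them.
function hasPlateaued(qualityHistory, window = 3, minImprovement = 1) {
  if (qualityHistory.length < window) return false; // not enough data yet
  const recent = qualityHistory.slice(-window);
  return recent[recent.length - 1] - recent[0] < minImprovement;
}

console.log(hasPlateaued([65, 66, 65, 67, 66], 3, 2)); // true
console.log(hasPlateaued([60, 72, 85]));               // false
console.log(hasPlateaued([70, 71]));                   // false
```

Note that a falling score also counts as a plateau under this definition, since the difference is negative. The 90-score pattern uses the spread between the highest and lowest recent scores instead, which treats a sharp drop as movement and keeps iterating.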

---

Code Examples and Best Practices

Example 1: Full TDD Quality Convergence Process

Complete process definition demonstrating all quality convergence patterns.

javascript
export async function process(inputs, ctx) {
  const {
    feature = 'User authentication',
    targetQuality = 85,
    maxIterations = 5
  } = inputs;

  // Phase 1: Planning
  const plan = await ctx.task(agentPlanningTask, { feature });

  await ctx.breakpoint({
    question: `Review the plan for "${feature}". Approve to proceed?`,
    title: 'Plan Review',
    context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }
  });

  // Phase 2: Quality Convergence Loop
  let iteration = 0;
  let quality = 0;
  const iterationResults = [];

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}]`);

    // TDD: Write tests first
    const tests = await ctx.task(writeTestsTask, {
      feature,
      plan,
      iteration,
      previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
    });

    // Run tests (expect failures on first iteration)
    await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });

    // Implement to pass tests
    const impl = await ctx.task(implementTask, {
      feature,
      tests,
      iteration,
      previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
    });

    // Run tests again
    const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });

    // Parallel quality checks
    const [coverage, lint, typeCheck, security] = await ctx.parallel.all([
      () => ctx.task(coverageTask, {}),
      () => ctx.task(lintTask, { files: impl.filesModified }),
      () => ctx.task(typeCheckTask, { files: impl.filesModified }),
      () => ctx.task(securityTask, { files: impl.filesModified })
    ]);

    // Agent quality scoring
    const score = await ctx.task(agentQualityScoringTask, {
      tests,
      testResults,
      implementation: impl,
      qualityChecks: { coverage, lint, typeCheck, security },
      iteration,
      targetQuality
    });

    quality = score.overallScore;
    iterationResults.push({
      iteration,
      quality,
      feedback: score.recommendations
    });

    ctx.log(`Quality: ${quality}/${targetQuality}`);

    if (quality >= targetQuality) {
      ctx.log('Target quality achieved!');
    }
  }

  // Final approval
  await ctx.breakpoint({
    question: `Quality: ${quality}/${targetQuality}. Approve for merge?`,
    title: 'Final Review',
    context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }
  });

  return {
    success: quality >= targetQuality,
    iterations: iteration,
    finalQuality: quality,
    iterationResults
  };
}

Example 2: Quality Scoring Task Definition

javascript
export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
  kind: 'agent',
  title: `Score quality (iteration ${args.iteration})`,
  description: 'Comprehensive quality assessment with agent',

  agent: {
    name: 'quality-assessor',
    prompt: {
      role: 'senior quality assurance engineer and code reviewer',
      task: 'Analyze implementation quality across multiple dimensions',
      context: {
        feature: args.feature,
        tests: args.tests,
        testResults: args.testResults,
        implementation: args.implementation,
        qualityChecks: args.qualityChecks,
        iteration: args.iteration,
        targetQuality: args.targetQuality
      },
      instructions: [
        'Review test quality: coverage, edge cases, assertions (weight: 25%)',
        'Review implementation quality: correctness, readability (weight: 30%)',
        'Review code metrics: lint, types, complexity (weight: 20%)',
        'Review security: vulnerabilities, input validation (weight: 15%)',
        'Review requirements alignment (weight: 10%)',
        'Calculate weighted overall score (0-100)',
        'Provide prioritized recommendations for improvement'
      ],
      outputFormat: 'JSON with overallScore, scores by dimension, recommendations'
    },
    outputSchema: {
      type: 'object',
      required: ['overallScore', 'scores', 'recommendations'],
      properties: {
        overallScore: { type: 'number', minimum: 0, maximum: 100 },
        scores: {
          type: 'object',
          properties: {
            tests: { type: 'number' },
            implementation: { type: 'number' },
            codeQuality: { type: 'number' },
            security: { type: 'number' },
            alignment: { type: 'number' }
          }
        },
        recommendations: { type: 'array', items: { type: 'string' } }
      }
    }
  },

  io: {
    inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
    outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
  }
}));
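Since the loop reads `overallScore` and `recommendations` directly from the agent's result, it can be worth checking the shape before trusting it. The sketch below is a hand-written check that mirrors the required fields of the output schema above; it is not a full JSON Schema validator, and `validateScoreResult` is a hypothetical helper.

```javascript
// Returns a list of problems; an empty list means the result is usable.
function validateScoreResult(result) {
  if (typeof result !== 'object' || result === null) {
    return ['result must be an object'];
  }
  const errors = [];
  const { overallScore, scores, recommendations } = result;
  if (!Number.isFinite(overallScore) || overallScore < 0 || overallScore > 100) {
    errors.push('overallScore must be a number from 0 to 100');
  }
  if (typeof scores !== 'object' || scores === null) {
    errors.push('scores must be an object');
  }
  if (!Array.isArray(recommendations) ||
      !recommendations.every(r => typeof r === 'string')) {
    errors.push('recommendations must be an array of strings');
  }
  return errors;
}

const good = validateScoreResult({
  overallScore: 82,
  scores: { tests: 90, implementation: 80 },
  recommendations: ['Add edge-case tests for empty email']
});
const bad = validateScoreResult({ overallScore: 120, scores: {} });

console.log(good); // []
console.log(bad);
// [ 'overallScore must be a number from 0 to 100',
//   'recommendations must be an array of strings' ]
```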

Best Practices

1. **Set Realistic Targets**: Aim for achievable quality scores (80-90% is often reasonable)
2. **Limit Iterations**: Prevent runaway loops with sensible limits (5-10 iterations typically)
3. **Use Parallel Checks**: Run independent quality checks concurrently for efficiency
4. **Provide Feedback**: Pass recommendations from scoring to subsequent iterations
5. **Log Progress**: Track quality scores across iterations for visibility
6. **Include Breakpoints**: Add approval gates at key milestones

---

Common Pitfalls and Troubleshooting

Pitfall 1: Quality Score Not Improving

**Symptom:**

Code
Iteration 1: Quality 65/100
Iteration 2: Quality 66/100
Iteration 3: Quality 65/100
Iteration 4: Quality 67/100
Iteration 5: Quality 66/100
Target not met: 85/100

**Causes:**

  • Quality target is unrealistic for the codebase
  • Scoring criteria are too strict
  • Fundamental issues blocking improvement

**Solutions:**

1. Review iteration feedback to identify blocking issues:

Code
What recommendations came from my quality scoring?

2. Adjust quality target:

javascript
const { targetQuality = 75 } = inputs; // Lower target

3. Increase iteration limit:

javascript
const { maxIterations = 10 } = inputs; // More iterations

4. Review scoring weights for balance

Pitfall 2: Too Many Iterations

**Symptom:** Process runs for many iterations before converging.

**Cause:** Target is too high or improvements are too granular.

**Solutions:**

1. Implement early exit on plateau:

javascript
const recentScores = iterationResults.slice(-3).map(r => r.quality);
if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {
  ctx.log('Quality plateaued, stopping early');
  break;
}

2. Increase improvement scope per iteration

3. Lower quality target to realistic level

Pitfall 3: Inconsistent Quality Scores

**Symptom:** Quality scores vary significantly between iterations without clear reason.

**Cause:** Non-deterministic scoring or external factors.

**Solution:**

1. Use deterministic scoring criteria
2. Ensure ctx.now() is used instead of Date.now() for timestamps
3. Review agent scoring prompts for consistency
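One way to make scoring criteria deterministic is to compute dimension scores from tool output with a fixed formula, and leave only the qualitative review to the agent. The sketch below does this for code quality. The penalty sizes and the `codeQualityScore` helper are hypothetical and would need tuning per project; the complexity limit follows the "< 10" threshold from the code quality gate.

```javascript
// Deterministic code-quality score: the same metrics always give the same
// score. Penalty sizes are hypothetical placeholders.
function codeQualityScore({ lintErrors, formatViolations, maxComplexity }) {
  const lintPenalty = 5 * lintErrors;
  const formatPenalty = 2 * formatViolations;
  // Complexity of 10 or more is over the limit; penalize each point above 9.
  const complexityPenalty = 3 * Math.max(0, maxComplexity - 9);
  const score = 100 - lintPenalty - formatPenalty - complexityPenalty;
  return Math.max(0, Math.min(100, score)); // clamp to 0-100
}

const metrics = { lintErrors: 2, formatViolations: 1, maxComplexity: 12 };
console.log(codeQualityScore(metrics)); // 100 - 10 - 2 - 9 = 79
console.log(codeQualityScore(metrics) === codeQualityScore(metrics)); // true
```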

Pitfall 4: Iteration Takes Too Long

**Symptom:** Each iteration takes several minutes.

**Cause:** Sequential execution of independent tasks.

**Solution:** Use parallel execution:

javascript
// Slow: Sequential (each check waits for the previous one to finish)
const coverage = await ctx.task(coverageTask, {});
const lint = await ctx.task(lintTask, {});
const security = await ctx.task(securityTask, {});

// Fast: Parallel (use this instead of the three sequential awaits above)
const [coverage, lint, security] = await ctx.parallel.all([
  () => ctx.task(coverageTask, {}),
  () => ctx.task(lintTask, {}),
  () => ctx.task(securityTask, {})
]);
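To see why the parallel form is faster, the effect can be reproduced outside Babysitter with plain promises. This is a standalone sketch: `fakeCheck` and `runAll` are hypothetical stand-ins that only mirror the call shape of `ctx.parallel.all` and are not its implementation.

```javascript
// Hypothetical stand-in for a quality check that takes `ms` milliseconds.
const fakeCheck = (name, ms) => () =>
  new Promise((resolve) => setTimeout(() => resolve({ name, ms }), ms));

// Starts every thunk immediately and resolves with results in input order.
const runAll = (thunks) => Promise.all(thunks.map((thunk) => thunk()));

async function compare() {
  const checks = [fakeCheck('coverage', 30), fakeCheck('lint', 20), fakeCheck('security', 10)];

  // Plain Node script, so Date.now() is fine here; inside a process use ctx.now().
  let start = Date.now();
  for (const check of checks) await check(); // total is roughly the sum of the delays
  const sequentialMs = Date.now() - start;

  start = Date.now();
  const results = await runAll(checks); // total is roughly the longest single delay
  const parallelMs = Date.now() - start;

  return { sequentialMs, parallelMs, names: results.map((r) => r.name) };
}

compare().then((timings) => console.log(timings));
```

The gain only applies to checks that are truly independent; if one check needs another's output, those two still have to run in sequence.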

---

Related Documentation

  • Process Definitions - Learn to create quality convergence processes
  • Parallel Execution - Optimize quality checks with parallelism
  • Breakpoints - Add approval gates to quality convergence workflows
  • Best Practices - Patterns for setting targets, custom scoring strategies, and balancing speed vs thoroughness
  • Process Library - Browse the SDK-managed library and current process counts
  • Two-Loops Architecture - Deep dive into the evidence-driven completion model

---

Try Different Methodologies and Processes

Babysitter offers two levels of reusable workflows:

Methodologies (38 directories in this repo snapshot) - The "How"

**Quality convergence works with ANY of Babysitter's methodology families** - not just TDD. In this repository snapshot there are 38 methodology directories under library/methodologies/.

| Methodology | Best For | Quality Focus |
|---|---|---|
| **TDD Quality Convergence** | Test-first development | Test coverage, regression prevention |
| **GSD (Get Stuff Done)** | Rapid prototyping | Working software, iteration speed |
| **Spec-Kit** | Enterprise/governance | Specification compliance, audit trails |
| **BDD/Specification by Example** | Team collaboration | Acceptance criteria, living documentation |
| **Domain-Driven Design** | Complex business domains | Domain model integrity, bounded contexts |

**Browse methodologies:**

  • Methodology overview
  • Methodologies folder

Domain Processes - The "What"

Beyond methodologies, Babysitter includes the following generated specialization snapshot from the live repository tree:

<!-- quality-convergence:domains:start -->

| Domain | Processes | Examples |
|---|---|---|
| **Development and technical specializations** | 837 | Web APIs, mobile apps, DevOps pipelines, AI, security, and related technical workflows |
| **Business domains** | 490 | Legal contracts, HR workflows, marketing campaigns, finance, logistics, and related domains |
| **Science & engineering domains** | 551 | Quantum algorithms, aerospace systems, biomedical devices, mathematics, and related domains |
| **Social sciences & humanities** | 160 | Education, healthcare, arts, philosophy, and social-science research |

<!-- quality-convergence:domains:end -->

**Browse processes:**

  • Process Library - Full catalog with descriptions
  • Specializations folder

---

What To Do Next

| Your Goal | Next Step |
|---|---|
| Run a quality convergence workflow | Try `/babysitter:call build a feature with 85% quality target` |
| Build your own convergence loop | Copy the TDD example above and customize the scoring |
| Add more quality gates | See the Five Quality Gate Categories section |
| Debug a stuck convergence | Check Best Practices - Debugging |
| Understand the architecture | Read Two-Loops Architecture |

---

Summary

Quality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.

**Key Takeaways:**

1. **Set realistic targets** - Start with 80-85, work up to 90+
2. **Use multiple gate types** - Tests + lint + security + performance
3. **Pass feedback between iterations** - AI learns from each failure
4. **Detect plateaus early** - Don't waste iterations on no improvement
5. **Parallelize independent checks** - Faster iterations mean faster convergence
