Agentic AI Atlas by a5c.ai
Agentic AI Atlas · Quality Convergence: Iterative Improvement Until Targets Met
Page JSON

page:docs-user-guide-features-quality-convergence


Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/user-guide/features/quality-convergence.md
Cluster · wiki
Record JSON
{
  "id": "page:docs-user-guide-features-quality-convergence",
  "_kind": "Page",
  "_file": "wiki/docs/user-guide/features/quality-convergence.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/user-guide/features/quality-convergence.md",
    "sourceKind": "repo-docs",
    "title": "Quality Convergence: Iterative Improvement Until Targets Met",
    "displayName": "Quality Convergence: Iterative Improvement Until Targets Met",
    "slug": "docs/user-guide/features/quality-convergence",
    "articlePath": "wiki/docs/user-guide/features/quality-convergence.md",
    "article": "\n# Quality Convergence: Iterative Improvement Until Targets Met\n\n**Version:** 2.1\n**Last Updated:** 2026-01-26\n**Category:** Feature Guide\n\n---\n\n## Quick Summary (Read This First)\n\n**Quality Convergence = \"Keep trying until it's good enough\"**\n\nInstead of:\n```\nAI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x\n```\n\nBabysitter does:\n```\nAI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!\n```\n\n### What You'll Learn in This Document\n\n| Section | What It Covers | Read If You Want To... |\n|---------|----------------|------------------------|\n| Five Quality Gates | Types of checks (tests, lint, security, etc.) | Understand what gets checked |\n| 90-Score Pattern | How to reliably hit high quality | Build production-ready workflows |\n| Process Examples | Real code from the library | See working implementations |\n| Step-by-Step | How to build your own | Create custom quality loops |\n\n### A Simple Example\n\nHere's what quality convergence looks like in practice:\n\n```\nIteration 1:\n  - AI writes login feature\n  - Tests run: 3/10 passing (30%)\n  - AI sees: \"Missing password validation, no error handling\"\n\nIteration 2:\n  - AI fixes based on feedback\n  - Tests run: 7/10 passing (70%)\n  - AI sees: \"Edge case for empty email not handled\"\n\nIteration 3:\n  - AI fixes edge cases\n  - Tests run: 10/10 passing (100%)\n  - Quality target met! 
✓\n\nOutput: Working login feature with all tests passing\n```\n\n**Key insight**: The AI doesn't just try once - it learns from each failure and improves.\n\n### Understanding Quality Scores\n\n**Quality scores are multi-dimensional, not a single number.** This is what makes Babysitter's quality convergence so accurate - instead of a simple pass/fail, you get nuanced feedback across multiple dimensions that guide improvement.\n\nA typical quality score includes:\n\n| Dimension | What It Measures | Example |\n|-----------|------------------|---------|\n| **Tests** | Pass rate and coverage | 92% tests passing, 85% coverage |\n| **Code Quality** | Lint errors, complexity | 0 lint errors, complexity < 10 |\n| **Security** | Vulnerabilities, secrets | 0 critical issues |\n| **Performance** | Response time, bundle size | p95 < 500ms |\n| **Type Safety** | Type errors, null safety | 0 type errors |\n\n### The Power of Custom Dimensions\n\n**You define what quality means for your project.** The dimensions above are just examples - you can:\n\n1. **Define your own 5 dimensions** that matter most for your domain\n2. **Ask Babysitter to suggest dimensions** appropriate for your specific task\n3. 
**Weight dimensions differently** based on project phase or criticality\n\nFor example, a data pipeline might use completely different dimensions:\n\n| Dimension | Weight | Threshold |\n|-----------|--------|-----------|\n| **Data Accuracy** | 30% | > 99.9% |\n| **Processing Speed** | 25% | < 5 min/GB |\n| **Schema Validation** | 20% | 100% valid |\n| **Idempotency** | 15% | All operations idempotent |\n| **Error Recovery** | 10% | Auto-recovery < 30s |\n\nThis flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.\n\n**For detailed scoring formulas and weight configurations, see [Best Practices - Custom Scoring Strategies](./best-practices.md#custom-scoring-strategies).**\n\n---\n\n## Overview\n\nQuality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.\n\n### The Core Principle: Evidence-Driven Completion\n\nFrom the [Two-Loops Control Plane architecture](./two-loops-architecture.md), the fundamental principle is:\n\n> **If you don't have evidence, you don't have completion.**\n\n*If you do only one thing: make completion require evidence.* — This single principle transforms \"it seems done\" into \"it is done.\"\n\nEvery phase must end with:\n- **Artifact**: The work product (patch, doc, config, report)\n- **Evidence**: Proof that it meets requirements (logs, test output, checks)\n\n### Why Use Quality Convergence\n\n- **Consistent Quality**: Guarantee outputs meet minimum quality thresholds\n- **Automated Refinement**: Let the system iterate without manual intervention\n- **Measurable Results**: Track quality scores across iterations\n- **Predictable Outcomes**: Set clear targets and iteration limits\n- **TDD 
Integration**: Combine with test-driven development for robust code\n- **Evidence-Based Completion**: Every iteration produces verifiable proof of quality\n\n---\n\n## The Five Quality Gate Categories\n\nQuality gates are not a single check. They form a **layered validation system** that ensures completeness from multiple perspectives. For robust quality convergence, use **4-5 gate types simultaneously**.\n\n### Gate Type 1: Functional Tests (Unit/Integration/System/Acceptance)\n\nVerifies the code behaves correctly across all levels.\n\n```javascript\n// From: methodologies/v-model.js (V-Model process)\nconst testResults = await ctx.task(executeTestsTask, {\n  implementation,\n  unitTestDesigns,      // Validates module design\n  integrationTestDesign, // Validates architecture\n  systemTestDesign,      // Validates system design\n  acceptanceTestDesign   // Validates requirements\n});\n\nconst allTestsPassed =\n  testResults.unitTests.passed &&\n  testResults.integrationTests.passed &&\n  testResults.systemTests.passed &&\n  testResults.acceptanceTests.passed;\n```\n\n**Gate Criteria:**\n\n| Test Level | What It Validates | Typical Pass Threshold |\n|------------|-------------------|------------------------|\n| Unit Tests | Individual functions/classes | 90-100% pass rate |\n| Integration Tests | Module interactions | 95-100% pass rate |\n| System Tests | End-to-end behavior | 90-100% pass rate |\n| Acceptance Tests | User requirements | 100% for critical |\n\n### Gate Type 2: Code Quality (Lint/Format/Complexity)\n\nEnsures code follows style guidelines and maintainability standards.\n\n```javascript\n// Parallel code quality checks\nconst [lint, format, complexity] = await ctx.parallel.all([\n  () => ctx.task(lintTask, { files: impl.filesModified }),\n  () => ctx.task(formatCheckTask, { files: impl.filesModified }),\n  () => ctx.task(complexityTask, { files: impl.filesModified })\n]);\n\nconst codeQualityGatePassed =\n  lint.errorCount === 0 &&\n  
format.violations === 0 &&\n  complexity.maxCyclomaticComplexity < 10;\n```\n\n**Gate Criteria:**\n\n| Check | Tool Examples | Typical Threshold |\n|-------|---------------|-------------------|\n| Lint Errors | ESLint, Pylint | 0 errors |\n| Formatting | Prettier, Black | 0 violations |\n| Cyclomatic Complexity | SonarQube, Radon | < 10 per function |\n| Code Duplication | jscpd, CPD | < 3% duplication |\n\n### Gate Type 3: Type Safety and Static Analysis\n\nCatches bugs at compile/analysis time without running the code.\n\n```javascript\n// From: gsd/iterative-convergence enhanced pattern\nconst [typeCheck, staticAnalysis] = await ctx.parallel.all([\n  () => ctx.task(typeCheckTask, { files: impl.filesModified }),\n  () => ctx.task(staticAnalysisTask, { files: impl.filesModified })\n]);\n\nconst staticGatePassed =\n  typeCheck.errors.length === 0 &&\n  staticAnalysis.criticalIssues === 0 &&\n  staticAnalysis.highIssues === 0;\n```\n\n**Gate Criteria:**\n\n| Check | What It Catches | Typical Threshold |\n|-------|-----------------|-------------------|\n| Type Checking | Type mismatches, null errors | 0 type errors |\n| Static Analysis | Potential bugs, code smells | 0 critical/high issues |\n| Dead Code | Unreachable statements | 0 dead code blocks |\n| Null Safety | Potential null dereferences | 0 null warnings |\n\n### Gate Type 4: Security Scanning\n\nIdentifies vulnerabilities, secrets, and security anti-patterns.\n\n```javascript\n// Security gate from methodologies/spec-driven-development.js\nconst security = await ctx.task(securityTask, {\n  files: impl.filesModified,\n  scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'\n});\n\nconst securityGatePassed =\n  security.criticalVulnerabilities === 0 &&\n  security.highVulnerabilities === 0 &&\n  security.secretsDetected === 0 &&\n  security.dependencyVulnerabilities.critical === 0;\n```\n\n**Gate Criteria:**\n\n| Check | What It Scans | Typical Threshold 
|\n|-------|---------------|-------------------|\n| SAST (Static) | SQL injection, XSS, etc. | 0 critical/high |\n| Secrets Detection | API keys, passwords | 0 secrets |\n| Dependency Scan | Known CVEs in packages | 0 critical CVEs |\n| OWASP Top 10 | Common web vulnerabilities | 0 violations |\n\n### Gate Type 5: Performance and Resource Thresholds\n\nEnsures the implementation meets non-functional requirements.\n\n```javascript\n// Performance gate for production readiness\nconst performance = await ctx.task(performanceCheckTask, {\n  implementation: impl,\n  thresholds: {\n    loadTimeMs: 1500,      // First Contentful Paint\n    bundleSizeKb: 200,     // Gzipped bundle\n    apiResponseP95Ms: 500, // 95th percentile\n    memoryUsageMb: 512     // Peak memory\n  }\n});\n\nconst performanceGatePassed =\n  performance.fcp <= 1500 &&\n  performance.bundleSize <= 200 &&\n  performance.apiP95 <= 500 &&\n  performance.peakMemory <= 512;\n```\n\n**Gate Criteria:**\n\n| Metric | Typical Target | Domain |\n|--------|----------------|--------|\n| FCP (First Contentful Paint) | < 1.5s | Frontend |\n| Bundle Size | < 200KB gzipped | Frontend |\n| API p95 Response | < 500ms | Backend |\n| Memory Usage | < 512MB | Server |\n| CPU Utilization | < 70% average | Server |\n\n---\n\n## The 90-Score Quality Convergence Pattern\n\nTo reliably achieve scores of **90+**, implement a **multi-gate weighted scoring system** with iterative feedback.\n\n### Step 1: Define Weighted Scoring Dimensions\n\n```javascript\n// Recommended weights for high-quality convergence\nconst QUALITY_WEIGHTS = {\n  // For production features\n  production: {\n    tests: 0.25,           // Test coverage and pass rate\n    implementation: 0.25,   // Code correctness\n    codeQuality: 0.15,      // Lint, complexity, formatting\n    security: 0.20,         // Vulnerability scanning\n    performance: 0.15       // Non-functional requirements\n  },\n\n  // For security-critical systems\n  securityCritical: {\n    
tests: 0.20,\n    implementation: 0.20,\n    codeQuality: 0.10,\n    security: 0.35,         // Higher weight for security\n    performance: 0.15\n  },\n\n  // For performance-critical systems\n  performanceCritical: {\n    tests: 0.20,\n    implementation: 0.20,\n    codeQuality: 0.10,\n    security: 0.15,\n    performance: 0.35       // Higher weight for performance\n  }\n};\n```\n\n### Step 2: Implement the Multi-Gate Convergence Loop\n\n```javascript\n/**\n * Multi-gate quality convergence targeting 90+ scores\n * References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js\n */\nexport async function process(inputs, ctx) {\n  const {\n    feature,\n    targetQuality = 90,      // Target score\n    maxIterations = 10,      // Allow more iterations for high targets\n    minImprovement = 2,      // Minimum improvement per iteration\n    plateauThreshold = 3,    // Iterations without improvement\n    weights = QUALITY_WEIGHTS.production\n  } = inputs;\n\n  let iteration = 0;\n  let quality = 0;\n  const iterationHistory = [];\n\n  while (iteration < maxIterations && quality < targetQuality) {\n    iteration++;\n    ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);\n\n    // ===== ACT: Implement with feedback from previous iteration =====\n    const previousFeedback = iteration > 1\n      ? 
iterationHistory[iteration - 2].recommendations\n      : null;\n\n    const impl = await ctx.task(implementTask, {\n      feature,\n      iteration,\n      previousFeedback,\n      focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities\n    });\n\n    // ===== VALIDATE: Run all five quality gates in parallel =====\n    const [tests, codeQuality, staticAnalysis, security, performance] =\n      await ctx.parallel.all([\n        () => ctx.task(testGateTask, { impl }),\n        () => ctx.task(codeQualityGateTask, { impl }),\n        () => ctx.task(staticAnalysisGateTask, { impl }),\n        () => ctx.task(securityGateTask, { impl }),\n        () => ctx.task(performanceGateTask, { impl })\n      ]);\n\n    // ===== SCORE: Calculate weighted quality score =====\n    const scores = {\n      tests: tests.score,\n      implementation: calculateImplementationScore(impl, tests),\n      codeQuality: codeQuality.score,\n      security: security.score,\n      performance: performance.score\n    };\n\n    quality = Object.entries(weights).reduce(\n      (total, [dimension, weight]) => total + (scores[dimension] * weight),\n      0\n    );\n\n    // ===== ANALYZE: Generate prioritized recommendations =====\n    const recommendations = generateRecommendations(scores, weights, targetQuality);\n\n    iterationHistory.push({\n      iteration,\n      quality,\n      scores,\n      recommendations,\n      gates: { tests, codeQuality, staticAnalysis, security, performance }\n    });\n\n    ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +\n            `Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +\n            `Security: ${scores.security} | Perf: ${scores.performance}`);\n\n    // ===== EARLY EXIT: Detect plateau =====\n    if (iteration >= plateauThreshold) {\n      const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);\n      const improvement = Math.max(...recent) - Math.min(...recent);\n      if (improvement < minImprovement) 
{\n        ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);\n        break;\n      }\n    }\n\n    // ===== BREAKPOINT: At key thresholds =====\n    const converged = quality >= targetQuality;\n    if (!converged && quality >= 80 && iteration > 1) {\n      await ctx.breakpoint({\n        question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,\n        title: `Iteration ${iteration} Checkpoint`,\n        context: {\n          runId: ctx.runId,\n          files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]\n        }\n      });\n    }\n  }\n\n  // ===== FINAL VALIDATION =====\n  const converged = quality >= targetQuality;\n\n  return {\n    success: converged,\n    quality,\n    targetQuality,\n    iterations: iteration,\n    iterationHistory,\n    finalGates: iterationHistory[iterationHistory.length - 1].gates,\n    metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }\n  };\n}\n\nfunction generateRecommendations(scores, weights, target) {\n  // Calculate gap for each dimension\n  const gaps = Object.entries(scores).map(([dim, score]) => ({\n    dimension: dim,\n    score,\n    weight: weights[dim],\n    weightedGap: (100 - score) * weights[dim],\n    priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority\n  }));\n\n  // Sort by priority (highest impact improvements first)\n  return gaps\n    .sort((a, b) => b.priority - a.priority)\n    .map(g => `Improve ${g.dimension}: currently ${g.score}, ` +\n              `contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);\n}\n```\n\n### Step 3: Progressive Target Strategy\n\nFor challenging targets (90+), use progressive escalation:\n\n```javascript\n// Progressive targets that increase as iterations proceed\nconst progressiveTargets = [\n  { iteration: 1, target: 70 },   // First: basic functionality\n  { iteration: 3, target: 80 },   // Mid: solid 
implementation\n  { iteration: 5, target: 85 },   // Late: polish and edge cases\n  { iteration: 7, target: 90 }    // Final: production ready\n];\n\nfunction getCurrentTarget(iteration, finalTarget) {\n  const applicable = progressiveTargets.filter(t => t.iteration <= iteration);\n  const progressiveTarget = applicable[applicable.length - 1]?.target || 70;\n  return Math.min(progressiveTarget, finalTarget);\n}\n```\n\n---\n\n## Real-World Process Examples\n\n### Example 1: V-Model with Four Test Levels\n\nThe V-Model process (`methodologies/v-model.js`) implements comprehensive quality gates:\n\n```\n/babysitter:call use the V-Model methodology to build a user authentication system with high safety level\n```\n\nOr with more detail:\n```\n/babysitter:call implement user authentication using V-Model with traceability and thorough testing\n```\n\n**Quality Gates in V-Model:**\n1. Requirements → Acceptance Tests (validates user needs)\n2. System Design → System Tests (validates architecture)\n3. Module Design → Integration Tests (validates interfaces)\n4. Implementation → Unit Tests (validates code)\n5. Traceability Matrix (validates coverage)\n\n### Example 2: Spec-Kit with Constitution Validation\n\nThe Spec-Kit process (`methodologies/spec-driven-development.js`) adds governance gates:\n\n```\n/babysitter:call use spec-driven development to build PCI-compliant payment processing\n```\n\nOr:\n```\n/babysitter:call build a payment flow using the spec-driven methodology with governance validation\n```\n\n**Quality Gates in Spec-Kit:**\n1. Constitution Validation (governance principles)\n2. Specification Review (requirements completeness)\n3. Plan-Constitution Alignment (architecture compliance)\n4. Task Consistency Analysis (cross-artifact validation)\n5. Implementation Checklists (\"unit tests for English\")\n6. 
User Story Validation (final acceptance)\n\n### Example 3: GSD Iterative Convergence\n\nThe GSD process (`gsd/iterative-convergence.js`) implements feedback-driven convergence:\n\n```\n/babysitter:call build a shopping cart checkout flow with 90% quality target\n```\n\nOr:\n```\n/babysitter:call implement checkout flow using iterative convergence with max 8 iterations\n```\n\n**Quality Gates in GSD:**\n1. Implementation scoring\n2. Test execution\n3. Quality assessment with recommendations\n4. Iterative feedback loop\n\n---\n\n## Use Cases and Scenarios\n\n### Scenario 1: TDD Feature Development\n\nBuild a feature with test-driven development, iterating until test coverage and quality targets are met.\n\n```javascript\nexport async function process(inputs, ctx) {\n  const { feature, targetQuality = 85, maxIterations = 5 } = inputs;\n\n  let iteration = 0;\n  let quality = 0;\n\n  while (iteration < maxIterations && quality < targetQuality) {\n    iteration++;\n    ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);\n\n    // Write tests first\n    const tests = await ctx.task(writeTestsTask, { feature, iteration });\n\n    // Implement code to pass tests\n    const impl = await ctx.task(implementTask, { tests, feature });\n\n    // Run quality checks\n    const [coverage, lint, security] = await ctx.parallel.all([\n      () => ctx.task(coverageTask, {}),\n      () => ctx.task(lintTask, {}),\n      () => ctx.task(securityTask, {})\n    ]);\n\n    // Agent scores quality\n    const score = await ctx.task(agentScoringTask, {\n      tests, impl, coverage, lint, security\n    });\n\n    quality = score.overall;\n    ctx.log(`Quality score: ${quality}/${targetQuality}`);\n  }\n\n  return { converged: quality >= targetQuality, iterations: iteration, quality };\n}\n```\n\n### Scenario 2: Code Quality Improvement\n\nIteratively improve existing code until it meets quality standards.\n\n```javascript\nexport async function process(inputs, 
ctx) {\n  const { files, targetScore = 90, maxIterations = 10 } = inputs;\n\n  let iteration = 0;\n  let currentScore = 0;\n\n  // Initial assessment\n  currentScore = await ctx.task(assessQualityTask, { files });\n  ctx.log(`Initial quality score: ${currentScore}`);\n\n  while (iteration < maxIterations && currentScore < targetScore) {\n    iteration++;\n\n    // Identify improvements\n    const improvements = await ctx.task(identifyImprovementsTask, {\n      files,\n      currentScore,\n      targetScore\n    });\n\n    // Apply improvements\n    await ctx.task(applyImprovementsTask, { improvements });\n\n    // Re-assess\n    currentScore = await ctx.task(assessQualityTask, { files });\n    ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);\n  }\n\n  return { achieved: currentScore >= targetScore, finalScore: currentScore };\n}\n```\n\n### Scenario 3: Documentation Generation\n\nGenerate documentation and refine until it meets completeness standards.\n\n```javascript\nexport async function process(inputs, ctx) {\n  const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;\n\n  let iteration = 0;\n  let completeness = 0;\n\n  while (iteration < maxIterations && completeness < targetCompleteness) {\n    iteration++;\n\n    // Generate or improve documentation\n    await ctx.task(generateDocsTask, { codebase, iteration });\n\n    // Assess completeness\n    const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });\n    completeness = assessment.completenessScore;\n\n    ctx.log(`Documentation completeness: ${completeness}%`);\n  }\n\n  return { complete: completeness >= targetCompleteness, completeness };\n}\n```\n\n---\n\n## Step-by-Step Instructions\n\n### Step 1: Define Quality Targets\n\nDetermine what quality means for your use case.\n\n**Common quality metrics:**\n- Test coverage percentage (e.g., 85%)\n- Lint error count (e.g., 0 errors)\n- Security vulnerability count (e.g., 0 critical)\n- 
Overall quality score (e.g., 90/100)\n\n### Step 2: Set Iteration Limits\n\nPrevent infinite loops by setting a maximum number of iterations.\n\n```javascript\nconst { targetQuality = 85, maxIterations = 5 } = inputs;\n```\n\n**Recommendations:**\n- Simple improvements: 3-5 iterations\n- Complex refactoring: 5-10 iterations\n- Large features: 10-15 iterations\n\n### Step 3: Implement the Convergence Loop\n\nCreate a loop that continues until the target is met or iterations are exhausted.\n\n```javascript\nlet iteration = 0;\nlet quality = 0;\n\nwhile (iteration < maxIterations && quality < targetQuality) {\n  iteration++;\n\n  // Perform work\n  // ...\n\n  // Measure quality\n  quality = await measureQuality();\n\n  ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);\n}\n```\n\n### Step 4: Implement Quality Scoring\n\nCreate a task that evaluates quality based on your criteria.\n\n```javascript\nexport const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({\n  kind: 'agent',\n  title: 'Score implementation quality',\n  agent: {\n    name: 'quality-assessor',\n    prompt: {\n      role: 'senior quality assurance engineer',\n      task: 'Analyze implementation quality and provide a score from 0-100',\n      context: {\n        tests: args.tests,\n        implementation: args.implementation,\n        coverage: args.coverage,\n        lint: args.lint,\n        security: args.security\n      },\n      instructions: [\n        'Review test quality (weight: 25%)',\n        'Review implementation quality (weight: 30%)',\n        'Review code metrics (weight: 20%)',\n        'Review security (weight: 15%)',\n        'Review alignment with requirements (weight: 10%)',\n        'Provide recommendations for improvement'\n      ]\n    }\n  }\n}));\n```\n\n### Step 5: Add Feedback to Subsequent Iterations\n\nPass quality feedback to the next iteration to guide improvements.\n\n```javascript\nconst iterationResults = [];\n\nwhile (iteration 
< maxIterations && quality < targetQuality) {\n  iteration++;\n\n  const previousFeedback = iteration > 1\n    ? iterationResults[iteration - 2].recommendations\n    : null;\n\n  const impl = await ctx.task(implementTask, {\n    feature,\n    previousFeedback  // Guide improvements based on previous scoring\n  });\n\n  const score = await ctx.task(agentScoringTask, { impl });\n\n  iterationResults.push({\n    iteration,\n    quality: score.overall,\n    recommendations: score.recommendations\n  });\n\n  quality = score.overall;\n}\n```\n\n---\n\n## Configuration Options\n\n### Quality Target Configuration\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `targetQuality` | number | 85 | Target quality score (0-100) |\n| `maxIterations` | number | 5 | Maximum number of iterations before stopping |\n\n### Scoring Weights Configuration\n\nCustomize how different aspects contribute to the overall score.\n\n```javascript\nconst scoringWeights = {\n  tests: 0.25,          // 25% weight for test quality\n  implementation: 0.30,  // 30% weight for implementation quality\n  codeQuality: 0.20,     // 20% weight for code metrics\n  security: 0.15,        // 15% weight for security\n  alignment: 0.10        // 10% weight for requirements alignment\n};\n```\n\n### Early Exit Conditions\n\nConfigure conditions that stop iteration early.\n\n```javascript\n// Stop if quality plateaus (no improvement in last N iterations)\nif (qualityHistory.length >= 3) {\n  const lastThree = qualityHistory.slice(-3);\n  const improvement = lastThree[2] - lastThree[0];\n  if (improvement < 1) {\n    ctx.log('Quality plateaued, stopping early');\n    break;\n  }\n}\n```\n\n---\n\n## Code Examples and Best Practices\n\n### Example 1: Full TDD Quality Convergence Process\n\nComplete process definition demonstrating all quality convergence patterns.\n\n```javascript\nexport async function process(inputs, ctx) {\n  const {\n    feature = 'User 
authentication',\n    targetQuality = 85,\n    maxIterations = 5\n  } = inputs;\n\n  // Phase 1: Planning\n  const plan = await ctx.task(agentPlanningTask, { feature });\n\n  await ctx.breakpoint({\n    question: `Review the plan for \"${feature}\". Approve to proceed?`,\n    title: 'Plan Review',\n    context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }\n  });\n\n  // Phase 2: Quality Convergence Loop\n  let iteration = 0;\n  let quality = 0;\n  const iterationResults = [];\n\n  while (iteration < maxIterations && quality < targetQuality) {\n    iteration++;\n    ctx.log(`[Iteration ${iteration}/${maxIterations}]`);\n\n    // TDD: Write tests first\n    const tests = await ctx.task(writeTestsTask, {\n      feature,\n      plan,\n      iteration,\n      previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null\n    });\n\n    // Run tests (expect failures on first iteration)\n    await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });\n\n    // Implement to pass tests\n    const impl = await ctx.task(implementTask, {\n      feature,\n      tests,\n      iteration,\n      previousFeedback: iteration > 1 ? 
iterationResults[iteration - 2].feedback : null\n    });\n\n    // Run tests again\n    const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });\n\n    // Parallel quality checks\n    const [coverage, lint, typeCheck, security] = await ctx.parallel.all([\n      () => ctx.task(coverageTask, {}),\n      () => ctx.task(lintTask, { files: impl.filesModified }),\n      () => ctx.task(typeCheckTask, { files: impl.filesModified }),\n      () => ctx.task(securityTask, { files: impl.filesModified })\n    ]);\n\n    // Agent quality scoring\n    const score = await ctx.task(agentQualityScoringTask, {\n      tests,\n      testResults,\n      implementation: impl,\n      qualityChecks: { coverage, lint, typeCheck, security },\n      iteration,\n      targetQuality\n    });\n\n    quality = score.overallScore;\n    iterationResults.push({\n      iteration,\n      quality,\n      feedback: score.recommendations\n    });\n\n    ctx.log(`Quality: ${quality}/${targetQuality}`);\n\n    if (quality >= targetQuality) {\n      ctx.log('Target quality achieved!');\n    }\n  }\n\n  // Final approval\n  await ctx.breakpoint({\n    question: `Quality: ${quality}/${targetQuality}. 
Approve for merge?`,\n    title: 'Final Review',\n    context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }\n  });\n\n  return {\n    success: quality >= targetQuality,\n    iterations: iteration,\n    finalQuality: quality,\n    iterationResults\n  };\n}\n```\n\n### Example 2: Quality Scoring Task Definition\n\n```javascript\nexport const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({\n  kind: 'agent',\n  title: `Score quality (iteration ${args.iteration})`,\n  description: 'Comprehensive quality assessment with agent',\n\n  agent: {\n    name: 'quality-assessor',\n    prompt: {\n      role: 'senior quality assurance engineer and code reviewer',\n      task: 'Analyze implementation quality across multiple dimensions',\n      context: {\n        feature: args.feature,\n        tests: args.tests,\n        testResults: args.testResults,\n        implementation: args.implementation,\n        qualityChecks: args.qualityChecks,\n        iteration: args.iteration,\n        targetQuality: args.targetQuality\n      },\n      instructions: [\n        'Review test quality: coverage, edge cases, assertions (weight: 25%)',\n        'Review implementation quality: correctness, readability (weight: 30%)',\n        'Review code metrics: lint, types, complexity (weight: 20%)',\n        'Review security: vulnerabilities, input validation (weight: 15%)',\n        'Review requirements alignment (weight: 10%)',\n        'Calculate weighted overall score (0-100)',\n        'Provide prioritized recommendations for improvement'\n      ],\n      outputFormat: 'JSON with overallScore, scores by dimension, recommendations'\n    },\n    outputSchema: {\n      type: 'object',\n      required: ['overallScore', 'scores', 'recommendations'],\n      properties: {\n        overallScore: { type: 'number', minimum: 0, maximum: 100 },\n        scores: {\n          type: 'object',\n          properties: {\n            tests: { 
type: 'number' },\n            implementation: { type: 'number' },\n            codeQuality: { type: 'number' },\n            security: { type: 'number' },\n            alignment: { type: 'number' }\n          }\n        },\n        recommendations: { type: 'array', items: { type: 'string' } }\n      }\n    }\n  },\n\n  io: {\n    inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,\n    outputJsonPath: `tasks/${taskCtx.effectId}/result.json`\n  }\n}));\n```\n\n### Best Practices\n\n1. **Set Realistic Targets**: Aim for achievable quality scores (80-90% is often reasonable)\n2. **Limit Iterations**: Prevent runaway loops with sensible limits (5-10 iterations typically)\n3. **Use Parallel Checks**: Run independent quality checks concurrently for efficiency\n4. **Provide Feedback**: Pass recommendations from scoring to subsequent iterations\n5. **Log Progress**: Track quality scores across iterations for visibility\n6. **Include Breakpoints**: Add approval gates at key milestones\n\n---\n\n## Common Pitfalls and Troubleshooting\n\n### Pitfall 1: Quality Score Not Improving\n\n**Symptom:**\n```\nIteration 1: Quality 65/100\nIteration 2: Quality 66/100\nIteration 3: Quality 65/100\nIteration 4: Quality 67/100\nIteration 5: Quality 66/100\nTarget not met: 85/100\n```\n\n**Causes:**\n- Quality target is unrealistic for the codebase\n- Scoring criteria are too strict\n- Fundamental issues blocking improvement\n\n**Solutions:**\n\n1. Review iteration feedback to identify blocking issues:\n   ```\n   What recommendations came from my quality scoring?\n   ```\n\n2. Adjust quality target:\n   ```javascript\n   const { targetQuality = 75 } = inputs;  // Lower target\n   ```\n\n3. Increase iteration limit:\n   ```javascript\n   const { maxIterations = 10 } = inputs;  // More iterations\n   ```\n\n4. 
Review scoring weights for balance\n\n### Pitfall 2: Too Many Iterations\n\n**Symptom:** Process runs for many iterations before converging.\n\n**Cause:** Target is too high or improvements are too granular.\n\n**Solutions:**\n\n1. Implement early exit on plateau:\n   ```javascript\n   const recentScores = iterationResults.slice(-3).map(r => r.quality);\n   if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {\n     ctx.log('Quality plateaued, stopping early');\n     break;\n   }\n   ```\n\n2. Increase improvement scope per iteration\n\n3. Lower quality target to realistic level\n\n### Pitfall 3: Inconsistent Quality Scores\n\n**Symptom:** Quality scores vary significantly between iterations without clear reason.\n\n**Cause:** Non-deterministic scoring or external factors.\n\n**Solution:**\n\n1. Use deterministic scoring criteria\n2. Ensure `ctx.now()` is used instead of `Date.now()` for timestamps\n3. Review agent scoring prompts for consistency\n\n### Pitfall 4: Iteration Takes Too Long\n\n**Symptom:** Each iteration takes several minutes.\n\n**Cause:** Sequential execution of independent tasks.\n\n**Solution:** Use parallel execution:\n\n```javascript\n// Slow: Sequential\nconst coverage = await ctx.task(coverageTask, {});\nconst lint = await ctx.task(lintTask, {});\nconst security = await ctx.task(securityTask, {});\n\n// Fast: Parallel\nconst [coverage, lint, security] = await ctx.parallel.all([\n  () => ctx.task(coverageTask, {}),\n  () => ctx.task(lintTask, {}),\n  () => ctx.task(securityTask, {})\n]);\n```\n\n---\n\n## Related Documentation\n\n- [Process Definitions](./process-definitions.md) - Learn to create quality convergence processes\n- [Parallel Execution](./parallel-execution.md) - Optimize quality checks with parallelism\n- [Breakpoints](./breakpoints.md) - Add approval gates to quality convergence workflows\n- [Best Practices](./best-practices.md) - Patterns for setting targets, custom scoring strategies, and balancing speed vs 
thoroughness\n- [Process Library](./process-library.md) - Browse the SDK-managed library and current process counts\n- [Two-Loops Architecture](./two-loops-architecture.md) - Deep dive into the evidence-driven completion model\n\n---\n\n## Try Different Methodologies and Processes\n\nBabysitter offers two levels of reusable workflows:\n\n### Methodologies (38 directories in this repo snapshot) - The \"How\"\n\n**Quality convergence works with ANY of Babysitter's methodology families** - not just TDD. In this repository snapshot there are 38 methodology directories under `library/methodologies/`.\n\n| Methodology | Best For | Quality Focus |\n|-------------|----------|---------------|\n| **TDD Quality Convergence** | Test-first development | Test coverage, regression prevention |\n| **GSD (Get Stuff Done)** | Rapid prototyping | Working software, iteration speed |\n| **Spec-Kit** | Enterprise/governance | Specification compliance, audit trails |\n| **BDD/Specification by Example** | Team collaboration | Acceptance criteria, living documentation |\n| **Domain-Driven Design** | Complex business domains | Domain model integrity, bounded contexts |\n\n**Browse methodologies:**\n- [Methodology overview](../reference/glossary.md#methodology)\n- [Methodologies folder](../../../library/methodologies/)\n\n### Domain Processes - The \"What\"\n\nBeyond methodologies, Babysitter includes domain-specific processes. The table below is a generated snapshot from the live repository tree:\n\n<!-- quality-convergence:domains:start -->\n| Domain | Processes | Examples |\n|--------|-----------|----------|\n| **Development and technical specializations** | 837 | Web APIs, mobile apps, DevOps pipelines, AI, security, and related technical workflows |\n| **Business domains** | 490 | Legal contracts, HR workflows, marketing campaigns, finance, logistics, and related domains |\n| **Science & engineering domains** | 551 | Quantum algorithms, aerospace systems, biomedical devices, mathematics, and related 
domains |\n| **Social sciences & humanities** | 160 | Education, healthcare, arts, philosophy, and social-science research |\n<!-- quality-convergence:domains:end -->\n\n**Browse processes:**\n- [Process Library](./process-library.md) - Full catalog with descriptions\n- [Specializations folder](../../../library/specializations/)\n\n---\n\n## What To Do Next\n\n| Your Goal | Next Step |\n|-----------|-----------|\n| Run a quality convergence workflow | Try `/babysitter:call build a feature with 85% quality target` |\n| Build your own convergence loop | Copy the TDD example above and customize the scoring |\n| Add more quality gates | See the Five Quality Gate Categories section |\n| Debug a stuck convergence | Check [Best Practices - Debugging](./best-practices.md#debugging-and-troubleshooting) |\n| Understand the architecture | Read [Two-Loops Architecture](./two-loops-architecture.md) |\n\n---\n\n## Summary\n\nQuality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.\n\n**Key Takeaways:**\n\n1. **Set realistic targets** - Start with 80-85, work up to 90+\n2. **Use multiple gate types** - Tests + lint + security + performance\n3. **Pass feedback between iterations** - AI learns from each failure\n4. **Detect plateaus early** - Don't waste iterations on no improvement\n5. **Parallelize independent checks** - Faster iterations mean faster convergence\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-user-guide-features",
      "to": "page:docs-user-guide-features-quality-convergence",
      "kind": "contains_page"
    }
  ]
}
