Page JSON
Structured · livepage:docs-user-guide-features-quality-convergence
Quality Convergence: Iterative Improvement Until Targets Met
Inspect the normalized record payload exactly as the atlas UI reads it.
{
"id": "page:docs-user-guide-features-quality-convergence",
"_kind": "Page",
"_file": "wiki/docs/user-guide/features/quality-convergence.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/user-guide/features/quality-convergence.md",
"sourceKind": "repo-docs",
"title": "Quality Convergence: Iterative Improvement Until Targets Met",
"displayName": "Quality Convergence: Iterative Improvement Until Targets Met",
"slug": "docs/user-guide/features/quality-convergence",
"articlePath": "wiki/docs/user-guide/features/quality-convergence.md",
"article": "\n# Quality Convergence: Iterative Improvement Until Targets Met\n\n**Version:** 2.1\n**Last Updated:** 2026-01-26\n**Category:** Feature Guide\n\n---\n\n## Quick Summary (Read This First)\n\n**Quality Convergence = \"Keep trying until it's good enough\"**\n\nInstead of:\n```\nAI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x\n```\n\nBabysitter does:\n```\nAI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!\n```\n\n### What You'll Learn in This Document\n\n| Section | What It Covers | Read If You Want To... |\n|---------|----------------|------------------------|\n| Five Quality Gates | Types of checks (tests, lint, security, etc.) | Understand what gets checked |\n| 90-Score Pattern | How to reliably hit high quality | Build production-ready workflows |\n| Process Examples | Real code from the library | See working implementations |\n| Step-by-Step | How to build your own | Create custom quality loops |\n\n### A Simple Example\n\nHere's what quality convergence looks like in practice:\n\n```\nIteration 1:\n - AI writes login feature\n - Tests run: 3/10 passing (30%)\n - AI sees: \"Missing password validation, no error handling\"\n\nIteration 2:\n - AI fixes based on feedback\n - Tests run: 7/10 passing (70%)\n - AI sees: \"Edge case for empty email not handled\"\n\nIteration 3:\n - AI fixes edge cases\n - Tests run: 10/10 passing (100%)\n - Quality target met! 
✓\n\nOutput: Working login feature with all tests passing\n```\n\n**Key insight**: The AI doesn't just try once - it learns from each failure and improves.\n\n### Understanding Quality Scores\n\n**Quality scores are multi-dimensional, not a single number.** This is what makes Babysitter's quality convergence so accurate - instead of a simple pass/fail, you get nuanced feedback across multiple dimensions that guide improvement.\n\nA typical quality score includes:\n\n| Dimension | What It Measures | Example |\n|-----------|------------------|---------|\n| **Tests** | Pass rate and coverage | 92% tests passing, 85% coverage |\n| **Code Quality** | Lint errors, complexity | 0 lint errors, complexity < 10 |\n| **Security** | Vulnerabilities, secrets | 0 critical issues |\n| **Performance** | Response time, bundle size | p95 < 500ms |\n| **Type Safety** | Type errors, null safety | 0 type errors |\n\n### The Power of Custom Dimensions\n\n**You define what quality means for your project.** The dimensions above are just examples - you can:\n\n1. **Define your own 5 dimensions** that matter most for your domain\n2. **Ask Babysitter to suggest dimensions** appropriate for your specific task\n3. 
**Weight dimensions differently** based on project phase or criticality\n\nFor example, a data pipeline might use completely different dimensions:\n\n| Dimension | Weight | Threshold |\n|-----------|--------|-----------|\n| **Data Accuracy** | 30% | > 99.9% |\n| **Processing Speed** | 25% | < 5 min/GB |\n| **Schema Validation** | 20% | 100% valid |\n| **Idempotency** | 15% | All operations idempotent |\n| **Error Recovery** | 10% | Auto-recovery < 30s |\n\nThis flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.\n\n**For detailed scoring formulas and weight configurations, see [Best Practices - Custom Scoring Strategies](./best-practices.md#custom-scoring-strategies).**\n\n---\n\n## Overview\n\nQuality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.\n\n### The Core Principle: Evidence-Driven Completion\n\nFrom the [Two-Loops Control Plane architecture](./two-loops-architecture.md), the fundamental principle is:\n\n> **If you don't have evidence, you don't have completion.**\n\n*If you do only one thing: make completion require evidence.* — This single principle transforms \"it seems done\" into \"it is done.\"\n\nEvery phase must end with:\n- **Artifact**: The work product (patch, doc, config, report)\n- **Evidence**: Proof that it meets requirements (logs, test output, checks)\n\n### Why Use Quality Convergence\n\n- **Consistent Quality**: Guarantee outputs meet minimum quality thresholds\n- **Automated Refinement**: Let the system iterate without manual intervention\n- **Measurable Results**: Track quality scores across iterations\n- **Predictable Outcomes**: Set clear targets and iteration limits\n- **TDD 
Integration**: Combine with test-driven development for robust code\n- **Evidence-Based Completion**: Every iteration produces verifiable proof of quality\n\n---\n\n## The Five Quality Gate Categories\n\nQuality gates are not a single check. They form a **layered validation system** that ensures completeness from multiple perspectives. For robust quality convergence, use **4-5 gate types simultaneously**.\n\n### Gate Type 1: Functional Tests (Unit/Integration/System/Acceptance)\n\nVerifies the code behaves correctly across all levels.\n\n```javascript\n// From: methodologies/v-model.js (V-Model process)\nconst testResults = await ctx.task(executeTestsTask, {\n implementation,\n unitTestDesigns, // Validates module design\n integrationTestDesign, // Validates architecture\n systemTestDesign, // Validates system design\n acceptanceTestDesign // Validates requirements\n});\n\nconst allTestsPassed =\n testResults.unitTests.passed &&\n testResults.integrationTests.passed &&\n testResults.systemTests.passed &&\n testResults.acceptanceTests.passed;\n```\n\n**Gate Criteria:**\n\n| Test Level | What It Validates | Typical Pass Threshold |\n|------------|-------------------|------------------------|\n| Unit Tests | Individual functions/classes | 90-100% pass rate |\n| Integration Tests | Module interactions | 95-100% pass rate |\n| System Tests | End-to-end behavior | 90-100% pass rate |\n| Acceptance Tests | User requirements | 100% for critical |\n\n### Gate Type 2: Code Quality (Lint/Format/Complexity)\n\nEnsures code follows style guidelines and maintainability standards.\n\n```javascript\n// Parallel code quality checks\nconst [lint, format, complexity] = await ctx.parallel.all([\n () => ctx.task(lintTask, { files: impl.filesModified }),\n () => ctx.task(formatCheckTask, { files: impl.filesModified }),\n () => ctx.task(complexityTask, { files: impl.filesModified })\n]);\n\nconst codeQualityGatePassed =\n lint.errorCount === 0 &&\n format.violations === 0 &&\n 
complexity.maxCyclomaticComplexity < 10;\n```\n\n**Gate Criteria:**\n\n| Check | Tool Examples | Typical Threshold |\n|-------|---------------|-------------------|\n| Lint Errors | ESLint, Pylint | 0 errors |\n| Formatting | Prettier, Black | 0 violations |\n| Cyclomatic Complexity | SonarQube, Radon | < 10 per function |\n| Code Duplication | jscpd, CPD | < 3% duplication |\n\n### Gate Type 3: Type Safety and Static Analysis\n\nCatches bugs at compile/analysis time without running the code.\n\n```javascript\n// From: gsd/iterative-convergence enhanced pattern\nconst [typeCheck, staticAnalysis] = await ctx.parallel.all([\n () => ctx.task(typeCheckTask, { files: impl.filesModified }),\n () => ctx.task(staticAnalysisTask, { files: impl.filesModified })\n]);\n\nconst staticGatePassed =\n typeCheck.errors.length === 0 &&\n staticAnalysis.criticalIssues === 0 &&\n staticAnalysis.highIssues === 0;\n```\n\n**Gate Criteria:**\n\n| Check | What It Catches | Typical Threshold |\n|-------|-----------------|-------------------|\n| Type Checking | Type mismatches, null errors | 0 type errors |\n| Static Analysis | Potential bugs, code smells | 0 critical/high issues |\n| Dead Code | Unreachable statements | 0 dead code blocks |\n| Null Safety | Potential null dereferences | 0 null warnings |\n\n### Gate Type 4: Security Scanning\n\nIdentifies vulnerabilities, secrets, and security anti-patterns.\n\n```javascript\n// Security gate from methodologies/spec-driven-development.js\nconst security = await ctx.task(securityTask, {\n files: impl.filesModified,\n scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'\n});\n\nconst securityGatePassed =\n security.criticalVulnerabilities === 0 &&\n security.highVulnerabilities === 0 &&\n security.secretsDetected === 0 &&\n security.dependencyVulnerabilities.critical === 0;\n```\n\n**Gate Criteria:**\n\n| Check | What It Scans | Typical Threshold |\n|-------|---------------|-------------------|\n| SAST (Static) | SQL injection, 
XSS, etc. | 0 critical/high |\n| Secrets Detection | API keys, passwords | 0 secrets |\n| Dependency Scan | Known CVEs in packages | 0 critical CVEs |\n| OWASP Top 10 | Common web vulnerabilities | 0 violations |\n\n### Gate Type 5: Performance and Resource Thresholds\n\nEnsures the implementation meets non-functional requirements.\n\n```javascript\n// Performance gate for production readiness\nconst performance = await ctx.task(performanceCheckTask, {\n implementation: impl,\n thresholds: {\n loadTimeMs: 1500, // First Contentful Paint\n bundleSizeKb: 200, // Gzipped bundle\n apiResponseP95Ms: 500, // 95th percentile\n memoryUsageMb: 512 // Peak memory\n }\n});\n\nconst performanceGatePassed =\n performance.fcp <= 1500 &&\n performance.bundleSize <= 200 &&\n performance.apiP95 <= 500 &&\n performance.peakMemory <= 512;\n```\n\n**Gate Criteria:**\n\n| Metric | Typical Target | Domain |\n|--------|----------------|--------|\n| FCP (First Contentful Paint) | < 1.5s | Frontend |\n| Bundle Size | < 200KB gzipped | Frontend |\n| API p95 Response | < 500ms | Backend |\n| Memory Usage | < 512MB | Server |\n| CPU Utilization | < 70% average | Server |\n\n---\n\n## The 90-Score Quality Convergence Pattern\n\nTo reliably achieve scores of **90+**, implement a **multi-gate weighted scoring system** with iterative feedback.\n\n### Step 1: Define Weighted Scoring Dimensions\n\n```javascript\n// Recommended weights for high-quality convergence\nconst QUALITY_WEIGHTS = {\n // For production features\n production: {\n tests: 0.25, // Test coverage and pass rate\n implementation: 0.25, // Code correctness\n codeQuality: 0.15, // Lint, complexity, formatting\n security: 0.20, // Vulnerability scanning\n performance: 0.15 // Non-functional requirements\n },\n\n // For security-critical systems\n securityCritical: {\n tests: 0.20,\n implementation: 0.20,\n codeQuality: 0.10,\n security: 0.35, // Higher weight for security\n performance: 0.15\n },\n\n // For performance-critical 
systems\n performanceCritical: {\n tests: 0.20,\n implementation: 0.20,\n codeQuality: 0.10,\n security: 0.15,\n performance: 0.35 // Higher weight for performance\n }\n};\n```\n\n### Step 2: Implement the Multi-Gate Convergence Loop\n\n```javascript\n/**\n * Multi-gate quality convergence targeting 90+ scores\n * References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js\n */\nexport async function process(inputs, ctx) {\n const {\n feature,\n targetQuality = 90, // Target score\n maxIterations = 10, // Allow more iterations for high targets\n minImprovement = 2, // Minimum improvement per iteration\n plateauThreshold = 3, // Iterations without improvement\n weights = QUALITY_WEIGHTS.production\n } = inputs;\n\n let iteration = 0;\n let quality = 0;\n const iterationHistory = [];\n\n while (iteration < maxIterations && quality < targetQuality) {\n iteration++;\n ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);\n\n // ===== ACT: Implement with feedback from previous iteration =====\n const previousFeedback = iteration > 1\n ? 
iterationHistory[iteration - 2].recommendations\n : null;\n\n const impl = await ctx.task(implementTask, {\n feature,\n iteration,\n previousFeedback,\n focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities\n });\n\n // ===== VALIDATE: Run all five quality gates in parallel =====\n const [tests, codeQuality, staticAnalysis, security, performance] =\n await ctx.parallel.all([\n () => ctx.task(testGateTask, { impl }),\n () => ctx.task(codeQualityGateTask, { impl }),\n () => ctx.task(staticAnalysisGateTask, { impl }),\n () => ctx.task(securityGateTask, { impl }),\n () => ctx.task(performanceGateTask, { impl })\n ]);\n\n // ===== SCORE: Calculate weighted quality score =====\n const scores = {\n tests: tests.score,\n implementation: calculateImplementationScore(impl, tests),\n codeQuality: codeQuality.score,\n security: security.score,\n performance: performance.score\n };\n\n quality = Object.entries(weights).reduce(\n (total, [dimension, weight]) => total + (scores[dimension] * weight),\n 0\n );\n\n // ===== ANALYZE: Generate prioritized recommendations =====\n const recommendations = generateRecommendations(scores, weights, targetQuality);\n\n iterationHistory.push({\n iteration,\n quality,\n scores,\n recommendations,\n gates: { tests, codeQuality, staticAnalysis, security, performance }\n });\n\n ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +\n `Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +\n `Security: ${scores.security} | Perf: ${scores.performance}`);\n\n // ===== EARLY EXIT: Detect plateau =====\n if (iteration >= plateauThreshold) {\n const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);\n const improvement = Math.max(...recent) - Math.min(...recent);\n if (improvement < minImprovement) {\n ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);\n break;\n }\n }\n\n // ===== BREAKPOINT: At key thresholds =====\n const converged = quality >= targetQuality;\n if (!converged && quality 
>= 80 && iteration > 1) {\n await ctx.breakpoint({\n question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,\n title: `Iteration ${iteration} Checkpoint`,\n context: {\n runId: ctx.runId,\n files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]\n }\n });\n }\n }\n\n // ===== FINAL VALIDATION =====\n const converged = quality >= targetQuality;\n\n return {\n success: converged,\n quality,\n targetQuality,\n iterations: iteration,\n iterationHistory,\n finalGates: iterationHistory[iterationHistory.length - 1].gates,\n metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }\n };\n}\n\nfunction generateRecommendations(scores, weights, target) {\n // Calculate gap for each dimension\n const gaps = Object.entries(scores).map(([dim, score]) => ({\n dimension: dim,\n score,\n weight: weights[dim],\n weightedGap: (100 - score) * weights[dim],\n priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority\n }));\n\n // Sort by priority (highest impact improvements first)\n return gaps\n .sort((a, b) => b.priority - a.priority)\n .map(g => `Improve ${g.dimension}: currently ${g.score}, ` +\n `contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);\n}\n```\n\n### Step 3: Progressive Target Strategy\n\nFor challenging targets (90+), use progressive escalation:\n\n```javascript\n// Progressive targets that increase as iterations proceed\nconst progressiveTargets = [\n { iteration: 1, target: 70 }, // First: basic functionality\n { iteration: 3, target: 80 }, // Mid: solid implementation\n { iteration: 5, target: 85 }, // Late: polish and edge cases\n { iteration: 7, target: 90 } // Final: production ready\n];\n\nfunction getCurrentTarget(iteration, finalTarget) {\n const applicable = progressiveTargets.filter(t => t.iteration <= iteration);\n const progressiveTarget = applicable[applicable.length - 1]?.target || 70;\n return Math.min(progressiveTarget, 
finalTarget);\n}\n```\n\n---\n\n## Real-World Process Examples\n\n### Example 1: V-Model with Four Test Levels\n\nThe V-Model process (`methodologies/v-model.js`) implements comprehensive quality gates:\n\n```\n/babysitter:call use the V-Model methodology to build a user authentication system with high safety level\n```\n\nOr with more detail:\n```\n/babysitter:call implement user authentication using V-Model with traceability and thorough testing\n```\n\n**Quality Gates in V-Model:**\n1. Requirements → Acceptance Tests (validates user needs)\n2. System Design → System Tests (validates architecture)\n3. Module Design → Integration Tests (validates interfaces)\n4. Implementation → Unit Tests (validates code)\n5. Traceability Matrix (validates coverage)\n\n### Example 2: Spec-Kit with Constitution Validation\n\nThe Spec-Kit process (`methodologies/spec-driven-development.js`) adds governance gates:\n\n```\n/babysitter:call use spec-driven development to build PCI-compliant payment processing\n```\n\nOr:\n```\n/babysitter:call build a payment flow using the spec-driven methodology with governance validation\n```\n\n**Quality Gates in Spec-Kit:**\n1. Constitution Validation (governance principles)\n2. Specification Review (requirements completeness)\n3. Plan-Constitution Alignment (architecture compliance)\n4. Task Consistency Analysis (cross-artifact validation)\n5. Implementation Checklists (\"unit tests for English\")\n6. User Story Validation (final acceptance)\n\n### Example 3: GSD Iterative Convergence\n\nThe GSD process (`gsd/iterative-convergence.js`) implements feedback-driven convergence:\n\n```\n/babysitter:call build a shopping cart checkout flow with 90% quality target\n```\n\nOr:\n```\n/babysitter:call implement checkout flow using iterative convergence with max 8 iterations\n```\n\n**Quality Gates in GSD:**\n1. Implementation scoring\n2. Test execution\n3. Quality assessment with recommendations\n4. 
Iterative feedback loop\n\n---\n\n## Use Cases and Scenarios\n\n### Scenario 1: TDD Feature Development\n\nBuild a feature with test-driven development, iterating until test coverage and quality targets are met.\n\n```javascript\nexport async function process(inputs, ctx) {\n const { feature, targetQuality = 85, maxIterations = 5 } = inputs;\n\n let iteration = 0;\n let quality = 0;\n\n while (iteration < maxIterations && quality < targetQuality) {\n iteration++;\n ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);\n\n // Write tests first\n const tests = await ctx.task(writeTestsTask, { feature, iteration });\n\n // Implement code to pass tests\n const impl = await ctx.task(implementTask, { tests, feature });\n\n // Run quality checks\n const [coverage, lint, security] = await ctx.parallel.all([\n () => ctx.task(coverageTask, {}),\n () => ctx.task(lintTask, {}),\n () => ctx.task(securityTask, {})\n ]);\n\n // Agent scores quality\n const score = await ctx.task(agentScoringTask, {\n tests, impl, coverage, lint, security\n });\n\n quality = score.overall;\n ctx.log(`Quality score: ${quality}/${targetQuality}`);\n }\n\n return { converged: quality >= targetQuality, iterations: iteration, quality };\n}\n```\n\n### Scenario 2: Code Quality Improvement\n\nIteratively improve existing code until it meets quality standards.\n\n```javascript\nexport async function process(inputs, ctx) {\n const { files, targetScore = 90, maxIterations = 10 } = inputs;\n\n let iteration = 0;\n let currentScore = 0;\n\n // Initial assessment\n currentScore = await ctx.task(assessQualityTask, { files });\n ctx.log(`Initial quality score: ${currentScore}`);\n\n while (iteration < maxIterations && currentScore < targetScore) {\n iteration++;\n\n // Identify improvements\n const improvements = await ctx.task(identifyImprovementsTask, {\n files,\n currentScore,\n targetScore\n });\n\n // Apply improvements\n await ctx.task(applyImprovementsTask, { improvements 
});\n\n // Re-assess\n currentScore = await ctx.task(assessQualityTask, { files });\n ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);\n }\n\n return { achieved: currentScore >= targetScore, finalScore: currentScore };\n}\n```\n\n### Scenario 3: Documentation Generation\n\nGenerate documentation and refine until it meets completeness standards.\n\n```javascript\nexport async function process(inputs, ctx) {\n const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;\n\n let iteration = 0;\n let completeness = 0;\n\n while (iteration < maxIterations && completeness < targetCompleteness) {\n iteration++;\n\n // Generate or improve documentation\n await ctx.task(generateDocsTask, { codebase, iteration });\n\n // Assess completeness\n const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });\n completeness = assessment.completenessScore;\n\n ctx.log(`Documentation completeness: ${completeness}%`);\n }\n\n return { complete: completeness >= targetCompleteness, completeness };\n}\n```\n\n---\n\n## Step-by-Step Instructions\n\n### Step 1: Define Quality Targets\n\nDetermine what quality means for your use case.\n\n**Common quality metrics:**\n- Test coverage percentage (e.g., 85%)\n- Lint error count (e.g., 0 errors)\n- Security vulnerability count (e.g., 0 critical)\n- Overall quality score (e.g., 90/100)\n\n### Step 2: Set Iteration Limits\n\nPrevent infinite loops by setting a maximum number of iterations.\n\n```javascript\nconst { targetQuality = 85, maxIterations = 5 } = inputs;\n```\n\n**Recommendations:**\n- Simple improvements: 3-5 iterations\n- Complex refactoring: 5-10 iterations\n- Large features: 10-15 iterations\n\n### Step 3: Implement the Convergence Loop\n\nCreate a loop that continues until the target is met or iterations are exhausted.\n\n```javascript\nlet iteration = 0;\nlet quality = 0;\n\nwhile (iteration < maxIterations && quality < targetQuality) {\n iteration++;\n\n // Perform work\n 
// ...\n\n // Measure quality\n quality = await measureQuality();\n\n ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);\n}\n```\n\n### Step 4: Implement Quality Scoring\n\nCreate a task that evaluates quality based on your criteria.\n\n```javascript\nexport const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({\n kind: 'agent',\n title: 'Score implementation quality',\n agent: {\n name: 'quality-assessor',\n prompt: {\n role: 'senior quality assurance engineer',\n task: 'Analyze implementation quality and provide a score from 0-100',\n context: {\n tests: args.tests,\n implementation: args.implementation,\n coverage: args.coverage,\n lint: args.lint,\n security: args.security\n },\n instructions: [\n 'Review test quality (weight: 25%)',\n 'Review implementation quality (weight: 30%)',\n 'Review code metrics (weight: 20%)',\n 'Review security (weight: 15%)',\n 'Review alignment with requirements (weight: 10%)',\n 'Provide recommendations for improvement'\n ]\n }\n }\n}));\n```\n\n### Step 5: Add Feedback to Subsequent Iterations\n\nPass quality feedback to the next iteration to guide improvements.\n\n```javascript\nconst iterationResults = [];\n\nwhile (iteration < maxIterations && quality < targetQuality) {\n iteration++;\n\n const previousFeedback = iteration > 1\n ? 
iterationResults[iteration - 2].recommendations\n : null;\n\n const impl = await ctx.task(implementTask, {\n feature,\n previousFeedback // Guide improvements based on previous scoring\n });\n\n const score = await ctx.task(agentScoringTask, { impl });\n\n iterationResults.push({\n iteration,\n quality: score.overall,\n recommendations: score.recommendations\n });\n\n quality = score.overall;\n}\n```\n\n---\n\n## Configuration Options\n\n### Quality Target Configuration\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `targetQuality` | number | 85 | Target quality score (0-100) |\n| `maxIterations` | number | 5 | Maximum number of iterations before stopping |\n\n### Scoring Weights Configuration\n\nCustomize how different aspects contribute to the overall score.\n\n```javascript\nconst scoringWeights = {\n tests: 0.25, // 25% weight for test quality\n implementation: 0.30, // 30% weight for implementation quality\n codeQuality: 0.20, // 20% weight for code metrics\n security: 0.15, // 15% weight for security\n alignment: 0.10 // 10% weight for requirements alignment\n};\n```\n\n### Early Exit Conditions\n\nConfigure conditions that stop iteration early.\n\n```javascript\n// Stop if quality plateaus (no improvement in last N iterations)\nif (qualityHistory.length >= 3) {\n const lastThree = qualityHistory.slice(-3);\n const improvement = lastThree[2] - lastThree[0];\n if (improvement < 1) {\n ctx.log('Quality plateaued, stopping early');\n break;\n }\n}\n```\n\n---\n\n## Code Examples and Best Practices\n\n### Example 1: Full TDD Quality Convergence Process\n\nComplete process definition demonstrating all quality convergence patterns.\n\n```javascript\nexport async function process(inputs, ctx) {\n const {\n feature = 'User authentication',\n targetQuality = 85,\n maxIterations = 5\n } = inputs;\n\n // Phase 1: Planning\n const plan = await ctx.task(agentPlanningTask, { feature });\n\n await ctx.breakpoint({\n 
question: `Review the plan for \"${feature}\". Approve to proceed?`,\n title: 'Plan Review',\n context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }\n });\n\n // Phase 2: Quality Convergence Loop\n let iteration = 0;\n let quality = 0;\n const iterationResults = [];\n\n while (iteration < maxIterations && quality < targetQuality) {\n iteration++;\n ctx.log(`[Iteration ${iteration}/${maxIterations}]`);\n\n // TDD: Write tests first\n const tests = await ctx.task(writeTestsTask, {\n feature,\n plan,\n iteration,\n previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null\n });\n\n // Run tests (expect failures on first iteration)\n await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });\n\n // Implement to pass tests\n const impl = await ctx.task(implementTask, {\n feature,\n tests,\n iteration,\n previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null\n });\n\n // Run tests again\n const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });\n\n // Parallel quality checks\n const [coverage, lint, typeCheck, security] = await ctx.parallel.all([\n () => ctx.task(coverageTask, {}),\n () => ctx.task(lintTask, { files: impl.filesModified }),\n () => ctx.task(typeCheckTask, { files: impl.filesModified }),\n () => ctx.task(securityTask, { files: impl.filesModified })\n ]);\n\n // Agent quality scoring\n const score = await ctx.task(agentQualityScoringTask, {\n tests,\n testResults,\n implementation: impl,\n qualityChecks: { coverage, lint, typeCheck, security },\n iteration,\n targetQuality\n });\n\n quality = score.overallScore;\n iterationResults.push({\n iteration,\n quality,\n feedback: score.recommendations\n });\n\n ctx.log(`Quality: ${quality}/${targetQuality}`);\n\n if (quality >= targetQuality) {\n ctx.log('Target quality achieved!');\n }\n }\n\n // Final approval\n await ctx.breakpoint({\n question: `Quality: 
${quality}/${targetQuality}. Approve for merge?`,\n title: 'Final Review',\n context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }\n });\n\n return {\n success: quality >= targetQuality,\n iterations: iteration,\n finalQuality: quality,\n iterationResults\n };\n}\n```\n\n### Example 2: Quality Scoring Task Definition\n\n```javascript\nexport const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({\n kind: 'agent',\n title: `Score quality (iteration ${args.iteration})`,\n description: 'Comprehensive quality assessment with agent',\n\n agent: {\n name: 'quality-assessor',\n prompt: {\n role: 'senior quality assurance engineer and code reviewer',\n task: 'Analyze implementation quality across multiple dimensions',\n context: {\n feature: args.feature,\n tests: args.tests,\n testResults: args.testResults,\n implementation: args.implementation,\n qualityChecks: args.qualityChecks,\n iteration: args.iteration,\n targetQuality: args.targetQuality\n },\n instructions: [\n 'Review test quality: coverage, edge cases, assertions (weight: 25%)',\n 'Review implementation quality: correctness, readability (weight: 30%)',\n 'Review code metrics: lint, types, complexity (weight: 20%)',\n 'Review security: vulnerabilities, input validation (weight: 15%)',\n 'Review requirements alignment (weight: 10%)',\n 'Calculate weighted overall score (0-100)',\n 'Provide prioritized recommendations for improvement'\n ],\n outputFormat: 'JSON with overallScore, scores by dimension, recommendations'\n },\n outputSchema: {\n type: 'object',\n required: ['overallScore', 'scores', 'recommendations'],\n properties: {\n overallScore: { type: 'number', minimum: 0, maximum: 100 },\n scores: {\n type: 'object',\n properties: {\n tests: { type: 'number' },\n implementation: { type: 'number' },\n codeQuality: { type: 'number' },\n security: { type: 'number' },\n alignment: { type: 'number' }\n }\n },\n recommendations: { type: 'array', 
items: { type: 'string' } }\n }\n }\n },\n\n io: {\n inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,\n outputJsonPath: `tasks/${taskCtx.effectId}/result.json`\n }\n}));\n```\n\n### Best Practices\n\n1. **Set Realistic Targets**: Aim for achievable quality scores (80-90% is often reasonable)\n2. **Limit Iterations**: Prevent runaway loops with sensible limits (5-10 iterations typically)\n3. **Use Parallel Checks**: Run independent quality checks concurrently for efficiency\n4. **Provide Feedback**: Pass recommendations from scoring to subsequent iterations\n5. **Log Progress**: Track quality scores across iterations for visibility\n6. **Include Breakpoints**: Add approval gates at key milestones\n\n---\n\n## Common Pitfalls and Troubleshooting\n\n### Pitfall 1: Quality Score Not Improving\n\n**Symptom:**\n```\nIteration 1: Quality 65/100\nIteration 2: Quality 66/100\nIteration 3: Quality 65/100\nIteration 4: Quality 67/100\nIteration 5: Quality 66/100\nTarget not met: 85/100\n```\n\n**Causes:**\n- Quality target is unrealistic for the codebase\n- Scoring criteria are too strict\n- Fundamental issues blocking improvement\n\n**Solutions:**\n\n1. Review iteration feedback to identify blocking issues:\n ```\n What recommendations came from my quality scoring?\n ```\n\n2. Adjust quality target:\n ```javascript\n const { targetQuality = 75 } = inputs; // Lower target\n ```\n\n3. Increase iteration limit:\n ```javascript\n const { maxIterations = 10 } = inputs; // More iterations\n ```\n\n4. Review scoring weights for balance\n\n### Pitfall 2: Too Many Iterations\n\n**Symptom:** Process runs for many iterations before converging.\n\n**Cause:** Target is too high or improvements are too granular.\n\n**Solutions:**\n\n1. 
Implement early exit on plateau:\n ```javascript\n const recentScores = iterationResults.slice(-3).map(r => r.quality);\n if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {\n ctx.log('Quality plateaued, stopping early');\n break;\n }\n ```\n\n2. Increase improvement scope per iteration\n\n3. Lower quality target to realistic level\n\n### Pitfall 3: Inconsistent Quality Scores\n\n**Symptom:** Quality scores vary significantly between iterations without clear reason.\n\n**Cause:** Non-deterministic scoring or external factors.\n\n**Solution:**\n\n1. Use deterministic scoring criteria\n2. Ensure `ctx.now()` is used instead of `Date.now()` for timestamps\n3. Review agent scoring prompts for consistency\n\n### Pitfall 4: Iteration Takes Too Long\n\n**Symptom:** Each iteration takes several minutes.\n\n**Cause:** Sequential execution of independent tasks.\n\n**Solution:** Use parallel execution:\n\n```javascript\n// Slow: Sequential\nconst coverage = await ctx.task(coverageTask, {});\nconst lint = await ctx.task(lintTask, {});\nconst security = await ctx.task(securityTask, {});\n\n// Fast: Parallel\nconst [coverage, lint, security] = await ctx.parallel.all([\n () => ctx.task(coverageTask, {}),\n () => ctx.task(lintTask, {}),\n () => ctx.task(securityTask, {})\n]);\n```\n\n---\n\n## Related Documentation\n\n- [Process Definitions](./process-definitions.md) - Learn to create quality convergence processes\n- [Parallel Execution](./parallel-execution.md) - Optimize quality checks with parallelism\n- [Breakpoints](./breakpoints.md) - Add approval gates to quality convergence workflows\n- [Best Practices](./best-practices.md) - Patterns for setting targets, custom scoring strategies, and balancing speed vs thoroughness\n- [Process Library](./process-library.md) - Browse the SDK-managed library and current process counts\n- [Two-Loops Architecture](./two-loops-architecture.md) - Deep dive into the evidence-driven completion model\n\n---\n\n## Try Different 
Methodologies and Processes\n\nBabysitter offers two levels of reusable workflows:\n\n### Methodologies (38 directories in this repo snapshot) - The \"How\"\n\n**Quality convergence works with ANY of Babysitter's methodology families** - not just TDD. In this repository snapshot there are 38 methodology directories under `library/methodologies/`.\n\n| Methodology | Best For | Quality Focus |\n|-------------|----------|---------------|\n| **TDD Quality Convergence** | Test-first development | Test coverage, regression prevention |\n| **GSD (Get Stuff Done)** | Rapid prototyping | Working software, iteration speed |\n| **Spec-Kit** | Enterprise/governance | Specification compliance, audit trails |\n| **BDD/Specification by Example** | Team collaboration | Acceptance criteria, living documentation |\n| **Domain-Driven Design** | Complex business domains | Domain model integrity, bounded contexts |\n\n**Browse methodologies:**\n- [Methodology overview](../reference/glossary.md#methodology)\n- [Methodologies folder](../../../library/methodologies/)\n\n### Domain Processes - The \"What\"\n\nBeyond methodologies, Babysitter ships domain-specific processes. The table below is a generated snapshot of specialization counts from the live repository tree:\n\n<!-- quality-convergence:domains:start -->\n| Domain | Processes | Examples |\n|--------|-----------|----------|\n| **Development and technical specializations** | 837 | Web APIs, mobile apps, DevOps pipelines, AI, security, and related technical workflows |\n| **Business domains** | 490 | Legal contracts, HR workflows, marketing campaigns, finance, logistics, and related domains |\n| **Science & engineering domains** | 551 | Quantum algorithms, aerospace systems, biomedical devices, mathematics, and related domains |\n| **Social sciences & humanities** | 160 | Education, healthcare, arts, philosophy, and social-science research |\n<!-- quality-convergence:domains:end -->\n\n**Browse processes:**\n- [Process Library](./process-library.md) - Full catalog with 
descriptions\n- [Specializations folder](../../../library/specializations/)\n\n---\n\n## What To Do Next\n\n| Your Goal | Next Step |\n|-----------|-----------|\n| Run a quality convergence workflow | Try `/babysitter:call build a feature with 85% quality target` |\n| Build your own convergence loop | Copy the TDD example above and customize the scoring |\n| Add more quality gates | See the Five Quality Gate Categories section |\n| Debug a stuck convergence | Check [Best Practices - Debugging](./best-practices.md#debugging-and-troubleshooting) |\n| Understand the architecture | Read [Two-Loops Architecture](./two-loops-architecture.md) |\n\n---\n\n## Summary\n\nQuality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.\n\n**Key Takeaways:**\n\n1. **Set realistic targets** - Start with 80-85, work up to 90+\n2. **Use multiple gate types** - Tests + lint + security + performance\n3. **Pass feedback between iterations** - AI learns from each failure\n4. **Detect plateaus early** - Don't waste iterations on no improvement\n5. **Parallelize independent checks** - Faster iterations mean faster convergence\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": [
{
"from": "page:docs-user-guide-features",
"to": "page:docs-user-guide-features-quality-convergence",
"kind": "contains_page"
}
]
}