Quality Convergence: Iterative Improvement Until Targets Met
**Version:** 2.1 **Last Updated:** 2026-01-26 **Category:** Feature Guide
---
Quick Summary (Read This First)
**Quality Convergence = "Keep trying until it's good enough"**
Instead of:
AI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x
Babysitter does:
AI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!
What You'll Learn in This Document
| Section | What It Covers | Read If You Want To... |
|---|---|---|
| Five Quality Gates | Types of checks (tests, lint, security, etc.) | Understand what gets checked |
| 90-Score Pattern | How to reliably hit high quality | Build production-ready workflows |
| Process Examples | Real code from the library | See working implementations |
| Step-by-Step | How to build your own | Create custom quality loops |
A Simple Example
Here's what quality convergence looks like in practice:
Iteration 1:
- AI writes login feature
- Tests run: 3/10 passing (30%)
- AI sees: "Missing password validation, no error handling"
Iteration 2:
- AI fixes based on feedback
- Tests run: 7/10 passing (70%)
- AI sees: "Edge case for empty email not handled"
Iteration 3:
- AI fixes edge cases
- Tests run: 10/10 passing (100%)
- Quality target met! ✓
Output: Working login feature with all tests passing
**Key insight**: The AI doesn't just try once - it uses the feedback from each failure to guide the next attempt.
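The three iterations above follow a simple act, score, feed-back loop. Here is a minimal synchronous sketch of that loop in plain JavaScript. It is not Babysitter's implementation: `converge`, `attempt`, and `score` are hypothetical names, and the scripted pass rates only replay the login example.

```javascript
// Hypothetical helper: the caller supplies attempt() and score().
// Loop until the target is met or the iteration budget runs out,
// passing each round's feedback into the next attempt.
function converge({ attempt, score, targetQuality, maxIterations }) {
  const history = [];
  let feedback = null;
  let quality = 0;
  while (history.length < maxIterations && quality < targetQuality) {
    const iteration = history.length + 1;
    const work = attempt({ iteration, feedback }); // act on previous feedback
    const result = score(work);                    // expected shape: { quality, feedback }
    quality = result.quality;
    feedback = result.feedback;
    history.push({ iteration, quality, feedback });
  }
  return { converged: quality >= targetQuality, iterations: history.length, quality, history };
}

// Replay of the login example: 3/10, 7/10, then 10/10 tests passing.
const passRates = [30, 70, 100];
const notes = [
  'Missing password validation, no error handling',
  'Edge case for empty email not handled',
  null
];
const run = converge({
  attempt: ({ iteration }) => iteration,
  score: (iteration) => ({ quality: passRates[iteration - 1], feedback: notes[iteration - 1] }),
  targetQuality: 100,
  maxIterations: 5
});
console.log(run.converged, run.iterations, run.quality); // true 3 100
```

The `maxIterations` guard matters as much as the target: if scores never reach the target, the loop still ends and reports `converged: false`.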
Understanding Quality Scores
**Quality scores are multi-dimensional, not a single number.** Instead of a simple pass/fail, each dimension receives its own score, so the feedback tells the next iteration which areas to improve and by how much.
A typical quality score includes:
| Dimension | What It Measures | Example |
|---|---|---|
| **Tests** | Pass rate and coverage | 92% tests passing, 85% coverage |
| **Code Quality** | Lint errors, complexity | 0 lint errors, complexity < 10 |
| **Security** | Vulnerabilities, secrets | 0 critical issues |
| **Performance** | Response time, bundle size | p95 < 500ms |
| **Type Safety** | Type errors, null safety | 0 type errors |
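Keeping one score per dimension, instead of collapsing everything into a total straight away, lets the loop point at the weakest area. The sketch below assumes each dimension has already been normalized to 0-100 (higher is better); the dimension keys and minimums are illustrative, not a Babysitter schema.

```javascript
// Illustrative per-dimension scores on a 0-100 scale (higher is better).
const scores = {
  tests: 92,
  codeQuality: 88,
  security: 100,
  performance: 74,
  typeSafety: 95
};

// Return the dimensions below their minimum, weakest first, so the next
// iteration knows where to focus. A dimension with no minimum always passes.
function dimensionsBelow(scores, minimums) {
  return Object.entries(scores)
    .filter(([dim, score]) => score < (minimums[dim] ?? 0))
    .sort((a, b) => a[1] - b[1])
    .map(([dim, score]) => ({ dimension: dim, score, minimum: minimums[dim] }));
}

const minimums = { tests: 90, codeQuality: 90, security: 100, performance: 80, typeSafety: 100 };
console.log(dimensionsBelow(scores, minimums).map(d => d.dimension));
// [ 'performance', 'codeQuality', 'typeSafety' ]
```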
The Power of Custom Dimensions
**You define what quality means for your project.** The dimensions above are just examples - you can:
1. **Define your own 5 dimensions** that matter most for your domain
2. **Ask Babysitter to suggest dimensions** appropriate for your specific task
3. **Weight dimensions differently** based on project phase or criticality
For example, a data pipeline might use completely different dimensions:
| Dimension | Weight | Threshold |
|---|---|---|
| **Data Accuracy** | 30% | > 99.9% |
| **Processing Speed** | 25% | < 5 min/GB |
| **Schema Validation** | 20% | 100% valid |
| **Idempotency** | 15% | All operations idempotent |
| **Error Recovery** | 10% | Auto-recovery < 30s |
This flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.
**For detailed scoring formulas and weight configurations, see Best Practices - Custom Scoring Strategies.**
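Once each dimension is expressed as a 0-100 score, a weighted table like the data pipeline example reduces to a weighted sum. This is a minimal sketch under that assumption (how "< 5 min/GB" becomes a score is left to you); `weightedScore` is a hypothetical helper, and it rejects weight sets that do not sum to 1 so a typo cannot silently cap the total.

```javascript
// Weighted sum of 0-100 dimension scores. Hypothetical helper.
// Weights must sum to 1 (within floating-point tolerance).
function weightedScore(scores, weights) {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`weights sum to ${total}, expected 1`);
  }
  return Object.entries(weights).reduce((sum, [dim, weight]) => {
    if (!(dim in scores)) throw new Error(`missing score for ${dim}`);
    return sum + scores[dim] * weight;
  }, 0);
}

// Weights from the data pipeline table; the scores are made-up inputs.
const pipelineWeights = {
  dataAccuracy: 0.30,
  processingSpeed: 0.25,
  schemaValidation: 0.20,
  idempotency: 0.15,
  errorRecovery: 0.10
};
const pipelineScores = {
  dataAccuracy: 100,
  processingSpeed: 80,
  schemaValidation: 100,
  idempotency: 100,
  errorRecovery: 50
};
console.log(weightedScore(pipelineScores, pipelineWeights).toFixed(1)); // 90.0
```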
---
Overview
Quality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.
The Core Principle: Evidence-Driven Completion
From the Two-Loops Control Plane architecture, the fundamental principle is:
**If you don't have evidence, you don't have completion.**
*If you do only one thing: make completion require evidence.* This single principle turns "it seems done" into "it is done."
Every phase must end with:
- **Artifact**: The work product (patch, doc, config, report)
- **Evidence**: Proof that it meets requirements (logs, test output, checks)
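One way to make "completion requires evidence" mechanical is to refuse to mark a phase done unless it carries both an artifact and passing evidence. The result shape used here (`artifact`, plus an `evidence` array whose items have a `passed` flag) is an assumption for illustration, not Babysitter's internal format.

```javascript
// A phase is complete only with an artifact AND at least one piece of
// evidence, all of which passed. The phase shape is illustrative.
function isPhaseComplete(phase) {
  const hasArtifact = Boolean(phase.artifact);
  const evidence = Array.isArray(phase.evidence) ? phase.evidence : [];
  const allEvidencePassed = evidence.length > 0 && evidence.every(e => e.passed === true);
  return hasArtifact && allEvidencePassed;
}

const claimedDone = { artifact: 'patches/login.diff', evidence: [] };
const provenDone = {
  artifact: 'patches/login.diff',
  evidence: [
    { kind: 'test-output', source: 'logs/tests.txt', passed: true },
    { kind: 'lint', source: 'logs/lint.txt', passed: true }
  ]
};
console.log(isPhaseComplete(claimedDone)); // false: "it seems done"
console.log(isPhaseComplete(provenDone));  // true: "it is done"
```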
Why Use Quality Convergence
- **Consistent Quality**: Every run either meets its minimum quality thresholds or reports that it did not converge
- **Automated Refinement**: Let the system iterate without manual intervention
- **Measurable Results**: Track quality scores across iterations
- **Predictable Outcomes**: Set clear targets and iteration limits
- **TDD Integration**: Combine with test-driven development for robust code
- **Evidence-Based Completion**: Every iteration produces verifiable proof of quality
---
The Five Quality Gate Categories
Quality gates are not a single check. They form a **layered validation system** that ensures completeness from multiple perspectives. For robust quality convergence, use **4-5 gate types simultaneously**.
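Running several gate types only helps if the loop can tell which of them failed. Here is a minimal aggregation sketch; each gate result is assumed to be `{ passed, reasons }`, a hypothetical shape rather than the output of any specific Babysitter task.

```javascript
// Combine independent gate results into one verdict plus a list of failures.
// Gate result shape { passed, reasons? } is an assumption for this sketch.
function aggregateGates(gates) {
  const failed = Object.entries(gates)
    .filter(([, result]) => !result.passed)
    .map(([name, result]) => ({ gate: name, reasons: result.reasons ?? [] }));
  return { passed: failed.length === 0, failed };
}

const verdict = aggregateGates({
  functional: { passed: true },
  codeQuality: { passed: false, reasons: ['2 lint errors'] },
  staticAnalysis: { passed: true },
  security: { passed: true },
  performance: { passed: false, reasons: ['p95 620ms > 500ms'] }
});
console.log(verdict.passed);                  // false
console.log(verdict.failed.map(f => f.gate)); // [ 'codeQuality', 'performance' ]
```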
Gate Type 1: Functional Tests (Unit/Integration/System/Acceptance)
Verifies the code behaves correctly across all levels.
// From: methodologies/v-model.js (V-Model process)
const testResults = await ctx.task(executeTestsTask, {
implementation,
unitTestDesigns, // Validates module design
integrationTestDesign, // Validates architecture
systemTestDesign, // Validates system design
acceptanceTestDesign // Validates requirements
});
const allTestsPassed =
testResults.unitTests.passed &&
testResults.integrationTests.passed &&
testResults.systemTests.passed &&
testResults.acceptanceTests.passed;
**Gate Criteria:**
| Test Level | What It Validates | Typical Pass Threshold |
|---|---|---|
| Unit Tests | Individual functions/classes | 90-100% pass rate |
| Integration Tests | Module interactions | 95-100% pass rate |
| System Tests | End-to-end behavior | 90-100% pass rate |
| Acceptance Tests | User requirements | 100% for critical |
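The thresholds in this table can be checked per level instead of with one all-or-nothing flag. In this sketch, the `{ passed, total }` counts per level and the `TEST_LEVEL_THRESHOLDS` values (the lower end of each range above) are assumptions; the V-Model snippet itself exposes only a boolean `passed` per level.

```javascript
// Minimum pass rates per level, from the lower bound of each range in the table.
const TEST_LEVEL_THRESHOLDS = {
  unit: 0.90,
  integration: 0.95,
  system: 0.90,
  acceptance: 1.00 // critical acceptance tests must all pass
};

// results: { unit: { passed, total }, ... } counts of passing and total tests.
function functionalGate(results, thresholds = TEST_LEVEL_THRESHOLDS) {
  const levels = Object.entries(thresholds).map(([level, minRate]) => {
    const { passed = 0, total = 0 } = results[level] ?? {};
    const rate = total === 0 ? 0 : passed / total; // a level with no tests fails
    return { level, rate, minRate, ok: rate >= minRate };
  });
  return { passed: levels.every(l => l.ok), levels };
}

const gate = functionalGate({
  unit: { passed: 47, total: 50 },        // 94%
  integration: { passed: 19, total: 20 }, // 95%
  system: { passed: 9, total: 10 },       // 90%
  acceptance: { passed: 11, total: 12 }   // about 91.7%, below 100%
});
console.log(gate.passed);                                      // false
console.log(gate.levels.filter(l => !l.ok).map(l => l.level)); // [ 'acceptance' ]
```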
Gate Type 2: Code Quality (Lint/Format/Complexity)
Ensures code follows style guidelines and maintainability standards.
// Parallel code quality checks
const [lint, format, complexity] = await ctx.parallel.all([
() => ctx.task(lintTask, { files: impl.filesModified }),
() => ctx.task(formatCheckTask, { files: impl.filesModified }),
() => ctx.task(complexityTask, { files: impl.filesModified })
]);
const codeQualityGatePassed =
lint.errorCount === 0 &&
format.violations === 0 &&
complexity.maxCyclomaticComplexity < 10;
**Gate Criteria:**
| Check | Tool Examples | Typical Threshold |
|---|---|---|
| Lint Errors | ESLint, Pylint | 0 errors |
| Formatting | Prettier, Black | 0 violations |
| Cyclomatic Complexity | SonarQube, Radon | < 10 per function |
| Code Duplication | jscpd, CPD | < 3% duplication |
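The criteria above, including the duplication limit that the snippet does not check, can be expressed as data so that failures come back as readable reasons for the next iteration. The metric names are hypothetical; map them to whatever your lint, format, complexity, and duplication tools actually report.

```javascript
// Each rule: metric name, limit, and whether the limit itself is allowed.
const CODE_QUALITY_RULES = [
  { metric: 'lintErrors', max: 0, inclusive: true },
  { metric: 'formatViolations', max: 0, inclusive: true },
  { metric: 'maxCyclomaticComplexity', max: 10, inclusive: false }, // "< 10"
  { metric: 'duplicationPercent', max: 3, inclusive: false }        // "< 3%"
];

// A missing metric fails its rule, because comparisons with undefined are false.
function codeQualityGate(metrics, rules = CODE_QUALITY_RULES) {
  const reasons = [];
  for (const { metric, max, inclusive } of rules) {
    const value = metrics[metric];
    const ok = inclusive ? value <= max : value < max;
    if (!ok) reasons.push(`${metric} is ${value}, limit ${inclusive ? '<=' : '<'} ${max}`);
  }
  return { passed: reasons.length === 0, reasons };
}

const result = codeQualityGate({
  lintErrors: 0,
  formatViolations: 0,
  maxCyclomaticComplexity: 12,
  duplicationPercent: 1.8
});
console.log(result.reasons); // [ 'maxCyclomaticComplexity is 12, limit < 10' ]
```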
Gate Type 3: Type Safety and Static Analysis
Catches bugs at compile/analysis time without running the code.
// From: gsd/iterative-convergence enhanced pattern
const [typeCheck, staticAnalysis] = await ctx.parallel.all([
() => ctx.task(typeCheckTask, { files: impl.filesModified }),
() => ctx.task(staticAnalysisTask, { files: impl.filesModified })
]);
const staticGatePassed =
typeCheck.errors.length === 0 &&
staticAnalysis.criticalIssues === 0 &&
staticAnalysis.highIssues === 0;
**Gate Criteria:**
| Check | What It Catches | Typical Threshold |
|---|---|---|
| Type Checking | Type mismatches, null errors | 0 type errors |
| Static Analysis | Potential bugs, code smells | 0 critical/high issues |
| Dead Code | Unreachable statements | 0 dead code blocks |
| Null Safety | Potential null dereferences | 0 null warnings |
Gate Type 4: Security Scanning
Identifies vulnerabilities, secrets, and security anti-patterns.
// Security gate from methodologies/spec-driven-development.js
const security = await ctx.task(securityTask, {
files: impl.filesModified,
scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'
});
const securityGatePassed =
security.criticalVulnerabilities === 0 &&
security.highVulnerabilities === 0 &&
security.secretsDetected === 0 &&
security.dependencyVulnerabilities.critical === 0;
**Gate Criteria:**
| Check | What It Scans | Typical Threshold |
|---|---|---|
| SAST (Static) | SQL injection, XSS, etc. | 0 critical/high |
| Secrets Detection | API keys, passwords | 0 secrets |
| Dependency Scan | Known CVEs in packages | 0 critical CVEs |
| OWASP Top 10 | Common web vulnerabilities | 0 violations |
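The 90-score pattern later in this guide expects each gate to report a 0-100 `score`, while security scanners usually report counts. One hedged way to bridge the two is a severity-weighted penalty with hard blockers. The penalty values below are arbitrary assumptions to tune, not a Babysitter default.

```javascript
// Illustrative penalty per finding, by severity. Tune these for your project.
const SEVERITY_PENALTY = { critical: 100, high: 40, medium: 10, low: 2 };

// findings: counts such as { critical, high, medium, low, secrets } from your scanners.
function securityScore(findings) {
  // Any critical finding or leaked secret is a hard blocker: score 0.
  if ((findings.critical ?? 0) > 0 || (findings.secrets ?? 0) > 0) return 0;
  const penalty = Object.entries(SEVERITY_PENALTY)
    .reduce((sum, [severity, cost]) => sum + (findings[severity] ?? 0) * cost, 0);
  return Math.max(0, 100 - penalty);
}

console.log(securityScore({ high: 0, medium: 1, low: 3 })); // 84
console.log(securityScore({ high: 1 }));                    // 60
console.log(securityScore({ secrets: 1 }));                 // 0
```

A score like this only ranks partial progress between iterations; keep the pass/fail gate above (0 critical/high, 0 secrets) as the condition for convergence.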
Gate Type 5: Performance and Resource Thresholds
Ensures the implementation meets non-functional requirements.
// Performance gate for production readiness.
// One thresholds object is both sent to the task and used for the pass check,
// so the limits cannot drift apart.
const thresholds = {
  loadTimeMs: 1500,      // First Contentful Paint
  bundleSizeKb: 200,     // Gzipped bundle
  apiResponseP95Ms: 500, // 95th percentile
  memoryUsageMb: 512     // Peak memory
};
const performance = await ctx.task(performanceCheckTask, {
  implementation: impl,
  thresholds
});
const performanceGatePassed =
  performance.fcp <= thresholds.loadTimeMs &&
  performance.bundleSize <= thresholds.bundleSizeKb &&
  performance.apiP95 <= thresholds.apiResponseP95Ms &&
  performance.peakMemory <= thresholds.memoryUsageMb;
**Gate Criteria:**
| Metric | Typical Target | Domain |
|---|---|---|
| FCP (First Contentful Paint) | < 1.5s | Frontend |
| Bundle Size | < 200KB gzipped | Frontend |
| API p95 Response | < 500ms | Backend |
| Memory Usage | < 512MB | Server |
| CPU Utilization | < 70% average | Server |
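Because every metric in this table is "lower is better", a single budget table can drive the check for frontend and backend alike. The metric keys and sample measurements are hypothetical; the limits are the typical targets above, with FCP in milliseconds and bundle size in gzipped KB.

```javascript
// Upper limits from the table above (lower is better for every metric).
const PERFORMANCE_BUDGETS = {
  fcpMs: 1500,       // First Contentful Paint
  bundleKbGzip: 200, // gzipped bundle
  apiP95Ms: 500,     // 95th percentile API latency
  peakMemoryMb: 512,
  avgCpuPercent: 70
};

// Only measured metrics are checked, so a frontend-only run is not failed
// for missing server metrics. NaN or non-numeric values count as over budget.
function performanceGate(measured, budgets = PERFORMANCE_BUDGETS) {
  const over = Object.entries(measured)
    .filter(([metric]) => metric in budgets)
    .filter(([metric, value]) => !(value < budgets[metric]))
    .map(([metric, value]) => `${metric}: ${value} (budget < ${budgets[metric]})`);
  return { passed: over.length === 0, over };
}

const check = performanceGate({ fcpMs: 1320, bundleKbGzip: 214 });
console.log(check.passed); // false
console.log(check.over);   // [ 'bundleKbGzip: 214 (budget < 200)' ]
```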
---
The 90-Score Quality Convergence Pattern
To reliably achieve scores of **90+**, implement a **multi-gate weighted scoring system** with iterative feedback.
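A quick way to see why every gate matters at 90+: with weighted scoring, one weak dimension caps the best possible total even if everything else is perfect. The sketch uses the `production` profile from `QUALITY_WEIGHTS` in Step 1; `scoreCeiling` is a hypothetical helper.

```javascript
// Copied from QUALITY_WEIGHTS.production in Step 1.
const productionWeights = {
  tests: 0.25, implementation: 0.25, codeQuality: 0.15, security: 0.20, performance: 0.15
};

// Best achievable weighted score if the `stuck` dimensions keep their current
// values and every other dimension reaches 100.
function scoreCeiling(weights, stuck) {
  return Object.entries(weights).reduce(
    (sum, [dim, weight]) => sum + (dim in stuck ? stuck[dim] : 100) * weight,
    0
  );
}

// Security stuck at 60 caps the run at 92; adding performance at 70 caps it at 87.5.
console.log(scoreCeiling(productionWeights, { security: 60 }).toFixed(1));                  // 92.0
console.log(scoreCeiling(productionWeights, { security: 60, performance: 70 }).toFixed(1)); // 87.5
```

In other words, a 90 target with these weights leaves little room: two moderately weak dimensions are enough to make convergence impossible.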
Step 1: Define Weighted Scoring Dimensions
// Recommended weights for high-quality convergence
const QUALITY_WEIGHTS = {
// For production features
production: {
tests: 0.25, // Test coverage and pass rate
implementation: 0.25, // Code correctness
codeQuality: 0.15, // Lint, complexity, formatting
security: 0.20, // Vulnerability scanning
performance: 0.15 // Non-functional requirements
},
// For security-critical systems
securityCritical: {
tests: 0.20,
implementation: 0.20,
codeQuality: 0.10,
security: 0.35, // Higher weight for security
performance: 0.15
},
// For performance-critical systems
performanceCritical: {
tests: 0.20,
implementation: 0.20,
codeQuality: 0.10,
security: 0.15,
performance: 0.35 // Higher weight for performance
}
};
Step 2: Implement the Multi-Gate Convergence Loop
/**
* Multi-gate quality convergence targeting 90+ scores
* References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js
*/
export async function process(inputs, ctx) {
const {
feature,
targetQuality = 90, // Target score
maxIterations = 10, // Allow more iterations for high targets
minImprovement = 2, // Minimum improvement per iteration
plateauThreshold = 3, // Iterations without improvement
weights = QUALITY_WEIGHTS.production
} = inputs;
let iteration = 0;
let quality = 0;
const iterationHistory = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);
// ===== ACT: Implement with feedback from previous iteration =====
const previousFeedback = iteration > 1
? iterationHistory[iteration - 2].recommendations
: null;
const impl = await ctx.task(implementTask, {
feature,
iteration,
previousFeedback,
focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities
});
// ===== VALIDATE: Run all five quality gates in parallel =====
const [tests, codeQuality, staticAnalysis, security, performance] =
await ctx.parallel.all([
() => ctx.task(testGateTask, { impl }),
() => ctx.task(codeQualityGateTask, { impl }),
() => ctx.task(staticAnalysisGateTask, { impl }),
() => ctx.task(securityGateTask, { impl }),
() => ctx.task(performanceGateTask, { impl })
]);
// ===== SCORE: Calculate weighted quality score =====
const scores = {
tests: tests.score,
implementation: calculateImplementationScore(impl, tests),
codeQuality: codeQuality.score,
security: security.score,
performance: performance.score
};
quality = Object.entries(weights).reduce(
(total, [dimension, weight]) => total + (scores[dimension] * weight),
0
);
// ===== ANALYZE: Generate prioritized recommendations =====
const recommendations = generateRecommendations(scores, weights, targetQuality);
iterationHistory.push({
iteration,
quality,
scores,
recommendations,
gates: { tests, codeQuality, staticAnalysis, security, performance }
});
ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +
`Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +
`Security: ${scores.security} | Perf: ${scores.performance}`);
// ===== EARLY EXIT: Detect plateau =====
if (iteration >= plateauThreshold) {
const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);
const improvement = Math.max(...recent) - Math.min(...recent);
if (improvement < minImprovement) {
ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);
break;
}
}
// ===== BREAKPOINT: At key thresholds =====
const converged = quality >= targetQuality;
if (!converged && quality >= 80 && iteration > 1) {
await ctx.breakpoint({
question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,
title: `Iteration ${iteration} Checkpoint`,
context: {
runId: ctx.runId,
files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]
}
});
}
}
// ===== FINAL VALIDATION =====
const converged = quality >= targetQuality;
return {
success: converged,
quality,
targetQuality,
iterations: iteration,
iterationHistory,
finalGates: iterationHistory[iterationHistory.length - 1].gates,
metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }
};
}
function generateRecommendations(scores, weights, target) {
// Calculate gap for each dimension
const gaps = Object.entries(scores).map(([dim, score]) => ({
dimension: dim,
score,
weight: weights[dim],
weightedGap: (100 - score) * weights[dim],
priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority
}));
// Sort by priority (highest impact improvements first)
return gaps
.sort((a, b) => b.priority - a.priority)
.map(g => `Improve ${g.dimension}: currently ${g.score}, ` +
`contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);
}
Step 3: Progressive Target Strategy
For challenging targets (90+), use progressive escalation:
// Progressive targets that increase as iterations proceed
const progressiveTargets = [
{ iteration: 1, target: 70 }, // First: basic functionality
{ iteration: 3, target: 80 }, // Mid: solid implementation
{ iteration: 5, target: 85 }, // Late: polish and edge cases
{ iteration: 7, target: 90 } // Final: production ready
];
function getCurrentTarget(iteration, finalTarget) {
const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
const progressiveTarget = applicable[applicable.length - 1]?.target || 70;
return Math.min(progressiveTarget, finalTarget);
}
---
Real-World Process Examples
Example 1: V-Model with Four Test Levels
The V-Model process (methodologies/v-model.js) implements comprehensive quality gates:
/babysitter:call use the V-Model methodology to build a user authentication system with high safety level
Or with more detail:
/babysitter:call implement user authentication using V-Model with traceability and thorough testing
**Quality Gates in V-Model:**
1. Requirements → Acceptance Tests (validates user needs)
2. System Design → System Tests (validates architecture)
3. Module Design → Integration Tests (validates interfaces)
4. Implementation → Unit Tests (validates code)
5. Traceability Matrix (validates coverage)
Example 2: Spec-Kit with Constitution Validation
The Spec-Kit process (methodologies/spec-driven-development.js) adds governance gates:
/babysitter:call use spec-driven development to build PCI-compliant payment processing
Or:
/babysitter:call build a payment flow using the spec-driven methodology with governance validation
**Quality Gates in Spec-Kit:**
1. Constitution Validation (governance principles)
2. Specification Review (requirements completeness)
3. Plan-Constitution Alignment (architecture compliance)
4. Task Consistency Analysis (cross-artifact validation)
5. Implementation Checklists ("unit tests for English")
6. User Story Validation (final acceptance)
Example 3: GSD Iterative Convergence
The GSD process (gsd/iterative-convergence.js) implements feedback-driven convergence:
/babysitter:call build a shopping cart checkout flow with 90% quality target
Or:
/babysitter:call implement checkout flow using iterative convergence with max 8 iterations
**Quality Gates in GSD:**
1. Implementation scoring
2. Test execution
3. Quality assessment with recommendations
4. Iterative feedback loop
---
Use Cases and Scenarios
Scenario 1: TDD Feature Development
Build a feature with test-driven development, iterating until test coverage and quality targets are met.
export async function process(inputs, ctx) {
const { feature, targetQuality = 85, maxIterations = 5 } = inputs;
let iteration = 0;
let quality = 0;
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);
// Write tests first
const tests = await ctx.task(writeTestsTask, { feature, iteration });
// Implement code to pass tests
const impl = await ctx.task(implementTask, { tests, feature });
// Run quality checks
const [coverage, lint, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, {}),
() => ctx.task(securityTask, {})
]);
// Agent scores quality
const score = await ctx.task(agentScoringTask, {
tests, impl, coverage, lint, security
});
quality = score.overall;
ctx.log(`Quality score: ${quality}/${targetQuality}`);
}
return { converged: quality >= targetQuality, iterations: iteration, quality };
}
Scenario 2: Code Quality Improvement
Iteratively improve existing code until it meets quality standards.
export async function process(inputs, ctx) {
const { files, targetScore = 90, maxIterations = 10 } = inputs;
let iteration = 0;
let currentScore = 0;
// Initial assessment
currentScore = await ctx.task(assessQualityTask, { files });
ctx.log(`Initial quality score: ${currentScore}`);
while (iteration < maxIterations && currentScore < targetScore) {
iteration++;
// Identify improvements
const improvements = await ctx.task(identifyImprovementsTask, {
files,
currentScore,
targetScore
});
// Apply improvements
await ctx.task(applyImprovementsTask, { improvements });
// Re-assess
currentScore = await ctx.task(assessQualityTask, { files });
ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);
}
return { achieved: currentScore >= targetScore, finalScore: currentScore };
}
Scenario 3: Documentation Generation
Generate documentation and refine until it meets completeness standards.
export async function process(inputs, ctx) {
const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;
let iteration = 0;
let completeness = 0;
while (iteration < maxIterations && completeness < targetCompleteness) {
iteration++;
// Generate or improve documentation
await ctx.task(generateDocsTask, { codebase, iteration });
// Assess completeness
const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });
completeness = assessment.completenessScore;
ctx.log(`Documentation completeness: ${completeness}%`);
}
return { complete: completeness >= targetCompleteness, completeness };
}
---
Step-by-Step Instructions
Step 1: Define Quality Targets
Determine what quality means for your use case.
**Common quality metrics:**
- Test coverage percentage (e.g., 85%)
- Lint error count (e.g., 0 errors)
- Security vulnerability count (e.g., 0 critical)
- Overall quality score (e.g., 90/100)
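Targets are easier to enforce when each one records which direction counts as "good". A minimal sketch, assuming a flat `metrics` object; the metric names and limits echo the examples in the list above and are placeholders.

```javascript
// Each target says whether the metric must be at least (min) or at most (max) a value.
const QUALITY_TARGETS = {
  coveragePercent: { min: 85 },
  lintErrors: { max: 0 },
  criticalVulnerabilities: { max: 0 },
  overallScore: { min: 90 }
};

// Return the names of targets that are not met. Unmeasured metrics count as unmet.
function unmetTargets(metrics, targets = QUALITY_TARGETS) {
  return Object.entries(targets).filter(([name, { min, max }]) => {
    const value = metrics[name];
    if (typeof value !== 'number') return true;
    if (min !== undefined && value < min) return true;
    if (max !== undefined && value > max) return true;
    return false;
  }).map(([name]) => name);
}

console.log(unmetTargets({
  coveragePercent: 88,
  lintErrors: 2,
  criticalVulnerabilities: 0,
  overallScore: 91
})); // [ 'lintErrors' ]
```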
Step 2: Set Iteration Limits
Prevent infinite loops by setting a maximum number of iterations.
const { targetQuality = 85, maxIterations = 5 } = inputs;
**Recommendations:**
- Simple improvements: 3-5 iterations
- Complex refactoring: 5-10 iterations
- Large features: 10-15 iterations
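If callers can override `maxIterations`, clamping the value to the ranges above keeps one bad input from producing a runaway loop. The size labels and the `resolveMaxIterations` helper are hypothetical, not Babysitter options.

```javascript
// Recommended ranges from the list above, keyed by a hypothetical task-size label.
const ITERATION_RANGES = {
  simple: { min: 3, max: 5 },
  complex: { min: 5, max: 10 },
  large: { min: 10, max: 15 }
};

function resolveMaxIterations(size, requested) {
  const range = ITERATION_RANGES[size];
  if (!range) throw new Error(`unknown task size: ${size}`);
  if (!Number.isInteger(requested)) return range.min; // fall back to the low end
  return Math.min(range.max, Math.max(range.min, requested)); // clamp into range
}

console.log(resolveMaxIterations('simple', 50));         // 5
console.log(resolveMaxIterations('complex', undefined)); // 5
console.log(resolveMaxIterations('large', 12));          // 12
```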
Step 3: Implement the Convergence Loop
Create a loop that continues until the target is met or iterations are exhausted.
let iteration = 0;
let quality = 0;
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
// Perform work
// ...
// Measure quality
quality = await measureQuality();
ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);
}
Step 4: Implement Quality Scoring
Create a task that evaluates quality based on your criteria.
export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
kind: 'agent',
title: 'Score implementation quality',
agent: {
name: 'quality-assessor',
prompt: {
role: 'senior quality assurance engineer',
task: 'Analyze implementation quality and provide a score from 0-100',
context: {
tests: args.tests,
implementation: args.implementation,
coverage: args.coverage,
lint: args.lint,
security: args.security
},
instructions: [
'Review test quality (weight: 25%)',
'Review implementation quality (weight: 30%)',
'Review code metrics (weight: 20%)',
'Review security (weight: 15%)',
'Review alignment with requirements (weight: 10%)',
'Provide recommendations for improvement'
]
}
}
}));
Step 5: Add Feedback to Subsequent Iterations
Pass quality feedback to the next iteration to guide improvements.
const iterationResults = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
const previousFeedback = iteration > 1
? iterationResults[iteration - 2].recommendations
: null;
const impl = await ctx.task(implementTask, {
feature,
previousFeedback // Guide improvements based on previous scoring
});
const score = await ctx.task(agentScoringTask, { impl });
iterationResults.push({
iteration,
quality: score.overall,
recommendations: score.recommendations
});
quality = score.overall;
}
---
Configuration Options
Quality Target Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| targetQuality | number | 85 | Target quality score (0-100) |
| maxIterations | number | 5 | Maximum number of iterations before stopping |
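The defaults in this table can be applied and validated in one place before the loop starts. A minimal sketch; `resolveConvergenceConfig` is a hypothetical helper, and its defaults are the values listed above.

```javascript
// Apply the documented defaults, then reject values the loop cannot use.
function resolveConvergenceConfig(inputs = {}) {
  const { targetQuality = 85, maxIterations = 5 } = inputs;
  // Number.isFinite also rejects NaN, Infinity, and numeric strings.
  if (!Number.isFinite(targetQuality) || targetQuality < 0 || targetQuality > 100) {
    throw new RangeError(`targetQuality must be between 0 and 100, got ${targetQuality}`);
  }
  if (!Number.isInteger(maxIterations) || maxIterations < 1) {
    throw new RangeError(`maxIterations must be a positive integer, got ${maxIterations}`);
  }
  return { targetQuality, maxIterations };
}

console.log(resolveConvergenceConfig({}));                    // { targetQuality: 85, maxIterations: 5 }
console.log(resolveConvergenceConfig({ targetQuality: 92 })); // { targetQuality: 92, maxIterations: 5 }
```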
Scoring Weights Configuration
Customize how different aspects contribute to the overall score.
const scoringWeights = {
tests: 0.25, // 25% weight for test quality
implementation: 0.30, // 30% weight for implementation quality
codeQuality: 0.20, // 20% weight for code metrics
security: 0.15, // 15% weight for security
alignment: 0.10 // 10% weight for requirements alignment
};
Early Exit Conditions
Configure conditions that stop iteration early.
// Stop if quality plateaus (no improvement in last N iterations)
if (qualityHistory.length >= 3) {
const lastThree = qualityHistory.slice(-3);
const improvement = lastThree[2] - lastThree[0];
if (improvement < 1) {
ctx.log('Quality plateaued, stopping early');
break;
}
}
---
Code Examples and Best Practices
Example 1: Full TDD Quality Convergence Process
Complete process definition demonstrating all quality convergence patterns.
export async function process(inputs, ctx) {
const {
feature = 'User authentication',
targetQuality = 85,
maxIterations = 5
} = inputs;
// Phase 1: Planning
const plan = await ctx.task(agentPlanningTask, { feature });
await ctx.breakpoint({
question: `Review the plan for "${feature}". Approve to proceed?`,
title: 'Plan Review',
context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }
});
// Phase 2: Quality Convergence Loop
let iteration = 0;
let quality = 0;
const iterationResults = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}]`);
// TDD: Write tests first
const tests = await ctx.task(writeTestsTask, {
feature,
plan,
iteration,
previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
});
// Run tests (expect failures on first iteration)
await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });
// Implement to pass tests
const impl = await ctx.task(implementTask, {
feature,
tests,
iteration,
previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
});
// Run tests again
const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });
// Parallel quality checks
const [coverage, lint, typeCheck, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, { files: impl.filesModified }),
() => ctx.task(typeCheckTask, { files: impl.filesModified }),
() => ctx.task(securityTask, { files: impl.filesModified })
]);
// Agent quality scoring
const score = await ctx.task(agentQualityScoringTask, {
feature, // Read by the scorer as args.feature (see Example 2)
tests,
testResults,
implementation: impl,
qualityChecks: { coverage, lint, typeCheck, security },
iteration,
targetQuality
});
quality = score.overallScore;
iterationResults.push({
iteration,
quality,
feedback: score.recommendations
});
ctx.log(`Quality: ${quality}/${targetQuality}`);
if (quality >= targetQuality) {
ctx.log('Target quality achieved!');
}
}
// Final approval
await ctx.breakpoint({
question: `Quality: ${quality}/${targetQuality}. Approve for merge?`,
title: 'Final Review',
context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }
});
return {
success: quality >= targetQuality,
iterations: iteration,
finalQuality: quality,
iterationResults
};
}
Example 2: Quality Scoring Task Definition
export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
kind: 'agent',
title: `Score quality (iteration ${args.iteration})`,
description: 'Comprehensive quality assessment with agent',
agent: {
name: 'quality-assessor',
prompt: {
role: 'senior quality assurance engineer and code reviewer',
task: 'Analyze implementation quality across multiple dimensions',
context: {
feature: args.feature,
tests: args.tests,
testResults: args.testResults,
implementation: args.implementation,
qualityChecks: args.qualityChecks,
iteration: args.iteration,
targetQuality: args.targetQuality
},
instructions: [
'Review test quality: coverage, edge cases, assertions (weight: 25%)',
'Review implementation quality: correctness, readability (weight: 30%)',
'Review code metrics: lint, types, complexity (weight: 20%)',
'Review security: vulnerabilities, input validation (weight: 15%)',
'Review requirements alignment (weight: 10%)',
'Calculate weighted overall score (0-100)',
'Provide prioritized recommendations for improvement'
],
outputFormat: 'JSON with overallScore, scores by dimension, recommendations'
},
outputSchema: {
type: 'object',
required: ['overallScore', 'scores', 'recommendations'],
properties: {
overallScore: { type: 'number', minimum: 0, maximum: 100 },
scores: {
type: 'object',
properties: {
tests: { type: 'number' },
implementation: { type: 'number' },
codeQuality: { type: 'number' },
security: { type: 'number' },
alignment: { type: 'number' }
}
},
recommendations: { type: 'array', items: { type: 'string' } }
}
}
},
io: {
inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
}
}));
Best Practices
1. **Set Realistic Targets**: Aim for achievable quality scores (80-90% is often reasonable)
2. **Limit Iterations**: Prevent runaway loops with sensible limits (5-10 iterations typically)
3. **Use Parallel Checks**: Run independent quality checks concurrently for efficiency
4. **Provide Feedback**: Pass recommendations from scoring to subsequent iterations
5. **Log Progress**: Track quality scores across iterations for visibility
6. **Include Breakpoints**: Add approval gates at key milestones
---
Common Pitfalls and Troubleshooting
Pitfall 1: Quality Score Not Improving
**Symptom:**
Iteration 1: Quality 65/100
Iteration 2: Quality 66/100
Iteration 3: Quality 65/100
Iteration 4: Quality 67/100
Iteration 5: Quality 66/100
Target not met: 85/100
**Causes:**
- Quality target is unrealistic for the codebase
- Scoring criteria are too strict
- Fundamental issues blocking improvement
**Solutions:**
1. Review iteration feedback to identify blocking issues, for example by asking: `What recommendations came from my quality scoring?`
2. Adjust the quality target:
   ```javascript
   const { targetQuality = 75 } = inputs; // Lower target
   ```
3. Increase the iteration limit:
   ```javascript
   const { maxIterations = 10 } = inputs; // More iterations
   ```
4. Review scoring weights for balance
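When scores stall, recommendations that reappear in every iteration usually point at the blocking issue. A minimal sketch, assuming each history entry has a `recommendations` array of strings as in the Step 5 example; exact string matching is a simplification, since an agent may phrase the same issue differently from one iteration to the next.

```javascript
// Count how many iterations mention each recommendation and return those seen
// in at least `minIterations` iterations, most frequent first.
function recurringRecommendations(history, minIterations = 3) {
  const counts = new Map();
  for (const entry of history) {
    // A Set avoids counting a duplicate within one iteration twice.
    for (const rec of new Set(entry.recommendations ?? [])) {
      counts.set(rec, (counts.get(rec) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= minIterations)
    .sort((a, b) => b[1] - a[1])
    .map(([rec, n]) => ({ recommendation: rec, iterations: n }));
}

const history = [
  { quality: 65, recommendations: ['Add input validation', 'Reduce complexity in checkout()'] },
  { quality: 66, recommendations: ['Reduce complexity in checkout()', 'Fix flaky test'] },
  { quality: 65, recommendations: ['Reduce complexity in checkout()'] }
];
console.log(recurringRecommendations(history).map(r => r.recommendation));
// [ 'Reduce complexity in checkout()' ]
```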
Pitfall 2: Too Many Iterations
**Symptom:** Process runs for many iterations before converging.
**Cause:** Target is too high or improvements are too granular.
**Solutions:**
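1. Implement early exit on plateau (only once at least three scores exist, otherwise the first iteration would count as a plateau):
   ```javascript
   if (iterationResults.length >= 3) {
     const recentScores = iterationResults.slice(-3).map(r => r.quality);
     if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {
       ctx.log('Quality plateaued, stopping early');
       break;
     }
   }
   ```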
2. Increase improvement scope per iteration
3. Lower quality target to realistic level
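The plateau check is easier to reason about as a small pure function. This sketch applies the same spread-based rule as the first solution above and returns false until enough scores exist; the function name and defaults are illustrative.

```javascript
// True when the last `window` scores vary by less than `minImprovement`.
// Returns false until there are at least `window` scores to compare.
function hasPlateaued(scores, window = 3, minImprovement = 2) {
  if (scores.length < window) return false;
  const recent = scores.slice(-window);
  return Math.max(...recent) - Math.min(...recent) < minImprovement;
}

// Scores from Pitfall 1, checked after each iteration.
const scores = [65, 66, 65, 67, 66];
console.log(scores.map((_, i) => hasPlateaued(scores.slice(0, i + 1))));
// [ false, false, true, false, false ]
```

With these settings, the Pitfall 1 run would have stopped after iteration 3 instead of spending two more iterations at the same level.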
Pitfall 3: Inconsistent Quality Scores
**Symptom:** Quality scores vary significantly between iterations without clear reason.
**Cause:** Non-deterministic scoring or external factors.
**Solution:**
1. Use deterministic scoring criteria
2. Ensure ctx.now() is used instead of Date.now() for timestamps
3. Review agent scoring prompts for consistency
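One way to follow the first point is to compute the score from raw tool output with a fixed formula and keep agent judgment for recommendations only. The formula below (pass rate capped by coverage, fixed lint and type-error penalties, a security blocker, and 50/30/20 weights) is an illustrative assumption, not Babysitter's scoring method. Because it is a pure function, the same inputs always produce the same score.

```javascript
// Deterministic score from raw metrics: no clock, no randomness, no agent call.
// The formula and weights are illustrative assumptions.
function deterministicScore(m) {
  const passRate = m.testsTotal > 0 ? m.testsPassed / m.testsTotal : 0;
  // Tests: pass rate, capped by coverage so untested code cannot score 100.
  const tests = 100 * Math.min(passRate, m.coveragePercent / 100);
  // Code quality: lose 5 points per lint error and 10 per type error, floor at 0.
  const code = Math.max(0, 100 - 5 * m.lintErrors - 10 * m.typeErrors);
  // Security: any critical or high finding zeroes the dimension.
  const security = m.criticalVulns + m.highVulns > 0 ? 0 : 100;
  const overall = 0.5 * tests + 0.3 * code + 0.2 * security;
  return Math.round(overall * 10) / 10; // one decimal place
}

const metrics = {
  testsPassed: 45, testsTotal: 50, coveragePercent: 80,
  lintErrors: 2, typeErrors: 0, criticalVulns: 0, highVulns: 0
};
console.log(deterministicScore(metrics)); // 87
```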
Pitfall 4: Iteration Takes Too Long
**Symptom:** Each iteration takes several minutes.
**Cause:** Sequential execution of independent tasks.
**Solution:** Use parallel execution:
// Slow: Sequential
const coverage = await ctx.task(coverageTask, {});
const lint = await ctx.task(lintTask, {});
const security = await ctx.task(securityTask, {});
// Fast: Parallel
const [coverage, lint, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, {}),
() => ctx.task(securityTask, {})
]);
---
Related Documentation
- Process Definitions - Learn to create quality convergence processes
- Parallel Execution - Optimize quality checks with parallelism
- Breakpoints - Add approval gates to quality convergence workflows
- Best Practices - Patterns for setting targets, custom scoring strategies, and balancing speed vs thoroughness
- Process Library - Browse the SDK-managed library and current process counts
- Two-Loops Architecture - Deep dive into the evidence-driven completion model
---
Try Different Methodologies and Processes
Babysitter offers two levels of reusable workflows:
Methodologies (38 directories in this repo snapshot) - The "How"
**Quality convergence works with ANY of Babysitter's methodology families** - not just TDD. In this repository snapshot there are 38 methodology directories under library/methodologies/.
| Methodology | Best For | Quality Focus |
|---|---|---|
| **TDD Quality Convergence** | Test-first development | Test coverage, regression prevention |
| **GSD (Get Stuff Done)** | Rapid prototyping | Working software, iteration speed |
| **Spec-Kit** | Enterprise/governance | Specification compliance, audit trails |
| **BDD/Specification by Example** | Team collaboration | Acceptance criteria, living documentation |
| **Domain-Driven Design** | Complex business domains | Domain model integrity, bounded contexts |
**Browse methodologies:**
Domain Processes - The "What"
Beyond methodologies, Babysitter includes domain-specific processes. The table below is a generated snapshot of the specializations in the live repository tree:
<!-- quality-convergence:domains:start -->
| Domain | Processes | Examples |
|---|---|---|
| **Development and technical specializations** | 837 | Web APIs, mobile apps, DevOps pipelines, AI, security, and related technical workflows |
| **Business domains** | 490 | Legal contracts, HR workflows, marketing campaigns, finance, logistics, and related domains |
| **Science & engineering domains** | 551 | Quantum algorithms, aerospace systems, biomedical devices, mathematics, and related domains |
| **Social sciences & humanities** | 160 | Education, healthcare, arts, philosophy, and social-science research |
<!-- quality-convergence:domains:end -->
**Browse processes:**
- Process Library - Full catalog with descriptions
- Specializations folder
---
What To Do Next
| Your Goal | Next Step |
|---|---|
| Run a quality convergence workflow | Try /babysitter:call build a feature with 85% quality target |
| Build your own convergence loop | Copy the TDD example above and customize the scoring |
| Add more quality gates | See the Five Quality Gate Categories section |
| Debug a stuck convergence | Check Best Practices - Debugging |
| Understand the architecture | Read Two-Loops Architecture |
---
Summary
Quality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.
**Key Takeaways:**
1. **Set realistic targets** - Start with 80-85, work up to 90+
2. **Use multiple gate types** - Tests + lint + security + performance
3. **Pass feedback between iterations** - AI learns from each failure
4. **Detect plateaus early** - Don't waste iterations on no improvement
5. **Parallelize independent checks** - Faster iterations mean faster convergence