Agentic AI Atlas by a5c.ai
Agentic AI Atlas · Quality Gates

Quality Gates reference

These gates define what must be true before a new test lane, workflow, or model-backed scenario is treated as release evidence.

Gate Matrix

| Gate | Applies to | Required checks | Failure action |
| --- | --- | --- | --- |
| Determinism | No-model tests | No provider secrets, fixed fixtures, repeatable locally, stable timeout budget | Block PR until deterministic |
| Credential guard | Model-backed tests | Explicit secret detection before setup, clear skip reason, no fallback to fake success | Block staging/release if selected job cannot prove setup |
| Artifact redaction | All E2E tests | Secret scan over logs/artifacts, redacted paths, no raw token files | Fail job and suppress unsafe upload |
| Protocol compatibility | Mux tests | Mock and live event streams satisfy the same schema/version | Open compatibility issue before promotion |
| Transport-mux seam evidence | Transport-mux tests | Route matrix, runtime env injection, proxy auth, launch proxy decision, stream transcript, metrics/cache artifact, and invalid-combination boundaries are explicit | Block transport-mux coverage promotion until the missing seam has a direct artifact |
| Runtime completeness | Babysitter-agent E2E | Run creation, session binding, effect emission, task post, terminal state | Block runtime release gate |
| Cost and flake budget | Model-backed tests | Retry policy, duration budget, provider rate-limit classification | Keep scheduled/manual until stable |
| Documentation parity | All lanes | Docs name command, owner, trigger, artifacts, skip/failure semantics | Block workflow merge |
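
As an illustration of the credential guard gate, the sketch below checks for required secrets before any setup runs and returns an explicit skip reason when they are absent. It is a minimal sketch, not the project's implementation: `GuardResult`, `credential_guard`, and the variable name `EXAMPLE_PROVIDER_API_KEY` are hypothetical names introduced here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class GuardResult:
    """Outcome of the credential check that runs before any test setup."""
    run: bool
    skip_reason: Optional[str] = None


def credential_guard(
    required_vars: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> GuardResult:
    """Decide whether a model-backed test may run (hypothetical helper).

    Missing or blank secrets produce an explicit skip reason. The guard never
    substitutes a placeholder value, so a skipped job cannot look like a pass.
    """
    env = os.environ if env is None else env
    missing = sorted(name for name in required_vars if not env.get(name, "").strip())
    if missing:
        # Report variable names only; never echo secret values.
        return GuardResult(run=False, skip_reason="missing secrets: " + ", ".join(missing))
    return GuardResult(run=True)


if __name__ == "__main__":
    result = credential_guard(["EXAMPLE_PROVIDER_API_KEY"])
    print(result)
```

A caller would report `skip_reason` as a skip in the job summary instead of continuing with a fake client, which keeps a skipped run distinguishable from a passing one.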

Adversarial Review Checklist

Every implementation phase should answer these questions before it is accepted:

  • What would make this pass without testing the promised behavior?
  • Which secret or credential path could leak into logs?
  • Which mock assumption could diverge from live Codex or Claude Code behavior?
  • Which package boundary is only tested indirectly?
  • Did transport-mux traffic actually use proxy routes and injected env, or did the harness call the provider directly?
  • Is this test accidentally proving plugin install, harness install, hooks, or Babysitter journal behavior with transport-mux evidence only?
  • Which failure would be misclassified as provider flake instead of product regression?
  • Which CI trigger would run too often, too late, or not at all?
  • Which artifact proves the claim to a reviewer who did not watch the run?
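
The question about credentials leaking into logs can be made mechanical with a scan that runs before any artifact upload. The sketch below is one possible shape under stated assumptions: the regular expressions are illustrative and deliberately narrow, and `scan_text`, `scan_artifacts`, and `upload_allowed` are hypothetical helpers, not part of any existing tool.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical patterns for illustration; a real lane would maintain its own list.
SECRET_PATTERNS = [
    # key/token/secret assignments followed by a long opaque value
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*[\"']?[A-Za-z0-9_\-]{16,}"),
    # HTTP bearer credentials
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]{16,}"),
]


def scan_text(text: str) -> List[int]:
    """Return the 1-based line numbers that match any secret pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(number)
    return hits


def scan_artifacts(paths: Iterable[Path]) -> List[Tuple[str, int]]:
    """Scan artifact files and return (path, line number) for each finding."""
    findings = []
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        findings.extend((str(path), number) for number in scan_text(text))
    return findings


def upload_allowed(paths: Iterable[Path]) -> bool:
    """Allow upload only when the scan is clean; any finding suppresses it."""
    return not scan_artifacts(paths)
```

Findings carry only a path and line number, so the scan report itself does not repeat the leaked value.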

Promotion Criteria

A test can move from manual to scheduled when it has three consecutive successful runs or one documented provider-side skip with no product failures.

A test can move from scheduled to staging preflight when:

  • it has stable credential gating,
  • it emits redacted artifacts,
  • if it is a transport-mux bridge test that claims proxy coverage, it includes launch-plan JSON, a redacted proxy config/env diff, a route or stream transcript, a metrics/cache snapshot, and provider/harness version metadata,
  • it adds unique evidence not already covered by no-model tests,
  • it has an owner for failures,
  • it has a bounded runtime and retry policy.
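
The staging preflight criteria above can be recorded as an explicit checklist so that a promotion request shows which item is missing. This is a minimal sketch with hypothetical field and function names; how each boolean is established is outside its scope.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PreflightEvidence:
    """One flag per criterion in the list above (field names are illustrative)."""
    stable_credential_gating: bool
    emits_redacted_artifacts: bool
    # Assumption: set to True when the test makes no proxy-coverage claim.
    proxy_evidence_complete: bool
    adds_unique_evidence: bool
    has_failure_owner: bool
    bounded_runtime_and_retries: bool


def missing_criteria(evidence: PreflightEvidence) -> List[str]:
    """Return the names of criteria that are not yet satisfied."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def ready_for_staging_preflight(evidence: PreflightEvidence) -> bool:
    """A test is ready only when every criterion holds."""
    return not missing_criteria(evidence)


if __name__ == "__main__":
    candidate = PreflightEvidence(True, True, True, True, False, True)
    print(missing_criteria(candidate))
```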

A test can move from staging preflight to release preflight only when it protects a production publish risk that cannot be caught earlier.

Quarantine And Demotion

Model-backed tests are allowed to start outside required branch protection. They must be demoted or quarantined when reliability falls below release-gate quality.

| Condition | Action |
| --- | --- |
| Two provider-infra failures in seven days | Keep scheduled, remove from required staging checks until root cause is classified |
| One product regression in staging preflight | Keep required and block publish until fixed or explicitly waived |
| Secret redaction failure | Disable artifact upload for that lane and block promotion until redaction test exists |
| Runtime exceeds hard timeout twice | Move to manual diagnostics until scope or timeout budget is redesigned |
| Mock/live schema drift | Block promotion and open a compatibility issue naming the event family |
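
The condition/action pairs can be kept as a lookup so that automation and documentation share one source. The sketch below also shows one way to detect the first condition from failure timestamps. It assumes that "in seven days" means two failures no more than seven days apart, which is an interpretation, and all names (`Condition`, `ACTIONS`, `two_failures_within_window`) are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Iterable


class Condition(Enum):
    PROVIDER_INFRA_FAILURES = "two provider-infra failures in seven days"
    PRODUCT_REGRESSION = "one product regression in staging preflight"
    REDACTION_FAILURE = "secret redaction failure"
    HARD_TIMEOUT_TWICE = "runtime exceeds hard timeout twice"
    SCHEMA_DRIFT = "mock/live schema drift"


# Action text copied from the table above.
ACTIONS = {
    Condition.PROVIDER_INFRA_FAILURES:
        "Keep scheduled, remove from required staging checks until root cause is classified",
    Condition.PRODUCT_REGRESSION:
        "Keep required and block publish until fixed or explicitly waived",
    Condition.REDACTION_FAILURE:
        "Disable artifact upload for that lane and block promotion until redaction test exists",
    Condition.HARD_TIMEOUT_TWICE:
        "Move to manual diagnostics until scope or timeout budget is redesigned",
    Condition.SCHEMA_DRIFT:
        "Block promotion and open a compatibility issue naming the event family",
}


def two_failures_within_window(
    failure_times: Iterable[datetime],
    window: timedelta = timedelta(days=7),
) -> bool:
    """True when any two failures fall within the window (assumed inclusive).

    After sorting, the closest pair is always adjacent, so checking
    neighbouring timestamps is sufficient.
    """
    ordered = sorted(failure_times)
    return any(later - earlier <= window for earlier, later in zip(ordered, ordered[1:]))
```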

A quarantined test can return to required status after three consecutive clean scheduled runs and one clean manual rerun by the owning maintainer.
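
The return rule can be expressed as a small predicate. This sketch assumes scheduled results are recorded oldest first as booleans and does not model the ordering between the scheduled runs and the manual rerun, which the rule leaves unspecified. The function and constant names are hypothetical.

```python
from typing import Sequence

REQUIRED_CLEAN_SCHEDULED_RUNS = 3


def can_return_to_required(
    scheduled_results: Sequence[bool],
    owner_manual_rerun_clean: bool,
) -> bool:
    """Check the un-quarantine rule.

    scheduled_results: scheduled run outcomes, oldest first, True for clean.
    owner_manual_rerun_clean: True when the owning maintainer's manual rerun was clean.
    """
    recent = list(scheduled_results)[-REQUIRED_CLEAN_SCHEDULED_RUNS:]
    streak_ok = len(recent) == REQUIRED_CLEAN_SCHEDULED_RUNS and all(recent)
    return streak_ok and owner_manual_rerun_clean
```

For example, a history of `[False, True, True, True]` with a clean owner rerun qualifies, while `[True, True, False, True]` does not, because the failure breaks the most recent streak.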
