Agentic AI Atlas · Quality Gates


Quality Gates json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/testing/quality-gates.md
Cluster · wiki
Record JSON
{
  "id": "page:docs-testing-quality-gates",
  "_kind": "Page",
  "_file": "wiki/docs/testing/quality-gates.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/testing/quality-gates.md",
    "sourceKind": "repo-docs",
    "title": "Quality Gates",
    "displayName": "Quality Gates",
    "slug": "docs/testing/quality-gates",
    "articlePath": "wiki/docs/testing/quality-gates.md",
    "article": "\n# Quality Gates\n\nThese gates define what must be true before a new test lane, workflow, or model-backed scenario is treated as release evidence.\n\n## Gate Matrix\n\n| Gate | Applies to | Required checks | Failure action |\n| --- | --- | --- | --- |\n| Determinism | No-model tests | No provider secrets, fixed fixtures, repeatable locally, stable timeout budget | Block PR until deterministic |\n| Credential guard | Model-backed tests | Explicit secret detection before setup, clear skip reason, no fallback to fake success | Block staging/release if selected job cannot prove setup |\n| Artifact redaction | All E2E tests | Secret scan over logs/artifacts, redacted paths, no raw token files | Fail job and suppress unsafe upload |\n| Protocol compatibility | Mux tests | Mock and live event streams satisfy the same schema/version | Open compatibility issue before promotion |\n| Transport-mux seam evidence | Transport-mux tests | Route matrix, runtime env injection, proxy auth, launch proxy decision, stream transcript, metrics/cache artifact, and invalid-combination boundaries are explicit | Block transport-mux coverage promotion until the missing seam has a direct artifact |\n| Runtime completeness | Babysitter-agent E2E | Run creation, session binding, effect emission, task post, terminal state | Block runtime release gate |\n| Cost and flake budget | Model-backed tests | Retry policy, duration budget, provider rate-limit classification | Keep scheduled/manual until stable |\n| Documentation parity | All lanes | Docs name command, owner, trigger, artifacts, skip/failure semantics | Block workflow merge |\n\n## Adversarial Review Checklist\n\nEvery implementation phase should answer these questions before it is accepted:\n\n- What would make this pass without testing the promised behavior?\n- Which secret or credential path could leak into logs?\n- Which mock assumption could diverge from live Codex or Claude Code behavior?\n- Which package boundary is 
only tested indirectly?\n- Did transport-mux traffic actually use proxy routes and injected env, or did the harness call the provider directly?\n- Is this test accidentally proving plugin install, harness install, hooks, or Babysitter journal behavior with transport-mux evidence only?\n- Which failure would be misclassified as provider flake instead of product regression?\n- Which CI trigger would run too often, too late, or not at all?\n- Which artifact proves the claim to a reviewer who did not watch the run?\n\n## Promotion Criteria\n\nA test can move from manual to scheduled when it has three consecutive successful runs or one documented provider-side skip with no product failures.\n\nA test can move from scheduled to staging preflight when:\n\n- it has stable credential gating,\n- it emits redacted artifacts,\n- transport-mux bridge tests include launch-plan JSON, redacted proxy config/env diff, route or stream transcript, metrics/cache snapshot, and provider/harness version metadata when they claim proxy coverage,\n- it adds unique evidence not already covered by no-model tests,\n- it has an owner for failures,\n- it has a bounded runtime and retry policy.\n\nA test can move from staging preflight to release preflight only when it protects a production publish risk that cannot be caught earlier.\n\n## Quarantine And Demotion\n\nModel-backed tests are allowed to start outside required branch protection. 
They must be demoted or quarantined when reliability falls below release-gate quality.\n\n| Condition | Action |\n| --- | --- |\n| Two provider-infra failures in seven days | Keep scheduled, remove from required staging checks until root cause is classified |\n| One product regression in staging preflight | Keep required and block publish until fixed or explicitly waived |\n| Secret redaction failure | Disable artifact upload for that lane and block promotion until redaction test exists |\n| Runtime exceeds hard timeout twice | Move to manual diagnostics until scope or timeout budget is redesigned |\n| Mock/live schema drift | Block promotion and open a compatibility issue naming the event family |\n\nA quarantined test can return to required status after three consecutive clean scheduled runs and one clean manual rerun by the owning maintainer.\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-quality-gates",
      "kind": "contains_page"
    }
  ]
}
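
The "Promotion Criteria" section of the article above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the project's implementation: `Stage`, `LaneStatus`, `blockers`, and every field name are hypothetical and do not come from the atlas or the repository. It also assumes that "no product failures" qualifies only the provider-side-skip branch of the manual-to-scheduled rule, which the article does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    MANUAL = "manual"
    SCHEDULED = "scheduled"
    STAGING_PREFLIGHT = "staging-preflight"
    RELEASE_PREFLIGHT = "release-preflight"


@dataclass
class LaneStatus:
    """Hypothetical summary of one test lane; all field names are illustrative."""
    stage: Stage
    consecutive_successes: int = 0
    documented_provider_skips: int = 0
    product_failures: int = 0
    stable_credential_gating: bool = False
    emits_redacted_artifacts: bool = False
    claims_proxy_coverage: bool = False
    has_transport_mux_evidence: bool = False
    adds_unique_evidence: bool = False
    has_owner: bool = False
    bounded_runtime_and_retries: bool = False
    protects_publish_risk: bool = False


def blockers(lane: LaneStatus) -> list[str]:
    """Return the unmet criteria for promoting a lane to the next stage.

    An empty list means the lane satisfies the article's promotion rule
    for its current stage.
    """
    if lane.stage is Stage.MANUAL:
        # Assumed reading: three clean runs, OR one documented
        # provider-side skip as long as no product failure occurred.
        ok = lane.consecutive_successes >= 3 or (
            lane.documented_provider_skips >= 1 and lane.product_failures == 0
        )
        return [] if ok else ["three consecutive successes or a documented provider-side skip"]

    if lane.stage is Stage.SCHEDULED:
        checks = {
            "stable credential gating": lane.stable_credential_gating,
            "redacted artifacts": lane.emits_redacted_artifacts,
            "unique evidence beyond no-model tests": lane.adds_unique_evidence,
            "owner for failures": lane.has_owner,
            "bounded runtime and retry policy": lane.bounded_runtime_and_retries,
        }
        # Transport-mux artifacts are required only when proxy coverage is claimed.
        if lane.claims_proxy_coverage:
            checks["transport-mux evidence artifacts"] = lane.has_transport_mux_evidence
        return [name for name, passed in checks.items() if not passed]

    if lane.stage is Stage.STAGING_PREFLIGHT:
        if lane.protects_publish_risk:
            return []
        return ["protects a production publish risk not catchable earlier"]

    return ["already at release preflight"]


if __name__ == "__main__":
    lane = LaneStatus(stage=Stage.SCHEDULED, stable_credential_gating=True)
    for item in blockers(lane):
        print("missing:", item)
```

Returning the list of unmet criteria, rather than a boolean, keeps the result usable as the reviewer-facing artifact the checklist asks for: a reader who did not watch the run can see which gate blocked promotion.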