II. Page JSON
Structured · live · page:docs-testing-coverage-and-reporting
Coverage And Reporting JSON
Inspect the normalized record payload exactly as the atlas UI reads it.
{
"id": "page:docs-testing-coverage-and-reporting",
"_kind": "Page",
"_file": "wiki/docs/testing/coverage-and-reporting.md",
"_cluster": "wiki",
"attributes": {
"nodeKind": "Page",
"sourcePath": "docs/testing/coverage-and-reporting.md",
"sourceKind": "repo-docs",
"title": "Coverage And Reporting",
"displayName": "Coverage And Reporting",
"slug": "docs/testing/coverage-and-reporting",
"articlePath": "wiki/docs/testing/coverage-and-reporting.md",
"article": "\n# Coverage And Reporting\n\nCoverage reporting should make the repository-wide test story visible without turning every test into one slow monolithic gate.\n\n## Coverage Targets\n\n| Layer | Coverage mechanism | Reporting expectation |\n| --- | --- | --- |\n| Unit and contract tests | Vitest coverage per package | Package coverage reports uploaded as artifacts and merged into a repo summary |\n| Browser E2E | Playwright traces and screenshots on failure | Trace artifact plus scenario summary, not line coverage as the primary signal |\n| CLI and harness tests | Command transcript and JSON result assertions | Command matrix with pass/fail, duration, and installed version metadata |\n| Model-backed tests | Redacted run logs and event assertions | Provider/harness matrix with credential gate status, model name, duration, and token/usage metadata when safe |\n| Docs and generated assets | Existing docs QA and generator checks | Docs QA artifact plus generated-output diff/compare result |\n\n## Whole-Repo Coverage Report\n\nThe long-term target is one repository coverage artifact with package-level sections:\n\n- `@a5c-ai/babysitter-sdk`,\n- `@a5c-ai/babysitter-agent`,\n- `@a5c-ai/agent-core`,\n- `@a5c-ai/transport-mux`,\n- the `@a5c-ai/agent-mux` package family,\n- the hooks-mux package family,\n- `@a5c-ai/agent-plugins-mux`,\n- `@a5c-ai/cloud`,\n- docs/generator checks.\n\nVitest coverage should remain package-local during execution, then a dedicated reporting job can merge summaries. 
Playwright traces and model-backed artifacts should be linked from the same summary but should not be converted into misleading line coverage.\n\n## Minimum Evidence Per Lane\n\n| Lane | Minimum evidence |\n| --- | --- |\n| No-model | Command, package/workspace, test file count, assertion count when available, coverage summary when enabled |\n| Model-backed | Command, harness/provider, installed versions, credential gate result, model name or backend, redacted final output, event assertions |\n| Pipeline gate | Workflow run ID, job name, commit SHA, branch, artifact names, pass/fail or skip reason |\n| Release gate | All pipeline evidence plus package versions and publish/deploy dependency that consumed the evidence |\n\n## Reporting Rules\n\n- A green model-backed lane must prove that at least one real provider call or real harness call happened.\n- A skipped model-backed lane must be visibly skipped before setup, with a single missing dependency named.\n- A no-model lane must not depend on external provider availability.\n- A coverage summary must distinguish line coverage from E2E scenario coverage.\n- A release gate must link to the exact artifacts that support the publish decision.\n\n## Implementation Sequence\n\n1. Normalize package-local Vitest coverage output.\n2. Add a `coverage/no-model` artifact for deterministic tests.\n3. Add Playwright trace artifacts for UI E2E failures.\n4. Add model-backed matrix artifacts with redacted logs.\n5. Add a merged markdown summary that CI attaches to workflow summaries and release candidates.\n\n## Threshold Policy\n\nInitial thresholds should be conservative and package-local. Raising a threshold requires a passing baseline report from the package owner.\n\n| Metric | Initial policy | Blocks merge? 
|\n| --- | --- | --- |\n| Package line coverage | Do not decrease by more than 2 percentage points from baseline | Yes for no-model PR checks |\n| Contract fixture coverage | Every committed fixture family has at least one parser/secret-scan test | Yes |\n| Playwright scenario count | Scenario count may not drop unless a test is renamed or removed with docs | Yes for UI-owned PRs |\n| Model-backed success rate | Three consecutive scheduled successes before staging promotion | No for PRs; yes after staging promotion |\n| Runtime duration | Warn at 80 percent of budget, fail at hard timeout | Yes for required lanes |\n| Flake rate | More than two infra-classified failures in seven days triggers quarantine | No during quarantine; yes after promotion |\n\nTrend-only metrics include token usage, provider latency, UI trace size, and artifact size. They should be shown in summaries but should not block merges until a maintainer intentionally promotes a threshold.\n\nScenario coverage is separate from line coverage. 
It tracks whether critical user-visible flows have at least one no-model proof and, where needed, one live proof.\n\n| Scenario | No-model proof | Live proof |\n| --- | --- | --- |\n| Codex SDK setup | Dry-run harness/plugin installer JSON | Capability-gated live setup or documented skip; do not claim agent-mux plugin-manager support unless adapter capability allows it |\n| Claude Code SDK setup | Dry-run harness/plugin installer JSON | Live setup artifact plus installed plugin manifest where selected |\n| Agent-mux adapter/session protocol | Fixture transcript through adapter tests | Live Codex/Claude session event comparison via `amux run` or SDK `createClient().run` |\n| Transport-mux route/runtime bridge | Local route matrix, env injection, launch-plan proxy decisions, fixture stream, passthrough, metrics/cache, and cancellation tests | Live agent-core stream through transport plus agent-mux-launched external harness proxy stream with redacted launch/env/metrics artifacts |\n| Babysitter-agent runtime orchestration | Mock planner/executor run journal | Bounded model-backed process run with no installer commands |\n| Babysitter plugin through agent-mux | Mock plugin command and hook events | Capability-gated `amux run` session where `/babysitter:call` creates and completes a Babysitter run |\n| Hooks mux normalization | Raw hook fixture normalizer tests | Redacted live hook payload replay |\n\nTransport-mux scenario coverage should be reported as separate checklist rows, not collapsed into generic mux coverage:\n\n- supported route/codec matrix for every exposed transport,\n- runtime env injection and proxy auth,\n- agent-mux launch proxy decision matrix,\n- fixture stream cancellation/timeout/reconnect behavior,\n- passthrough path/query/upstream failure behavior,\n- live agent-core stream bridge,\n- live external harness bridge through `amux launch --with-proxy*`.\n\nA coverage summary should show scenario coverage as a checklist, not as a percentage that 
hides missing live evidence.\n",
"documents": []
},
"outgoingEdges": [],
"incomingEdges": [
{
"from": "page:docs-testing",
"to": "page:docs-testing-coverage-and-reporting",
"kind": "contains_page"
}
]
}
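
The "Threshold Policy" table inside the `article` field above states three numeric rules: package line coverage may not drop more than 2 percentage points from baseline, runtime warns at 80 percent of budget and fails at the hard timeout, and more than two infra-classified failures in seven days triggers quarantine. A minimal sketch of how a reporting job might evaluate those rules follows. The function names, the parameter names, and the choice to treat "budget" and "hard timeout" as two separate values are assumptions made for illustration; the record does not describe an actual implementation.

```python
from datetime import datetime, timedelta

# Values taken from the Threshold Policy table; the names are hypothetical.
MAX_LINE_COVERAGE_DROP = 2.0        # percentage points below baseline
WARN_BUDGET_FRACTION = 0.80         # warn at 80 percent of the runtime budget
FLAKE_LIMIT = 2                     # "more than two" infra-classified failures
FLAKE_WINDOW = timedelta(days=7)    # counted over seven days


def line_coverage_blocks_merge(baseline_pct: float, current_pct: float) -> bool:
    """Return True when coverage fell by more than the allowed 2 points."""
    return (baseline_pct - current_pct) > MAX_LINE_COVERAGE_DROP


def runtime_status(duration_s: float, budget_s: float, hard_timeout_s: float) -> str:
    """Classify a lane's duration as "ok", "warn", or "fail".

    Assumption: the budget and the hard timeout are distinct numbers.
    """
    if duration_s >= hard_timeout_s:
        return "fail"
    if duration_s >= WARN_BUDGET_FRACTION * budget_s:
        return "warn"
    return "ok"


def should_quarantine(infra_failures: list[datetime], now: datetime) -> bool:
    """Return True when more than two infra failures fall in the last 7 days."""
    recent = [t for t in infra_failures if timedelta(0) <= now - t <= FLAKE_WINDOW]
    return len(recent) > FLAKE_LIMIT
```

Under these assumptions, a drop from 80.0 to 78.0 percent does not block a merge because it is exactly 2 points, while a drop to 77.5 percent does. Trend-only metrics such as token usage or artifact size would be reported without passing through any of these checks.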