Agentic AI Atlas by a5c.ai
docs/testing/coverage-and-reporting

Coverage And Reporting

Coverage reporting should make the repository-wide test story visible without turning every test into one slow monolithic gate.

Coverage Targets

| Layer | Coverage mechanism | Reporting expectation |
| --- | --- | --- |
| Unit and contract tests | Vitest coverage per package | Package coverage reports uploaded as artifacts and merged into a repo summary |
| Browser E2E | Playwright traces and screenshots on failure | Trace artifact plus scenario summary, not line coverage as the primary signal |
| CLI and harness tests | Command transcript and JSON result assertions | Command matrix with pass/fail, duration, and installed version metadata |
| Model-backed tests | Redacted run logs and event assertions | Provider/harness matrix with credential gate status, model name, duration, and token/usage metadata when safe |
| Docs and generated assets | Existing docs QA and generator checks | Docs QA artifact plus generated-output diff/compare result |

Whole-Repo Coverage Report

The long-term target is one repository coverage artifact with package-level sections:

  • @a5c-ai/babysitter-sdk,
  • @a5c-ai/babysitter-agent,
  • @a5c-ai/agent-core,
  • @a5c-ai/transport-mux,
  • the @a5c-ai/agent-mux package family,
  • the hooks-mux package family,
  • @a5c-ai/agent-plugins-mux,
  • @a5c-ai/cloud,
  • docs/generator checks.

Vitest coverage should remain package-local during execution, then a dedicated reporting job can merge summaries. Playwright traces and model-backed artifacts should be linked from the same summary but should not be converted into misleading line coverage.
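The merge step described above can be sketched as a small pure function. This is a minimal illustration, not the repository's reporting job: the `PackageCoverage` shape, the `mergeCoverage` name, and the sample numbers are assumptions, loosely modelled on the `total`/`covered` counters found in Istanbul-style JSON summaries.

```typescript
// Line counters for one package (hypothetical shape for this sketch).
interface MetricSummary {
  total: number;
  covered: number;
}

interface PackageCoverage {
  packageName: string;
  lines: MetricSummary;
}

interface RepoCoverage {
  packages: Array<PackageCoverage & { pct: number }>;
  repo: MetricSummary & { pct: number };
}

// Percentage rounded to two decimals. An empty package is treated as 100,
// which is an assumption of this sketch.
function pct(m: MetricSummary): number {
  if (m.total === 0) return 100;
  return Math.round((m.covered * 10000) / m.total) / 100;
}

// Sum raw counters across packages, then derive the repo percentage from the
// sums. Averaging per-package percentages would over-weight small packages.
function mergeCoverage(inputs: PackageCoverage[]): RepoCoverage {
  const repo: MetricSummary = { total: 0, covered: 0 };
  const packages = inputs.map((p) => {
    repo.total += p.lines.total;
    repo.covered += p.lines.covered;
    return { ...p, pct: pct(p.lines) };
  });
  return { packages, repo: { ...repo, pct: pct(repo) } };
}

// Illustrative numbers only; these are not measured values.
const example = mergeCoverage([
  { packageName: "@a5c-ai/agent-core", lines: { total: 100, covered: 80 } },
  { packageName: "@a5c-ai/transport-mux", lines: { total: 100, covered: 10 } },
]);
```

Keeping the per-package rows next to the repo total preserves the package-level sections that the target artifact calls for.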

Minimum Evidence Per Lane

| Lane | Minimum evidence |
| --- | --- |
| No-model | Command, package/workspace, test file count, assertion count when available, coverage summary when enabled |
| Model-backed | Command, harness/provider, installed versions, credential gate result, model name or backend, redacted final output, event assertions |
| Pipeline gate | Workflow run ID, job name, commit SHA, branch, artifact names, pass/fail or skip reason |
| Release gate | All pipeline evidence plus package versions and publish/deploy dependency that consumed the evidence |
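One way to make the minimum evidence checkable is to encode it as a list of required fields per lane. The sketch below is an assumption-heavy illustration: the field names (`workflowRunId`, `consumingDependency`, and so on) are hypothetical labels for the evidence items in the table, and conditional items such as assertion count and coverage summary are deliberately left out of the required set.

```typescript
type Lane = "no-model" | "model-backed" | "pipeline-gate" | "release-gate";

// Hypothetical field names for the pipeline-gate evidence items.
const PIPELINE_EVIDENCE = [
  "workflowRunId",
  "jobName",
  "commitSha",
  "branch",
  "artifactNames",
  "outcome", // pass/fail, or the skip reason
] as const;

// Required evidence per lane. The release gate extends the pipeline gate.
const REQUIRED_EVIDENCE: Record<Lane, readonly string[]> = {
  "no-model": ["command", "packageOrWorkspace", "testFileCount"],
  "model-backed": [
    "command",
    "harnessOrProvider",
    "installedVersions",
    "credentialGateResult",
    "modelNameOrBackend",
    "redactedFinalOutput",
    "eventAssertions",
  ],
  "pipeline-gate": PIPELINE_EVIDENCE,
  "release-gate": [...PIPELINE_EVIDENCE, "packageVersions", "consumingDependency"],
};

// A value counts as present unless it is undefined, null, or a blank string.
function isPresent(value: unknown): boolean {
  if (value === undefined || value === null) return false;
  if (typeof value === "string") return value.trim().length > 0;
  return true;
}

// Return the required fields that the evidence record does not provide.
function missingEvidence(lane: Lane, record: Record<string, unknown>): string[] {
  return REQUIRED_EVIDENCE[lane].filter((key) => !isPresent(record[key]));
}
```

A reporting job could refuse to publish a lane summary while `missingEvidence` returns a non-empty list, which keeps incomplete evidence visible rather than silently accepted.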

Reporting Rules

  • A green model-backed lane must prove that at least one real provider call or real harness call happened.
  • A skipped model-backed lane must be visibly skipped before setup, with a single missing dependency named.
  • A no-model lane must not depend on external provider availability.
  • A coverage summary must distinguish line coverage from E2E scenario coverage.
  • A release gate must link to the exact artifacts that support the publish decision.
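The two model-backed rules above lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions: the `ModelBackedLaneReport` shape and its field names are hypothetical, and "a single missing dependency named" is read as exactly one entry.

```typescript
type LaneStatus = "passed" | "failed" | "skipped";

// Hypothetical report shape for one model-backed lane.
interface ModelBackedLaneReport {
  status: LaneStatus;
  realCallCount: number; // real provider or real harness calls observed
  skippedBeforeSetup?: boolean;
  missingDependencies?: string[];
}

// Return human-readable rule violations; an empty list means the report is acceptable.
function checkModelBackedLane(report: ModelBackedLaneReport): string[] {
  const violations: string[] = [];
  if (report.status === "passed" && report.realCallCount < 1) {
    violations.push("green lane has no real provider or harness call");
  }
  if (report.status === "skipped") {
    if (report.skippedBeforeSetup !== true) {
      violations.push("skip was not recorded before setup");
    }
    if ((report.missingDependencies ?? []).length !== 1) {
      violations.push("skip must name exactly one missing dependency");
    }
  }
  return violations;
}
```

Returning a list of violations instead of a boolean lets the summary state which rule a lane broke.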

Implementation Sequence

1. Normalize package-local Vitest coverage output.
2. Add a coverage/no-model artifact for deterministic tests.
3. Add Playwright trace artifacts for UI E2E failures.
4. Add model-backed matrix artifacts with redacted logs.
5. Add a merged markdown summary that CI attaches to workflow summaries and release candidates.
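The final step, a merged markdown summary, could start from something as small as the renderer below. This is a sketch under assumptions: the `LaneRow` shape, the column set, and the sample artifact name are hypothetical, and the source does not say how CI attaches the result.

```typescript
// Hypothetical row shape for one lane in the merged summary.
interface LaneRow {
  lane: string;
  status: "pass" | "fail" | "skip";
  durationSeconds: number;
  artifact: string;
}

// Escape pipe characters so free text cannot break the table layout.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Render one markdown table row per lane under a fixed header.
function renderSummary(rows: LaneRow[]): string {
  const header = [
    "| Lane | Status | Duration (s) | Artifact |",
    "| --- | --- | --- | --- |",
  ];
  const body = rows.map(
    (r) =>
      `| ${escapeCell(r.lane)} | ${r.status} | ${r.durationSeconds} | ${escapeCell(r.artifact)} |`,
  );
  return [...header, ...body].join("\n");
}

// Illustrative input only.
const summary = renderSummary([
  { lane: "no-model", status: "pass", durationSeconds: 42, artifact: "coverage-no-model" },
]);
```

If the pipeline runs on GitHub Actions, which is an assumption here, the returned string could be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable so that it appears in the job summary.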

Threshold Policy

Initial thresholds should be conservative and package-local. Raising a threshold requires a passing baseline report from the package owner.

| Metric | Initial policy | Blocks merge? |
| --- | --- | --- |
| Package line coverage | Do not decrease by more than 2 percentage points from baseline | Yes for no-model PR checks |
| Contract fixture coverage | Every committed fixture family has at least one parser/secret-scan test | Yes |
| Playwright scenario count | Scenario count may not drop unless a test is renamed or removed with docs | Yes for UI-owned PRs |
| Model-backed success rate | Three consecutive scheduled successes before staging promotion | No for PRs; yes after staging promotion |
| Runtime duration | Warn at 80 percent of budget, fail at hard timeout | Yes for required lanes |
| Flake rate | More than two infra-classified failures in seven days triggers quarantine | No during quarantine; yes after promotion |

Trend-only metrics include token usage, provider latency, UI trace size, and artifact size. They should be shown in summaries but should not block merges until a maintainer intentionally promotes a threshold.
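Three of the numeric policies in the table can be expressed as small predicates. The sketch below is illustrative only: the function names are hypothetical, and it assumes the duration budget and the hard timeout are two separate values, which the table implies but does not define.

```typescript
type Verdict = "ok" | "warn" | "fail";

// Package line coverage: fail when the drop from baseline exceeds the
// allowed number of percentage points (2 in the initial policy).
function checkLineCoverage(baselinePct: number, currentPct: number, maxDropPoints = 2): Verdict {
  return baselinePct - currentPct > maxDropPoints ? "fail" : "ok";
}

// Runtime duration: warn at 80 percent of budget, fail at the hard timeout.
function checkDuration(elapsedMs: number, budgetMs: number, hardTimeoutMs: number): Verdict {
  if (elapsedMs >= hardTimeoutMs) return "fail";
  if (elapsedMs >= 0.8 * budgetMs) return "warn";
  return "ok";
}

// Flake rate: more than two infra-classified failures within the last
// seven days triggers quarantine.
function shouldQuarantine(infraFailureTimes: Date[], now: Date): boolean {
  const windowStart = now.getTime() - 7 * 24 * 60 * 60 * 1000;
  const recent = infraFailureTimes.filter(
    (t) => t.getTime() >= windowStart && t.getTime() <= now.getTime(),
  );
  return recent.length > 2;
}
```

Trend-only metrics such as token usage would not pass through predicates like these until a maintainer promotes a threshold for them.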

Scenario coverage is separate from line coverage. It tracks whether critical user-visible flows have at least one no-model proof and, where needed, one live proof.

| Scenario | No-model proof | Live proof |
| --- | --- | --- |
| Codex SDK setup | Dry-run harness/plugin installer JSON | Capability-gated live setup or documented skip; do not claim agent-mux plugin-manager support unless adapter capability allows it |
| Claude Code SDK setup | Dry-run harness/plugin installer JSON | Live setup artifact plus installed plugin manifest where selected |
| Agent-mux adapter/session protocol | Fixture transcript through adapter tests | Live Codex/Claude session event comparison via amux run or SDK createClient().run |
| Transport-mux route/runtime bridge | Local route matrix, env injection, launch-plan proxy decisions, fixture stream, passthrough, metrics/cache, and cancellation tests | Live agent-core stream through transport plus agent-mux-launched external harness proxy stream with redacted launch/env/metrics artifacts |
| Babysitter-agent runtime orchestration | Mock planner/executor run journal | Bounded model-backed process run with no installer commands |
| Babysitter plugin through agent-mux | Mock plugin command and hook events | Capability-gated amux run session where /babysitter:call creates and completes a Babysitter run |
| Hooks mux normalization | Raw hook fixture normalizer tests | Redacted live hook payload replay |

Transport-mux scenario coverage should be reported as separate checklist rows, not collapsed into generic mux coverage:

  • supported route/codec matrix for every exposed transport,
  • runtime env injection and proxy auth,
  • agent-mux launch proxy decision matrix,
  • fixture stream cancellation/timeout/reconnect behavior,
  • passthrough path/query/upstream failure behavior,
  • live agent-core stream bridge,
  • live external harness bridge through amux launch --with-proxy*.

A coverage summary should show scenario coverage as a checklist, not as a percentage that hides missing live evidence.
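A checklist rendering along those lines might look like the sketch below. The `ScenarioRow` shape and the output format are assumptions; an absent `liveProof` stands for a scenario where no live proof is required.

```typescript
// Hypothetical row shape for one scenario in the checklist.
interface ScenarioRow {
  scenario: string;
  noModelProof: boolean;
  liveProof?: boolean; // undefined: no live proof required for this scenario
}

// Render a proof state as a checkbox, or "n/a" when no proof is required.
function box(done: boolean | undefined): string {
  if (done === undefined) return "n/a";
  return done ? "[x]" : "[ ]";
}

// One line per scenario, with both proofs shown side by side.
function renderChecklist(rows: ScenarioRow[]): string[] {
  return rows.map(
    (r) => `${r.scenario}: no-model ${box(r.noModelProof)}, live ${box(r.liveProof)}`,
  );
}

// Scenarios that still lack a required proof, listed by name so that
// missing live evidence is not hidden behind an aggregate number.
function scenariosMissingProof(rows: ScenarioRow[]): string[] {
  return rows
    .filter((r) => !r.noModelProof || r.liveProof === false)
    .map((r) => r.scenario);
}
```

Listing the incomplete scenarios by name is the point of the design: a figure such as "6 of 7" would hide which live proof is absent.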
