Agentic AI Atlas by a5c.ai

Harness And Plugin E2E

This document covers harness setup and plugin-enabled sessions. It intentionally separates two different integration types:

1. **SDK harness/plugin setup integration** uses `babysitter harness:install` and `babysitter harness:install-plugin`.
2. **Agent-mux plugin/session E2E** starts an agent session through agent-mux and verifies plugin behavior inside that session.

babysitter-agent runtime E2E is a third path and is covered in Agent Mux And Runtime E2E. It must not require harness:install or harness:install-plugin steps.

Path A: SDK Harness And Plugin Setup

This path tests the SDK install surfaces. It does not prove that babysitter-agent can run a process.

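```bash
# Install each harness into the current workspace.
babysitter harness:install codex --workspace . --json
babysitter harness:install claude-code --workspace . --json

# Install the Babysitter plugin for each harness.
babysitter harness:install-plugin codex --workspace . --json
babysitter harness:install-plugin claude-code --workspace . --json

# Generic project-level plugin install, then list the known harnesses.
babysitter plugin:install babysitter --project --json
babysitter list --json
```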
| Test | Expected proof |
| --- | --- |
| `list` includes known harnesses | JSON includes harness names and capability metadata |
| `harness:install --dry-run` for each target | Installation plan is valid and does not mutate the workspace |
| `harness:install-plugin --dry-run` for each target | Plugin installer package, target, and destination are resolved |
| Repeated plugin install | Manifest remains idempotent and contains no duplicate plugin entries |
| Generic `plugin:install babysitter` | Project plugin registry entry is present when that path is selected |
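
The repeated-install row above can be checked mechanically once the manifest is captured. The following is a minimal sketch, assuming the manifest is JSON with a top-level `plugins` list whose entries carry a `name` field. That shape and the helper name `duplicate_plugin_names` are hypothetical and are not taken from the Babysitter SDK.

```python
import json
from collections import Counter


def duplicate_plugin_names(manifest_text: str) -> list[str]:
    """Return plugin names that appear more than once in a manifest.

    Assumes a hypothetical manifest shape: {"plugins": [{"name": ...}, ...]}.
    """
    manifest = json.loads(manifest_text)
    counts = Counter(entry["name"] for entry in manifest.get("plugins", []))
    return sorted(name for name, count in counts.items() if count > 1)


# Example: a manifest captured after running the plugin installer twice.
after_second_install = '{"plugins": [{"name": "babysitter"}]}'
assert duplicate_plugin_names(after_second_install) == []
```

A test would capture the manifest after the second install and fail if the returned list is non-empty.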

The SDK installer path may delegate harness CLI install to agent-mux adapter install support internally, but the public test claim remains installer coverage.

Path B: Agent-Mux Plugin And Session E2E

This path tests a real or mocked agent session controlled by agent-mux. It should use `amux run <agent>` or `createClient().run({ agent })`, not `babysitter-agent call`.

| Phase | Required action | Required assertions |
| --- | --- | --- |
| Capability gate | Read adapter capabilities for the target agent | Plugin-manager tests run only when `supportsPlugins` is true; otherwise the job records a skip/capability error |
| Plugin precondition | Install or verify the Babysitter harness plugin with the correct native or SDK installer for that harness | Manifest or registry has the Babysitter plugin exactly once |
| Start agent-mux session | Run `amux run <agent> --prompt <fixture>` or equivalent SDK call | Event stream has `session_start`, content/tool/hook events as applicable, and `session_end` |
| Invoke Babysitter plugin command | Prompt issues `/babysitter:call` or the harness-equivalent Babysitter command inside the agent session | A Babysitter run ID is produced and can be inspected with SDK run commands |
| Verify process lifecycle | Inspect run status/events after the session returns | Process was created, ran, posted at least one result, and reached completed |
| Verify hook behavior | Inspect normalized hook logs or agent-mux runtime-hook events | Stop hook fired, continuation/stop decision was honored, and no plugin bypass path was used |
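
The session-start phase above asserts on the shape of the event stream. The following is a minimal sketch of that assertion, assuming each event is a dict with a `type` key and that `session_start` and `session_end` bracket the stream exactly once. Both the event shape and the strict ordering are assumptions made for illustration; the helper name `check_session_events` is hypothetical.

```python
def check_session_events(events: list[dict]) -> list[str]:
    """Return a list of problems found in a session event stream.

    Assumes each event is a dict with a "type" key (hypothetical shape).
    """
    problems = []
    types = [event.get("type") for event in events]
    if not types or types[0] != "session_start":
        problems.append("stream does not begin with session_start")
    if not types or types[-1] != "session_end":
        problems.append("stream does not end with session_end")
    if types.count("session_start") > 1 or types.count("session_end") > 1:
        problems.append("session boundary events are duplicated")
    return problems


# Example: a well-formed stream from a mocked session.
stream = [{"type": "session_start"}, {"type": "tool"}, {"type": "session_end"}]
assert check_session_events(stream) == []
```

Returning a list of problems rather than raising on the first one keeps all violations visible in a single CI artifact.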

Adapter-Specific Rules

| Target | Rule |
| --- | --- |
| Claude Code | Valid for agent-mux session, plugin-manager coverage where adapter supports it, and live stop-hook/plugin behavior |
| Codex | Valid for agent-mux session coverage, but plugin-manager install must be capability-gated because the current Codex adapter reports `supportsPlugins: false` |
| Gemini/Copilot/Cursor/OpenCode/OpenClaw/Oh-My-Pi | Include in setup smoke first; promote to plugin E2E only after adapter capability and plugin installer evidence exists |
| Pi/agent-core | Not an agent-mux external-harness plugin path |
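
The capability gate that these rules rely on can be expressed as a small decision function. The following is a minimal sketch: the `capabilities` dict stands in for whatever the adapter capability read returns, and only the `supportsPlugins` key comes from this document. The function name and the returned record shape are hypothetical.

```python
def plugin_manager_decision(agent: str, capabilities: dict) -> dict:
    """Decide whether plugin-manager tests run for one adapter.

    Only an explicit supportsPlugins: true enables the tests; a missing
    or false value produces a recorded skip instead of a silent pass.
    """
    if capabilities.get("supportsPlugins") is True:
        return {"agent": agent, "action": "run"}
    return {
        "agent": agent,
        "action": "skip",
        "reason": "adapter does not report supportsPlugins: true",
    }


# The current Codex adapter reports supportsPlugins: false, so the job is skipped.
decision = plugin_manager_decision("codex", {"supportsPlugins": False})
assert decision["action"] == "skip"
```

Treating a missing key the same as `false` is a conservative choice, so that an adapter is promoted to plugin E2E only on explicit evidence.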

Path C: Babysitter-Agent Runtime E2E

This path validates @a5c-ai/babysitter-agent runtime behavior. It starts from preconditions, not installers.

Valid commands include:

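```bash
# `call` starts a process run; the same fixture prompt is used for two harnesses.
babysitter-agent call --harness agent-core --workspace . --prompt "run the bounded runtime fixture" --json
babysitter-agent call --harness claude-code --workspace . --prompt "run the bounded runtime fixture" --json

# `invoke` against an external harness; the prompt asks for a fixed marker string.
babysitter-agent invoke codex --workspace . --prompt "return BABYSITTER_AGENT_BRIDGE_OK" --json
```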

Required assertions:

  • no harness:install or harness:install-plugin command is executed as part of the babysitter-agent runtime test,
  • selected backend is recorded (agent-core, pi, or mapped external harness),
  • run is created when the command is call/create-run,
  • task effects are emitted and posted for process runs,
  • run reaches a terminal state,
  • agent-mux bridge events are present only when the selected external harness uses the bridge,
  • artifacts are redacted and include run ID, session ID, backend/harness name, model/provider metadata, and command transcript.
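
Two of the assertions above, the installer ban and the artifact field list, lend themselves to an automated check. The following is a minimal sketch, assuming the artifact is a dict whose `transcript` value is a list of command lines. The field names in `REQUIRED_FIELDS` are hypothetical stand-ins for run ID, session ID, backend/harness name, model/provider metadata, and command transcript.

```python
import re

# Matches both harness:install and harness:install-plugin.
INSTALLER_COMMAND = re.compile(r"\bharness:install(?:-plugin)?\b")

# Hypothetical field names; the real artifact schema may differ.
REQUIRED_FIELDS = ("runId", "sessionId", "harness", "model", "transcript")


def check_runtime_artifact(artifact: dict) -> list[str]:
    """Return problems found in a babysitter-agent runtime test artifact."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in artifact
    ]
    for line in artifact.get("transcript", []):
        if INSTALLER_COMMAND.search(line):
            problems.append(f"installer command in runtime test: {line}")
    return problems


# Example: an artifact from a run that never touched the installers.
artifact = {
    "runId": "r1",
    "sessionId": "s1",
    "harness": "agent-core",
    "model": "example-model",
    "transcript": ["babysitter-agent call --harness agent-core --workspace . --json"],
}
assert check_runtime_artifact(artifact) == []
```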

Failure Policy

  • Missing credentials should skip model-backed jobs before any provider call begins.
  • A selected setup job should fail if installer preconditions are unavailable.
  • A selected babysitter-agent runtime job should fail if it tries to run installer commands.
  • Use of the deprecated harness:call alias in new runtime tests should fail review; use babysitter-agent call for babysitter-agent runtime or amux run for agent-mux session E2E.
  • Any log containing a raw secret must fail the job and block artifact upload until redaction is fixed.
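
The credential and secret rules above can be sketched as two small helpers. This is an illustration under stated assumptions: the two regular expressions are example patterns only and are not the project's redaction rules, and the environment variable name used in the example is hypothetical.

```python
import re

# Illustrative patterns only; a real job would use the project's own redaction rules.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"authorization:\s*bearer\s+\S+", re.IGNORECASE),
]


def find_raw_secrets(log_text: str) -> list[int]:
    """Return 1-based line numbers of log lines that match a secret pattern."""
    return [
        line_no
        for line_no, line in enumerate(log_text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in SECRET_PATTERNS)
    ]


def model_job_decision(env: dict, required_vars: list[str]) -> tuple[str, list[str]]:
    """Skip a model-backed job before any provider call if credentials are missing."""
    missing = [name for name in required_vars if not env.get(name)]
    return ("skip", missing) if missing else ("run", [])


# Example: no credentials in the environment, so the job is skipped up front.
assert model_job_decision({}, ["EXAMPLE_API_KEY"]) == ("skip", ["EXAMPLE_API_KEY"])
```

A job would call `find_raw_secrets` on every log before artifact upload and fail when the result is non-empty.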

`install-plugins` Wrapper Acceptance Criteria

If the project adds an aggregate install-plugins command, test it only as setup-path coverage.

| Criterion | Required assertion |
| --- | --- |
| Equivalence | Wrapper output lists the same plugin destinations as the explicit setup commands it wraps |
| Idempotency | Running the wrapper twice does not duplicate plugin entries or corrupt manifests |
| Scope clarity | Output states whether installation is project-local, user-global, or harness-local |
| Failure isolation | Failure to install one harness plugin reports that harness without masking other completed installs |
| JSON evidence | Wrapper emits machine-readable installed/skipped/failed entries for CI artifacts |
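
Since the wrapper does not exist yet, its output format is open. The following is a minimal sketch of how the equivalence and JSON-evidence criteria might be checked, assuming a hypothetical report with `installed`, `skipped`, and `failed` lists whose entries each carry `harness` and `destination` fields.

```python
REPORT_KEYS = ("installed", "skipped", "failed")


def check_wrapper_report(report: dict, expected_destinations: set[str]) -> list[str]:
    """Return problems found in a hypothetical install-plugins JSON report.

    expected_destinations is the set of plugin destinations resolved by the
    explicit setup commands that the wrapper is supposed to wrap.
    """
    problems = [
        f"missing list: {key}"
        for key in REPORT_KEYS
        if not isinstance(report.get(key), list)
    ]
    if problems:
        return problems
    listed = [entry["destination"] for key in REPORT_KEYS for entry in report[key]]
    if len(listed) != len(set(listed)):
        problems.append("duplicate plugin destinations")
    if set(listed) != set(expected_destinations):
        problems.append("destinations differ from explicit setup commands")
    return problems


# Example: one install succeeded and one failed, and both are still reported.
report = {
    "installed": [{"harness": "claude-code", "destination": "dest-a"}],
    "skipped": [],
    "failed": [{"harness": "codex", "destination": "dest-b"}],
}
assert check_wrapper_report(report, {"dest-a", "dest-b"}) == []
```

Counting failed and skipped entries toward the destination set reflects the failure-isolation criterion: a failed harness must still appear in the report rather than disappear from it.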

Do not use the wrapper as a hidden prerequisite for babysitter-agent runtime E2E.
