Agentic AI Atlas


Page JSON

page:docs-testing-harness-e2e

Structured · live

Harness And Plugin E2E json

Inspect the normalized record payload exactly as the atlas UI reads it.

File · wiki/docs/testing/harness-e2e.md · Cluster · wiki

Record JSON

{
  "id": "page:docs-testing-harness-e2e",
  "_kind": "Page",
  "_file": "wiki/docs/testing/harness-e2e.md",
  "_cluster": "wiki",
  "attributes": {
    "nodeKind": "Page",
    "sourcePath": "docs/testing/harness-e2e.md",
    "sourceKind": "repo-docs",
    "title": "Harness And Plugin E2E",
    "displayName": "Harness And Plugin E2E",
    "slug": "docs/testing/harness-e2e",
    "articlePath": "wiki/docs/testing/harness-e2e.md",
    "article": "\n# Harness And Plugin E2E\n\nThis document covers harness setup and plugin-enabled sessions. It intentionally separates two different integration types:\n\n1. **SDK harness/plugin setup integration** uses `babysitter harness:install` and `babysitter harness:install-plugin`.\n2. **Agent-mux plugin/session E2E** starts an agent session through `agent-mux` and verifies plugin behavior inside that session.\n\n`babysitter-agent` runtime E2E is a third path, summarized under Path C below and covered in [Agent Mux And Runtime E2E](./agent-mux-and-runtime-e2e.md). It must not require `harness:install` or `harness:install-plugin` steps.\n\n## Path A: SDK Harness And Plugin Setup\n\nThis path tests the SDK install surfaces. It does not prove that `babysitter-agent` can run a process.\n\n```bash\nbabysitter harness:install codex --workspace . --json\nbabysitter harness:install claude-code --workspace . --json\nbabysitter harness:install-plugin codex --workspace . --json\nbabysitter harness:install-plugin claude-code --workspace . --json\nbabysitter plugin:install babysitter --project --json\nbabysitter list --json\n```\n\n| Test | Expected proof |\n| --- | --- |\n| `list` includes known harnesses | JSON includes harness names and capability metadata |\n| `harness:install --dry-run` for each target | Installation plan is valid and does not mutate the workspace |\n| `harness:install-plugin --dry-run` for each target | Plugin installer package, target, and destination are resolved |\n| Repeated plugin install | Install is idempotent: the manifest contains no duplicate plugin entries |\n| Generic `plugin:install babysitter` | Project plugin registry entry is present when that path is selected |\n\nThe SDK installer may internally delegate harness CLI installation to agent-mux adapter install support, but these tests still claim installer coverage only.\n\n## Path B: Agent-Mux Plugin And Session E2E\n\nThis path tests a real or mocked agent session controlled by `agent-mux`. It should use `amux run <agent>` or `createClient().run({ agent })`, not `babysitter-agent call`.\n\n| Phase | Required action | Required assertions |\n| --- | --- | --- |\n| Capability gate | Read adapter capabilities for the target agent | Plugin-manager tests run only when `supportsPlugins` is true; otherwise the job records a skip/capability error |\n| Plugin precondition | Install or verify the Babysitter harness plugin with the correct native or SDK installer for that harness | Manifest or registry has the Babysitter plugin exactly once |\n| Start agent-mux session | Run `amux run <agent> --prompt <fixture>` or equivalent SDK call | Event stream has `session_start`, content/tool/hook events as applicable, and `session_end` |\n| Invoke Babysitter plugin command | Prompt issues `/babysitter:call` or the harness-equivalent Babysitter command inside the agent session | A Babysitter run ID is produced and can be inspected with SDK run commands |\n| Verify process lifecycle | Inspect run status/events after the session returns | Process was created, ran, posted at least one result, and reached `completed` |\n| Verify hook behavior | Inspect normalized hook logs or agent-mux runtime-hook events | Stop hook fired, continuation/stop decision was honored, and no plugin bypass path was used |\n\n### Adapter-Specific Rules\n\n| Target | Rule |\n| --- | --- |\n| Claude Code | Valid for agent-mux session coverage, plugin-manager coverage where the adapter supports it, and live stop-hook/plugin behavior |\n| Codex | Valid for agent-mux session coverage, but plugin-manager install must be capability-gated because the current Codex adapter reports `supportsPlugins: false` |\n| Gemini/Copilot/Cursor/OpenCode/OpenClaw/Oh-My-Pi | Include in setup smoke tests first; promote to plugin E2E only once adapter capability and plugin installer evidence exist |\n| Pi/agent-core | Not an agent-mux external-harness plugin path |\n\n## Path C: Babysitter-Agent Runtime E2E\n\nThis path validates `@a5c-ai/babysitter-agent` runtime behavior. It starts from existing preconditions, not from installer steps.\n\nValid commands include:\n\n```bash\nbabysitter-agent call --harness agent-core --workspace . --prompt \"run the bounded runtime fixture\" --json\nbabysitter-agent call --harness claude-code --workspace . --prompt \"run the bounded runtime fixture\" --json\nbabysitter-agent invoke codex --workspace . --prompt \"return BABYSITTER_AGENT_BRIDGE_OK\" --json\n```\n\nRequired assertions:\n\n- no `harness:install` or `harness:install-plugin` command is executed as part of the babysitter-agent runtime test,\n- selected backend is recorded (`agent-core`, `pi`, or mapped external harness),\n- run is created when the command is `call/create-run`,\n- task effects are emitted and posted for process runs,\n- run reaches a terminal state,\n- agent-mux bridge events are present only when the selected external harness uses the bridge,\n- artifacts are redacted and include run ID, session ID, backend/harness name, model/provider metadata, and command transcript.\n\n## Failure Policy\n\n- Missing credentials should skip model-backed jobs before any provider call begins.\n- A selected setup job should fail if installer preconditions are unavailable.\n- A selected babysitter-agent runtime job should fail if it tries to run installer commands.\n- Use of the deprecated `harness:call` alias in new runtime tests should fail review; use `babysitter-agent call` for babysitter-agent runtime or `amux run` for agent-mux session E2E.\n- Any log containing a raw secret must fail the job and block artifact upload until redaction is fixed.\n\n## `install-plugins` Wrapper Acceptance Criteria\n\nIf the project adds an aggregate `install-plugins` command, test it only as setup-path coverage.\n\n| Criterion | Required assertion |\n| --- | --- |\n| Equivalence | Wrapper output lists the same plugin destinations as the explicit setup commands it wraps |\n| Idempotency | Running the wrapper twice does not duplicate plugin entries or corrupt manifests |\n| Scope clarity | Output states whether installation is project-local, user-global, or harness-local |\n| Failure isolation | Failure to install one harness plugin reports that harness without masking other completed installs |\n| JSON evidence | Wrapper emits machine-readable installed/skipped/failed entries for CI artifacts |\n\nDo not use the wrapper as a hidden prerequisite for babysitter-agent runtime E2E.\n",
    "documents": []
  },
  "outgoingEdges": [],
  "incomingEdges": [
    {
      "from": "page:docs-testing",
      "to": "page:docs-testing-harness-e2e",
      "kind": "contains_page"
    }
  ]
}
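The Path B capability gate can be sketched in shell. This is a minimal sketch. It assumes the adapter capabilities are available as a JSON string containing a `supportsPlugins` boolean. The article states only that adapters report this field. How that JSON is obtained, the function name `capability_gate`, and the `RUN:`/`SKIP:` messages are hypothetical. A `grep` pattern stands in for a real JSON parser so the sketch needs no extra tools.

```bash
# Hypothetical capability gate for Path B plugin-manager tests.
# Assumption: capabilities arrive as JSON text with a "supportsPlugins" boolean.
# Returns 0 when plugin-manager tests should run, 1 when they should be skipped.
capability_gate() {
  local agent="$1" capabilities_json="$2"
  if printf '%s' "$capabilities_json" |
    grep -Eq '"supportsPlugins"[[:space:]]*:[[:space:]]*true([^[:alnum:]_]|$)'; then
    echo "RUN: plugin-manager tests enabled for ${agent}"
    return 0
  fi
  # Record a skip rather than a failure, as the capability-gate row requires.
  echo "SKIP: ${agent} adapter does not report supportsPlugins: true"
  return 1
}

# Example: an adapter reporting supportsPlugins: false records a skip.
if capability_gate codex '{"supportsPlugins": false}'; then
  echo "would run plugin-manager tests here"
fi
```

Returning a distinct status for "skip" lets the caller wrap plugin-manager steps in an `if` without treating an unsupported adapter as a failed job.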

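A Path C job can enforce two rules by scanning its command transcript: the "no installer commands" assertion and the Failure Policy rule against the deprecated `harness:call` alias. This is a minimal sketch. It assumes the transcript is a plain-text file with one executed command per line. The file format and the function name `check_runtime_transcript` are hypothetical. The pattern matches only the command names the article lists (`harness:install`, `harness:install-plugin`, `harness:call`), so any other installer surface would have to be added to it.

```bash
# Hypothetical Path C guard: fail if the runtime transcript shows installer
# commands or the deprecated harness:call alias.
# Assumption: the transcript file holds one executed command per line.
check_runtime_transcript() {
  local transcript="$1" violations
  # A missing transcript must not count as a clean run.
  if [ ! -r "$transcript" ]; then
    echo "FAIL: transcript not readable: ${transcript}"
    return 1
  fi
  # -n adds line numbers; "|| true" stops grep's no-match status from aborting set -e callers.
  violations=$(grep -nE 'harness:(install|install-plugin|call)([[:space:]]|$)' "$transcript" || true)
  if [ -n "$violations" ]; then
    echo "FAIL: forbidden command in runtime transcript:"
    echo "$violations"
    return 1
  fi
  echo "OK: no installer or harness:call commands found"
  return 0
}
```

A transcript line such as `babysitter-agent call --harness agent-core ...` passes because `--harness` is followed by a space, not a colon. A line such as `babysitter harness:install-plugin codex ...` makes the guard return 1 and print the offending line with its line number.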