The default shipping path: human writes a PRD, agents do the rest, human approves a screenshot. This page describes each stage's input, output, gate, and storage location.
Pipeline overview
The runner persists every transition into `prd_to_pr_runs` (SQLite at `.compound-state/agent-service.db`), so a process restart can read a row's `current_stage` and resume. The exception is spawned `claude -p` children: they are killed on restart and do not auto-resume (see the "Outstanding" callout above; the fix is `detached: true` in the spawn options).
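The `detached: true` fix mentioned above can be sketched as follows. This is a minimal illustration, not the runner's actual code; the function name is ours, and only `spawn()`'s options come from Node's API.

```typescript
import { spawn } from "node:child_process";

// Spawn an agent child in its own process group so it survives an
// orchestrator restart. Without `detached: true` the child shares the
// parent's process group and is killed along with the runner.
function spawnDetachedAgent(cmd: string, args: string[]): number | undefined {
  const child = spawn(cmd, args, {
    detached: true,   // new process group: survives the parent's exit
    stdio: "ignore",  // no shared TTY/pipes keeping the pair coupled
  });
  child.unref();      // the parent's event loop may exit without waiting
  return child.pid;
}

// e.g. spawnDetachedAgent("claude", ["-p", promptText]);
```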
Stage 0 — Intake normalization
The orchestrator may receive a rough chat request, a Cursor plan under `.cursor/plans/`, or an already-written wiki PRD. Before spawning implementation agents it normalizes the input into a portable PRD:
- If the input is `.cursor/plans/<feature>.md`, convert it into `wiki/plans/<feature>-YYYY-MM-DD.md`.
- If the input is chat-only, draft the PRD in `wiki/plans/` before any spec or code work.
- Preserve the original source path in frontmatter `sources:`.
- Assign a stable feature slug used by the spec, worktree, branch, and dossier.
- Create an isolated worktree/branch automatically for feature execution; no separate human branch approval is required for PRD-driven workflows.
Stage 0 is allowed to ask a blocking question only when the requested product behavior is ambiguous enough to change the spec. It should not ask for permission to create the worktree.
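The path normalization above can be sketched like this. The slug rules are an assumption on our part; only the source and target layouts come from this page.

```typescript
// Derive a stable, filesystem-safe slug from an input plan path.
function featureSlug(input: string): string {
  return input
    .toLowerCase()
    .replace(/\.md$/, "")         // drop the extension
    .replace(/^.*\//, "")         // drop directories like .cursor/plans/
    .replace(/[^a-z0-9]+/g, "-")  // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, "");
}

// Target PRD path under wiki/plans/ with the date suffix this page describes.
function prdPath(source: string, today: Date): string {
  const date = today.toISOString().slice(0, 10); // YYYY-MM-DD
  return `wiki/plans/${featureSlug(source)}-${date}.md`;
}

// prdPath(".cursor/plans/Lab Promotion Banner.md", new Date("2025-01-15"))
//   → "wiki/plans/lab-promotion-banner-2025-01-15.md"
```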
Stage 1 — PRD
A `wiki/plans/<feature>-YYYY-MM-DD.md` file. Frontmatter must declare `implementation_status: planned` (or similar) and the body must open with a visible status callout per SCHEMA.md.
A good PRD answers:
- What's the user-facing change?
- What's the success metric (and the failure metric)?
- What's explicitly out of scope?
The PRD is the only place a human is required. Everything downstream consumes it.
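The frontmatter gate described above could be checked mechanically along these lines. This is a sketch under assumptions: the page only names `implementation_status` and `sources:`, so the exact status value set is ours.

```typescript
// Validate the two frontmatter fields this pipeline depends on.
// The allowed status values are an assumed example set.
function validatePrdFrontmatter(fm: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const statuses = ["planned", "in_progress", "shipped"]; // assumed values
  if (!statuses.includes(String(fm.implementation_status))) {
    errors.push(`implementation_status must be one of: ${statuses.join(", ")}`);
  }
  if (fm.sources !== undefined && !Array.isArray(fm.sources)) {
    errors.push("sources must be a list of original input paths");
  }
  return errors;
}
```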
Stage 2 — Spec
The Architect agent reads the PRD and emits `wiki/specs/<feature>.md`. The spec is machine-checkable Gherkin-style acceptance:

```gherkin
Feature: Lab promotion banner

  Scenario: Best trial is shown above the leaderboard
    Given a service area with at least one completed lab trial
    And one trial is pinned in ServiceAreaBestLabScenario
    When the user opens /admin/optimization-lab/<id>
    Then the BestTrialBanner is visible
    And it shows the pinned trial's metrics

  Scenario: Promote button is disabled for in-progress trials
    Given a trial whose solution.status is solving_active
    Then the Promote button is disabled
    And hovering shows "Trial still solving"
```
The spec is the coordination object between agents. It bounds what counts as done.
Stage 3 — Test stubs
The Test-writer agent reads the spec and writes failing tests:
- Vitest for unit + integration coverage of resolvers, services, hooks.
- Playwright for the acceptance scenarios named in the spec.
The pre-commit assertion: every test stub must run and fail with an informative error (typically "X is not implemented"). This catches accidentally tautological tests before the editor stage starts.
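The pre-commit assertion can be expressed as a small gate. A minimal sketch, assuming a stub is representable as a callable; the helper names are ours:

```typescript
type Stub = { name: string; run: () => void };

// A stub passes the gate only if it throws an informative
// "not implemented" error. A stub that passes silently is
// tautological and must be rejected before the editor stage.
function failsInformatively(stub: Stub): boolean {
  try {
    stub.run();
    return false; // passed without an implementation: tautological
  } catch (err) {
    return err instanceof Error && /not implemented/i.test(err.message);
  }
}
```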
Stage 4 — Implementation
The Editor agent (cheaper Sonnet model) iterates on the diff until the test stubs pass. Constraints:
- Worktree per feature (`./scripts/git/worktree-add.sh <name> <branch>`) — never edit `main` directly for PRD-driven work. The pipeline may create this worktree automatically.
- No new files outside what the spec implies (no premature abstractions).
- No comments unless the why is non-obvious.
- React-hook-form + Zod + `Form*` primitives for any form work; resolver layout per `.cursor/rules/appcaire-monorepo.mdc`.
The editor is allowed to fail. If after MAX_ITERATIONS the spec's tests aren't green, it surfaces the failure to the orchestrator instead of grinding forever.
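The bounded iteration can be sketched as follows. Only `MAX_ITERATIONS` and the surface-on-failure behavior come from this page; the callbacks, their shapes, and the limit's value are assumptions.

```typescript
const MAX_ITERATIONS = 5; // assumed value

type TestResult = { green: boolean; failures: string[] };

// Iterate edit → test until green, or give up after MAX_ITERATIONS
// and let the caller (the orchestrator) decide what to do next.
function editUntilGreen(
  runTests: () => TestResult,
  applyEdit: (failures: string[]) => void,
): { ok: boolean; iterations: number } {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const result = runTests();
    if (result.green) return { ok: true, iterations: i };
    applyEdit(result.failures);
  }
  // Surface the failure instead of grinding forever.
  return { ok: false, iterations: MAX_ITERATIONS };
}
```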
Stage 5 — Self-review
The orchestrator runs the appropriate subagents in parallel:
- `resolver-reviewer` — if `apps/dashboard-server/src/graphql/resolvers/` was touched.
- `dashboard-reviewer` — if `apps/dashboard/src/` was touched.
- `perf-reviewer` — if any list resolver, field resolver, or loop-over-Prisma was touched.
Findings are categorised P1 (block) / P2 (must address) / P3 (acknowledge). The Editor re-enters with the findings as an additional input.
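The dispatch and triage above can be sketched as two small checks. The path prefixes and the P1/P2/P3 policy come from this page; the matching logic is ours (note `perf-reviewer` keys on diff contents, not paths, so it is omitted here).

```typescript
// Pick path-triggered subagents from the list of touched files.
function reviewersFor(touched: string[]): string[] {
  const out = new Set<string>();
  for (const p of touched) {
    if (p.startsWith("apps/dashboard-server/src/graphql/resolvers/")) out.add("resolver-reviewer");
    if (p.startsWith("apps/dashboard/src/")) out.add("dashboard-reviewer");
  }
  return [...out];
}

type Severity = "P1" | "P2" | "P3";

// P1 blocks and P2 must be addressed, so either sends the Editor back in;
// P3 is acknowledged only.
function editorMustReenter(findings: { severity: Severity }[]): boolean {
  return findings.some(f => f.severity !== "P3");
}
```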
Stage 6 — Verification
The Verifier agent runs:
- `yarn type-check`, `yarn lint`, `yarn test:dashboard-stack`.
- A `playwright test` pass that captures the failure dossier (trace + screenshot + console + DOM snapshot + machine-readable summary).
- The dossier is written to `docs/dossiers/<feature>/` and committed alongside the diff.
This stage cannot be short-circuited. Without a dossier, the PR cannot merge. See verification-and-evidence.md.
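One plausible shape for `summary.json`, the machine-readable half of the dossier. The field names below are entirely assumed; only the artifact file names and the summary's role come from this page.

```typescript
// Assumed shape for docs/dossiers/<feature>/summary.json.
interface DossierSummary {
  feature: string;
  results: { scenario: string; status: "passed" | "failed"; durationMs: number }[];
  artifacts: { trace: string; screenshot: string; console: string };
}

const example: DossierSummary = {
  feature: "lab-promotion-banner",
  results: [
    { scenario: "Best trial is shown above the leaderboard", status: "passed", durationMs: 4200 },
  ],
  artifacts: { trace: "trace.zip", screenshot: "screenshot.png", console: "console.log" },
};
```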
Stage 7 — Reviewer loop
After `git push`, the Reviewer-feedback agent polls `gh api repos/{org}/{repo}/pulls/<n>/comments` until Codex / CodeRabbit have completed their review. Any unaddressed P1 or P2 comment re-enters the Editor stage. This codifies the standing memory rule on Codex review polling.
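One poll step of that loop can be sketched like this. The bot-matching and severity heuristics are assumptions; in practice the caller would fetch the comments by shelling out to the `gh api` command quoted above, sleep between `null` results, and feed any returned findings back to the Editor.

```typescript
type ReviewComment = { user: string; body: string };

// null  → bot reviews are not in yet: sleep and poll again.
// array → review is in; any element is an unaddressed P1/P2 finding
//         that must re-enter the Editor stage.
function unresolvedBotFindings(comments: ReviewComment[]): ReviewComment[] | null {
  const bot = comments.filter(c => /codex|coderabbit/i.test(c.user));
  if (bot.length === 0) return null;
  return bot.filter(c => /\b(P1|P2)\b/.test(c.body));
}
```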
Stage 8 — Merge + evidence
When (a) CI is green, (b) the dossier is present, and (c) the reviewer loop is settled, the orchestrator queues the PR with `gh pr merge --auto --squash` against the merge group. After merge, the screenshot from the dossier is sent to the human channel (Telegram via interface-agent).
The human's job at this point is to look at the screenshot and either approve (silence = approval after a deadline) or reject. Rejection re-opens the spec for clarification.
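The silence-as-approval rule can be sketched as a pure decision function. The deadline value is an assumption (this page does not state one), as are the names below.

```typescript
const APPROVAL_DEADLINE_MS = 24 * 60 * 60 * 1000; // assumed: 24h of silence

type Verdict = "approved" | "rejected" | "pending";

// An explicit reply always wins; silence converts to approval
// only once the deadline has elapsed.
function humanVerdict(sentAt: Date, now: Date, reply?: "approve" | "reject"): Verdict {
  if (reply === "reject") return "rejected";
  if (reply === "approve") return "approved";
  return now.getTime() - sentAt.getTime() >= APPROVAL_DEADLINE_MS ? "approved" : "pending";
}
```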
Storage layout
```
<repo>/
├── wiki/plans/<feature>-YYYY-MM-DD.md   ← PRD (human-authored)
├── wiki/specs/<feature>.md              ← Architect output (Gherkin)
├── docs/dossiers/<feature>/             ← Verifier output
│   ├── trace.zip
│   ├── screenshot.png
│   ├── console.log
│   └── summary.json                     ← Machine-readable test outcomes
└── .compound-state/agent-service.db     ← Throughput + cost metrics per stage
```
Everything except `.compound-state/agent-service.db` is committed to the feature branch. The dossier is committed too, for auditability and to make rebuilds deterministic.
Cross-references
- Vision and mandate — the four commitments this pipeline implements.
- Agent roles and model routing — which model each stage uses.
- Spec as contract — why the spec is the coordination object.
- Verification and evidence — dossier format.
- Reviewer feedback loop — Codex polling.
- Throughput and business signals — features/sec/token measurement.
- Merge flow & PR velocity — the merge queue this pipeline targets.