The five roles
| Role | Stages it owns | Default model | Why | Closest analogue today |
|---|---|---|---|---|
| Architect | 1 (read PRD), 2 (spec) | Reasoning-class (Opus / equivalent) | One expensive pass that produces a spec the rest of the pipeline can lean on. Expensive but rare. | Human + Cursor Composer / Claude Code (Opus) writing the PRD directly under wiki/plans/; no wiki/specs/<feature>.md is emitted yet. |
| Test-writer | 3 | Mid (Sonnet) | Writes failing tests from the spec. Mechanical once the spec is good — doesn't need Opus. | Editor agent (Composer / Claude Code) writes tests inline as part of the same iteration; no enforced "tests must fail first" gate. |
| Editor | 4 | Mid (Sonnet) | Iterates on the diff until tests pass. Many calls, each cheap. The cost driver. | Cursor Composer or Claude Code in a worktree (./scripts/git/worktree-add.sh); MAX_ITERATIONS is human-imposed. |
| Verifier | 5–6 (self-review + dossier capture) | Mid (Sonnet) | Runs reviewer subagents, runs the test suite, captures the dossier. Needs reliability, not depth. | Manual yarn type-check && yarn lint && yarn test:dashboard-stack + the three review subagents; dossiers under docs/dossiers/<feature>/ are hand-crafted. |
| Reviewer | 7 | Mid (Sonnet) | Polls Codex / CodeRabbit comments and re-enters the Editor. Stateless per pass. | Human reads Codex comments after each push (memory rule feedback_codex_review_loop.md) and pings the agent to re-enter. |
The split is the Aider architect/editor pattern: one expensive reasoning pass, many cheap edit passes. Aider's polyglot benchmark results for the R1 + Sonnet pairing put the architect/editor split at roughly 64 %, state of the art when published, at ~1/14th the cost of running everything on the reasoning model.
Cost rationale
| Pipeline shape | Tokens spent | Quality | Verdict |
|---|---|---|---|
| Run everything on the reasoning model | Architect quality everywhere | Best | Too expensive at the cadence we want. |
| Run everything on the edit model | One pass; edit-model spec quality | Often misses architectural constraints | Cheap, frequent regressions. |
| Architect/editor split (this section's choice) | Reasoning once, edit N times | Architect-quality decisions, edit-quality output | Right for steady-state shipping. |
The reviewer subagents (resolver-reviewer, dashboard-reviewer, perf-reviewer) currently run on the same edit model. As long as the spec carries architectural intent forward, edit-model review catches the drift the editor introduces — and reviewer cost is a tiny fraction of total tokens because reviewers don't write code.
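The "reasoning once, edit N times" trade-off can be sketched as simple arithmetic. This is a rough illustration only: the `CallCosts` shape and the unit costs in the usage note are hypothetical, not measured prices, and the model assumes one spec pass followed by N edit passes.

```typescript
// Hypothetical per-call costs in arbitrary units; these are not real prices.
interface CallCosts {
  reasoningCall: number;
  editCall: number;
}

// Cost of one feature when the spec pass and every edit pass
// all run on the reasoning model.
function allReasoningCost(costs: CallCosts, editPasses: number): number {
  return costs.reasoningCall * (1 + editPasses);
}

// Cost of one feature with a single reasoning pass for the spec
// and N passes on the cheaper edit model.
function splitCost(costs: CallCosts, editPasses: number): number {
  return costs.reasoningCall + costs.editCall * editPasses;
}
```

With an assumed 15 units per reasoning call, 1 unit per edit call, and 8 edit passes, `allReasoningCost` gives 135 and `splitCost` gives 23. The gap widens as the number of edit passes grows, which is why the Editor is the cost driver.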
Why model agnosticism is non-negotiable
See model-and-vendor-agnosticism.md. Short version: the table above lists Anthropic models because that's what we use today. The pipeline must keep working when "default" rotates to OpenAI, Gemini, or an OSS model — that's vision commitment (b). Implementation: each role talks to a model adapter, never to a vendor SDK directly.
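One possible shape for the adapter boundary is sketched below. The `ModelAdapter` interface, the `ModelTier` type, and the `roleTiers`, `resolveAdapter`, and `stubAdapter` names are all hypothetical; the real adapter shape is whatever model-and-vendor-agnosticism.md defines.

```typescript
// Hypothetical adapter boundary: roles depend on this interface only,
// never on a vendor SDK. All names here are illustrative assumptions.
type Role = "architect" | "test-writer" | "editor" | "verifier" | "reviewer";
type ModelTier = "reasoning" | "mid";

interface ModelAdapter {
  readonly vendor: string;
  complete(prompt: string): Promise<string>;
}

// Default tier per role, mirroring the roles table above.
const roleTiers: Record<Role, ModelTier> = {
  architect: "reasoning",
  "test-writer": "mid",
  editor: "mid",
  verifier: "mid",
  reviewer: "mid",
};

// Rotating vendors means swapping the registry; role code is untouched.
function resolveAdapter(
  role: Role,
  registry: Record<ModelTier, ModelAdapter>,
): ModelAdapter {
  return registry[roleTiers[role]];
}

// A stub standing in for any vendor-backed implementation.
function stubAdapter(vendor: string): ModelAdapter {
  return { vendor, complete: async (prompt) => `[${vendor}] ${prompt}` };
}
```

The point of the indirection is that a vendor rotation becomes a one-line registry change rather than an edit to every role.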
Per-role responsibilities (concrete)
Architect
- Reads wiki/plans/<feature>-YYYY-MM-DD.md.
- Reads any wiki pages referenced in the PRD's frontmatter sources: field.
- Emits wiki/specs/<feature>.md with Gherkin acceptance criteria.
- Lists "expected files touched" so the Verifier can sanity-check scope creep.
- Does not write tests, does not write code.
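The scope-creep check that the "expected files touched" list enables could look roughly like this. The `findScopeCreep` name is hypothetical, and exact-path matching is an assumption; a real check might accept globs or directories.

```typescript
// Returns the files in the diff that the spec did not list as expected.
// Assumption: paths are compared exactly, with no glob support.
function findScopeCreep(
  expectedFiles: string[],
  touchedFiles: string[],
): string[] {
  const expected = new Set(expectedFiles);
  return touchedFiles.filter((file) => !expected.has(file));
}
```

An empty result means the diff stayed within the scope the Architect declared; anything else is a prompt for the Verifier to ask why.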
Test-writer
- Reads spec.
- Generates vitest tests for resolvers / services / hooks named in the spec.
- Generates Playwright tests for end-to-end scenarios named in the spec.
- Runs each test once and confirms it FAILS — informative error messages required.
- Does not write production code.
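The "must fail first" gate can be sketched as a pure check over first-run results. The `FirstRunResult` shape and the `redFirstViolations` name are hypothetical, and "informative" is approximated here as "non-empty", which is a much weaker test than a human would apply.

```typescript
// Hypothetical shape of one test's result on its first run.
interface FirstRunResult {
  name: string;
  passed: boolean;
  errorMessage?: string;
}

// Returns the problems that should block the hand-off to the Editor:
// tests that already pass, or that fail without any error message.
function redFirstViolations(results: FirstRunResult[]): string[] {
  const problems: string[] = [];
  for (const result of results) {
    if (result.passed) {
      problems.push(`${result.name}: passed before any production code exists`);
    } else if (!result.errorMessage || result.errorMessage.trim().length === 0) {
      problems.push(`${result.name}: failed without an informative error message`);
    }
  }
  return problems;
}
```

A test that passes on its first run is suspicious because it either asserts nothing or exercises code that already exists.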
Editor
- Reads spec + failing tests.
- Iterates: read failing test → implement minimum to make it pass → run all target tests → repeat.
- Bounded by MAX_ITERATIONS (default 8). On exhaustion, surfaces the failure rather than grinding.
- Honours .cursor/rules/appcaire-monorepo.mdc and the in-repo .claude/skills/ recipes.
- Worktree-only for PRD-driven feature work. The pipeline may create the branch/worktree automatically. Never edits main directly for those flows. Never runs git push --force.
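The bounded iteration described above might be sketched as follows. `runTargetTests` and `implementMinimum` are hypothetical injected hooks standing in for the test runner and one edit-model call; only `MAX_ITERATIONS` and its default of 8 come from this page.

```typescript
const MAX_ITERATIONS = 8; // default stated in the Editor responsibilities

interface EditorOutcome {
  status: "green" | "exhausted";
  iterations: number;
  stillFailing: string[];
}

// runTargetTests returns the names of the currently failing tests.
// implementMinimum stands in for one edit pass aimed at one failing test.
function runEditorLoop(
  runTargetTests: () => string[],
  implementMinimum: (failingTest: string) => void,
  maxIterations: number = MAX_ITERATIONS,
): EditorOutcome {
  let failing = runTargetTests();
  let iterations = 0;
  while (failing.length > 0 && iterations < maxIterations) {
    const next = failing[0];
    if (next === undefined) break;
    implementMinimum(next);
    iterations += 1;
    failing = runTargetTests(); // re-run all target tests after each edit
  }
  // On exhaustion the failure is surfaced to the caller, not retried.
  return {
    status: failing.length === 0 ? "green" : "exhausted",
    iterations,
    stillFailing: failing,
  };
}
```

Returning an explicit `exhausted` status, rather than throwing or looping on, keeps the decision about what to do next with the caller.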
Verifier
- Spawns the appropriate reviewer subagents per scope.
- Runs yarn type-check, yarn lint, yarn test:dashboard-stack.
- Runs Playwright with trace + screenshot capture.
- Writes docs/dossiers/<feature>/{trace.zip, screenshot.png, console.log, summary.json}.
- Refuses to mark the PR ready if any P1 review comment is open or the dossier is incomplete.
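The readiness rule in the last bullet can be sketched as a pure function. The `ReadinessInput` and `Readiness` shapes and the `checkReadiness` name are hypothetical; the four artefact names are taken from the dossier path listed above.

```typescript
// Artefact names from the dossier path in the Verifier responsibilities.
const REQUIRED_ARTEFACTS = [
  "trace.zip",
  "screenshot.png",
  "console.log",
  "summary.json",
];

interface ReadinessInput {
  dossierFiles: string[]; // file names found in the feature's dossier directory
  openP1Comments: number; // count of unresolved P1 review comments
}

interface Readiness {
  ready: boolean;
  reasons: string[];
}

// The PR is ready only when the dossier is complete and no P1 is open.
function checkReadiness(input: ReadinessInput): Readiness {
  const reasons: string[] = [];
  const present = new Set(input.dossierFiles);
  for (const name of REQUIRED_ARTEFACTS) {
    if (!present.has(name)) reasons.push(`dossier is missing ${name}`);
  }
  if (input.openP1Comments > 0) {
    reasons.push(`${input.openP1Comments} open P1 review comment(s)`);
  }
  return { ready: reasons.length === 0, reasons };
}
```

Collecting every reason, instead of stopping at the first, gives whoever picks up the PR a complete list of what is blocking it.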
Reviewer (feedback loop)
- After every git push, polls gh api repos/{org}/{repo}/pulls/<n>/comments.
- Categorises Codex / CodeRabbit findings by severity.
- For each P1 / P2: either re-enter Editor with the comment as input, or argue down with a justification committed to the PR description.
- Loops until the comments page returns no new findings.
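The triage step of the loop might look like the sketch below. It leaves out the polling call itself and assumes the comments have already been fetched. The `ReviewComment` shape and the helper names are hypothetical, and the idea that severity appears as a literal "P1" or "P2" token in the comment body is an assumption about how Codex / CodeRabbit format their findings.

```typescript
type Severity = "P1" | "P2" | "other";

interface ReviewComment {
  id: number;
  body: string;
}

// Assumption: severity appears as a standalone "P1" or "P2" token in the body.
function severityOf(body: string): Severity {
  const match = /\bP([12])\b/.exec(body);
  if (match === null) return "other";
  return match[1] === "1" ? "P1" : "P2";
}

// Returns only comments not seen on earlier passes and records their ids.
// The loop ends when this returns an empty array.
function newFindings(
  comments: ReviewComment[],
  seen: Set<number>,
): ReviewComment[] {
  const fresh = comments.filter((comment) => !seen.has(comment.id));
  for (const comment of fresh) seen.add(comment.id);
  return fresh;
}

// Comments that need either an Editor re-entry or a written justification.
function actionable(comments: ReviewComment[]): ReviewComment[] {
  return comments.filter((comment) => severityOf(comment.body) !== "other");
}
```

Tracking seen ids outside the function keeps each pass stateless apart from that one set, which matches the "stateless per pass" description of the role.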
Reviewer subagent map
| File touched | Subagents to run |
|---|---|
| apps/dashboard-server/src/graphql/resolvers/** | resolver-reviewer + perf-reviewer (if loops/lists) |
| apps/dashboard/src/** | dashboard-reviewer |
| Anything with a list resolver, field resolver, or for (const x of prismaResults) | perf-reviewer |
| Anything else | None — Verifier still runs type-check + lint + tests. |
These subagents are documented in claude-agents-and-skills.md.
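The map above could be applied mechanically along these lines. The `TouchedFile` shape and function names are hypothetical, and the "loops/lists" condition is approximated with a simple `for ... of` text match, which will miss list and field resolvers that do not use that construct.

```typescript
type Subagent = "resolver-reviewer" | "dashboard-reviewer" | "perf-reviewer";

interface TouchedFile {
  path: string;
  content: string;
}

// Assumption: a for...of loop in the file is a good enough signal
// that perf-reviewer should look at it. Real detection would be richer.
function looksListHeavy(content: string): boolean {
  return /for\s*\(\s*const\s+\w+\s+of\s/.test(content);
}

// Returns the distinct subagents to run for a set of touched files.
// An empty result means only type-check, lint, and tests run.
function subagentsFor(files: TouchedFile[]): Subagent[] {
  const selected = new Set<Subagent>();
  for (const file of files) {
    if (file.path.startsWith("apps/dashboard-server/src/graphql/resolvers/")) {
      selected.add("resolver-reviewer");
    }
    if (file.path.startsWith("apps/dashboard/src/")) {
      selected.add("dashboard-reviewer");
    }
    if (looksListHeavy(file.content)) {
      selected.add("perf-reviewer");
    }
  }
  return Array.from(selected).sort();
}
```

Because the result is a set, touching ten resolver files still spawns each reviewer once.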
Cross-references
- PRD-to-PR pipeline — what each stage produces.
- Model and vendor agnosticism — vendor-rotation rule and adapter shape.
- Throughput and business signals — features/sec/token, where it's logged, who consumes it.
- Reviewer feedback loop — Codex polling specifics.
- Engineering agents — current Darwin agents that map to these roles.