Agent roles and model routing | Agentic Workflow

The five roles

Role	Stages it owns	Default model	Why	Closest analogue today
Architect	1 (read PRD), 2 (spec)	Reasoning-class (Opus / equivalent)	One expensive pass that produces a spec the rest of the pipeline can lean on. Expensive but rare.	Human + Cursor Composer / Claude Code (Opus) writing the PRD directly under `wiki/plans/`; no `wiki/specs/<feature>.md` is emitted yet.
Test-writer	3	Mid (Sonnet)	Writes failing tests from the spec. Mechanical once the spec is good — doesn't need Opus.	Editor agent (Composer / Claude Code) writes tests inline as part of the same iteration; no enforced "tests must fail first" gate.
Editor	4	Mid (Sonnet)	Iterates on the diff until tests pass. Many calls, each cheap. The cost driver.	Cursor Composer or Claude Code in a worktree (`./scripts/git/worktree-add.sh`); `MAX_ITERATIONS` is human-imposed.
Verifier	5–6 (self-review + dossier capture)	Mid (Sonnet)	Runs reviewer subagents, runs the test suite, captures the dossier. Needs reliability, not depth.	Manual `yarn type-check && yarn lint && yarn test:dashboard-stack` + the three review subagents; dossiers under `docs/dossiers/<feature>/` are hand-crafted.
Reviewer	7	Mid (Sonnet)	Polls Codex / CodeRabbit comments and re-enters the Editor. Stateless per pass.	Human reads Codex comments after each push (memory rule `feedback_codex_review_loop.md`) and pings the agent to re-enter.

The split is the Aider architect/editor pattern: one expensive reasoning pass, many cheap edit passes. Industry benchmarks (Aider polyglot, R1+Sonnet) put the architect/editor split at roughly 64 % polyglot SOTA at ~1/14th the cost of running everything on the reasoning model.

Cost rationale

Pipeline shape	Tokens spent	Quality	Verdict
Run everything on the reasoning model	Architect quality everywhere	Best	Too expensive at the cadence we want.
Run everything on the edit model	One pass; edit-model spec quality	Often misses architectural constraints	Cheap, frequent regressions.
Architect/editor split (this section's choice)	Reasoning once, edit N times	Architect-quality decisions, edit-quality output	Right for steady-state shipping.

The reviewer subagents (resolver-reviewer, dashboard-reviewer, perf-reviewer) currently run on the same edit model. As long as the spec carries architectural intent forward, edit-model review catches the drift the editor introduces — and reviewer cost is a tiny fraction of total tokens because reviewers don't write code.

Why model agnosticism is non-negotiable

See model-and-vendor-agnosticism.md. Short version: the table above lists Anthropic models because that's what we use today. The pipeline must keep working when "default" rotates to OpenAI, Gemini, or an OSS model — that's vision commitment (b). Implementation: each role talks to a model adapter, never to a vendor SDK directly.

Per-role responsibilities (concrete)

Architect

Reads wiki/plans/<feature>-YYYY-MM-DD.md.
Reads any wiki pages referenced in the PRD's frontmatter sources: field.
Emits wiki/specs/<feature>.md with Gherkin acceptance criteria.
Lists "expected files touched" so the Verifier can sanity-check scope creep.
Does not write tests, does not write code.

Test-writer

Reads spec.
Generates vitest tests for resolvers / services / hooks named in the spec.
Generates Playwright tests for end-to-end scenarios named in the spec.
Runs each test once and confirms it FAILS — informative error messages required.
Does not write production code.

Editor

Reads spec + failing tests.
Iterates: read failing test → implement minimum to make it pass → run all target tests → repeat.
Bounded by MAX_ITERATIONS (default 8). On exhaustion, surfaces the failure rather than grinding.
Honours .cursor/rules/appcaire-monorepo.mdc and the in-repo .claude/skills/ recipes.
Worktree-only for PRD-driven feature work. The pipeline may create the branch/worktree automatically. Never edits main directly for those flows. Never git push --force.

Verifier

Spawns the appropriate reviewer subagents per scope.
Runs yarn type-check, yarn lint, yarn test:dashboard-stack.
Runs Playwright with trace + screenshot capture.
Writes docs/dossiers/<feature>/{trace.zip, screenshot.png, console.log, summary.json}.
Refuses to mark the PR ready if any P1 review comment is open or the dossier is incomplete.

Reviewer (feedback loop)

After every git push, polls gh api repos/{org}/{repo}/pulls/<n>/comments.
Categorises Codex / CodeRabbit findings by severity.
For each P1 / P2: either re-enter Editor with the comment as input, or argue down with a justification committed to the PR description.
Loops until the comments page returns no new findings.

Reviewer subagent map

File touched	Subagents to run
`apps/dashboard-server/src/graphql/resolvers/**`	`resolver-reviewer` + `perf-reviewer` (if loops/lists)
`apps/dashboard/src/**`	`dashboard-reviewer`
Anything with a list resolver, field resolver, or `for (const x of prismaResults)`	`perf-reviewer`
Anything else	None — Verifier still runs type-check + lint + tests.

These subagents are documented in claude-agents-and-skills.md.

Cross-references

PRD-to-PR pipeline — what each stage produces.
Model and vendor agnosticism — vendor-rotation rule and adapter shape.
Throughput and business signals — features/sec/token, where it's logged, who consumes it.
Reviewer feedback loop — Codex polling specifics.
Engineering agents — current Darwin agents that map to these roles.