Agentic Workflow

Agent roles and model routing

Architect, Test-writer, Editor, Verifier, Reviewer — and which model each one runs.

The five roles

| Role | Stages it owns | Default model | Why | Closest analogue today |
| --- | --- | --- | --- | --- |
| Architect | 1 (read PRD), 2 (spec) | Reasoning-class (Opus / equivalent) | One expensive pass that produces a spec the rest of the pipeline can lean on. Expensive but rare. | Human + Cursor Composer / Claude Code (Opus) writing the PRD directly under `wiki/plans/`; no `wiki/specs/<feature>.md` is emitted yet. |
| Test-writer | 3 | Mid (Sonnet) | Writes failing tests from the spec. Mechanical once the spec is good; doesn't need Opus. | Editor agent (Composer / Claude Code) writes tests inline as part of the same iteration; no enforced "tests must fail first" gate. |
| Editor | 4 | Mid (Sonnet) | Iterates on the diff until tests pass. Many calls, each cheap. The cost driver. | Cursor Composer or Claude Code in a worktree (`./scripts/git/worktree-add.sh`); `MAX_ITERATIONS` is human-imposed. |
| Verifier | 5–6 (self-review + dossier capture) | Mid (Sonnet) | Runs reviewer subagents, runs the test suite, captures the dossier. Needs reliability, not depth. | Manual `yarn type-check && yarn lint && yarn test:dashboard-stack` plus the three review subagents; dossiers under `docs/dossiers/<feature>/` are hand-crafted. |
| Reviewer | 7 | Mid (Sonnet) | Polls Codex / CodeRabbit comments and re-enters the Editor. Stateless per pass. | Human reads Codex comments after each push (memory rule `feedback_codex_review_loop.md`) and pings the agent to re-enter. |

The split follows Aider's architect/editor pattern: one expensive reasoning pass, many cheap edit passes. Industry benchmarks (Aider polyglot, R1+Sonnet) put the architect/editor split at roughly 64% polyglot SOTA at about 1/14th the cost of running everything on the reasoning model.
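The shape of that split can be sketched in a few lines. This is a minimal illustration, not our real pipeline: `ship`, `Model`, and the prompt strings are hypothetical, and the real `MAX_ITERATIONS` cap lives outside the code.

```typescript
// Hypothetical sketch of the architect/editor split.
// All names here (ship, Model, MAX_ITERATIONS) are illustrative.
type ModelTier = "reasoning" | "edit";

interface Model {
  complete(prompt: string): Promise<string>;
}

const MAX_ITERATIONS = 8; // human-imposed cap, as in the table above

async function ship(
  prd: string,
  models: Record<ModelTier, Model>,
  testsPass: (diff: string) => Promise<boolean>,
): Promise<string> {
  // One expensive reasoning pass produces the spec...
  const spec = await models.reasoning.complete(`Write a spec for:\n${prd}`);

  // ...then many cheap edit passes iterate on the diff until tests pass.
  let diff = "";
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    diff = await models.edit.complete(
      `Spec:\n${spec}\nCurrent diff:\n${diff}\nFix the failing tests.`,
    );
    if (await testsPass(diff)) return diff;
  }
  throw new Error("MAX_ITERATIONS exhausted; escalate to a human");
}
```

The point of the shape: the reasoning model is called exactly once per feature, while the loop body only ever touches the edit model.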

Cost rationale

| Pipeline shape | Tokens spent | Quality | Verdict |
| --- | --- | --- | --- |
| Run everything on the reasoning model | Reasoning-priced tokens on every pass | Architect quality everywhere; best | Too expensive at the cadence we want. |
| Run everything on the edit model | Edit-priced tokens on every pass, spec included | Edit-model spec quality; often misses architectural constraints | Cheap, but frequent regressions. |
| Architect/editor split (this section's choice) | Reasoning once, edit N times | Architect-quality decisions, edit-quality output | Right for steady-state shipping. |
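A back-of-envelope version of the verdicts, with purely illustrative numbers (the price ratio and pass count below are assumptions, not measurements):

```typescript
// Illustrative only: relative prices and pass counts are assumed.
const EDIT_COST = 1;        // relative per-pass cost of the edit model
const REASONING_COST = 15;  // relative per-pass cost of the reasoning model
const EDIT_PASSES = 10;     // edit iterations per feature

// All-reasoning pipeline pays reasoning price for the spec AND every edit pass.
const allReasoning = REASONING_COST * (1 + EDIT_PASSES);

// The split pays reasoning price once, then edit price for each iteration.
const split = REASONING_COST + EDIT_COST * EDIT_PASSES;

console.log(allReasoning / split); // → 6.6: the all-reasoning overpayment factor
```

Under these assumed numbers the all-reasoning pipeline costs several times more per feature for quality that only matters in the spec pass.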

The reviewer subagents (resolver-reviewer, dashboard-reviewer, perf-reviewer) currently run on the same edit model. As long as the spec carries architectural intent forward, edit-model review catches the drift the editor introduces — and reviewer cost is a tiny fraction of total tokens because reviewers don't write code.

Why model agnosticism is non-negotiable

See model-and-vendor-agnosticism.md. Short version: the table above lists Anthropic models because that's what we use today. The pipeline must keep working when "default" rotates to OpenAI, Gemini, or an OSS model — that's vision commitment (b). Implementation: each role talks to a model adapter, never to a vendor SDK directly.
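What "talks to a model adapter, never to a vendor SDK directly" could look like, as a hedged sketch; the interface, registry, and role names below are illustrative, not the real implementation:

```typescript
// Hypothetical adapter seam: roles depend on this interface, and only
// adapter implementations know about vendor SDKs.
interface ModelAdapter {
  complete(opts: { system: string; prompt: string }): Promise<string>;
}

type Role = "architect" | "test-writer" | "editor" | "verifier" | "reviewer";

// Rotating the default vendor is a registry change, not a pipeline change.
const registry: Record<string, () => ModelAdapter> = {
  // "anthropic": () => new AnthropicAdapter(/* ... */),
  // "openai":    () => new OpenAIAdapter(/* ... */),
};

function adapterFor(role: Role, defaultVendor: string): ModelAdapter {
  const make = registry[defaultVendor];
  if (!make) throw new Error(`no adapter registered for "${defaultVendor}"`);
  return make();
}
```

Because every role resolves its model through `adapterFor`, swapping Anthropic for OpenAI, Gemini, or an OSS model touches the registry and nothing else.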

Per-role responsibilities (concrete)

Architect

Test-writer

Editor

Verifier

Reviewer (feedback loop)

Reviewer subagent map

| File touched | Subagents to run |
| --- | --- |
| `apps/dashboard-server/src/graphql/resolvers/**` | resolver-reviewer + perf-reviewer (if loops/lists) |
| `apps/dashboard/src/**` | dashboard-reviewer |
| Anything with a list resolver, field resolver, or `for (const x of prismaResults)` | perf-reviewer |
| Anything else | None; the Verifier still runs type-check + lint + tests. |

These subagents are documented in claude-agents-and-skills.md.
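The table can be read as a routing function. A minimal sketch, assuming simplified path prefixes in place of real glob matching and a crude regex for the perf heuristic (neither is the actual matcher):

```typescript
// Hypothetical routing helper mirroring the subagent map above.
function pickSubagents(path: string, fileText: string): string[] {
  const subagents = new Set<string>();

  if (path.startsWith("apps/dashboard-server/src/graphql/resolvers/")) {
    subagents.add("resolver-reviewer");
  }
  if (path.startsWith("apps/dashboard/src/")) {
    subagents.add("dashboard-reviewer");
  }
  // Crude stand-in for "list resolver, field resolver, or a loop
  // over Prisma results".
  if (/for \(const \w+ of prismaResults\)|listResolver|fieldResolver/.test(fileText)) {
    subagents.add("perf-reviewer");
  }

  // Empty result means: no subagents, but the Verifier still runs
  // type-check + lint + tests.
  return [...subagents];
}
```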

Cross-references