This section is the wiki's instantiation of the vision and mandate: scale Caire output 1000× without scaling humans, by making agent execution the default path for shipping features. Humans write PRDs and approve evidence; agents do everything in between.
Human interfaces
Three tiers, each calling the same backend pipeline. L0 and L1 work today; L2 is the next outstanding item.
L0 — Cursor / Claude Code in a worktree
Use this when you want hands-on control of the diff before the runner takes over.
- Write wiki/plans/<feature>-YYYY-MM-DD.md with PRD frontmatter and a status callout (see SCHEMA.md).
- Create a worktree: ./scripts/git/worktree-add.sh <slug> {feat|fix|chore|docs}/<domain>/<slug>.
- Open the PRD in Cursor Composer or invoke Claude Code in the worktree. The agent reads the PRD, writes tests, implements, runs the three review subagents, and opens a PR.
- Verify before merge by reviewing the PR diff and the dossier folder under docs/dossiers/<feature>/ on GitHub. Approval is a normal GitHub PR review.
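The naming conventions in the first two steps can be sketched as small helpers. This is a minimal illustration only: planPath, branchName, and worktreeCommand are hypothetical names, not functions from the repo, and the "dashboard" domain in the usage note is a made-up example.

```typescript
// Hypothetical helpers (not part of the repo): they only assemble the strings
// described in the L0 steps and do not touch git or the filesystem.
type BranchType = "feat" | "fix" | "chore" | "docs";

/** Build wiki/plans/<feature>-YYYY-MM-DD.md from a feature slug and a date. */
function planPath(feature: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `wiki/plans/${feature}-${day}.md`;
}

/** Build the <type>/<domain>/<slug> branch argument. */
function branchName(type: BranchType, domain: string, slug: string): string {
  return `${type}/${domain}/${slug}`;
}

/** Compose the worktree-add.sh invocation as an argv array (not executed here). */
function worktreeCommand(slug: string, type: BranchType, domain: string): string[] {
  return ["./scripts/git/worktree-add.sh", slug, branchName(type, domain, slug)];
}
```

For example, worktreeCommand("banner-hello", "feat", "dashboard") yields the script path followed by banner-hello and feat/dashboard/banner-hello.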
L1 — Darwin web at localhost:3010/prd-to-pr (live)
The hands-free path. The dashboard sidebar entry "PRD-to-PR" lists every run; the form at the top kicks one off.
- Write wiki/plans/<feature>-YYYY-MM-DD.md and commit to main of beta-appcaire.
- On the dashboard, paste the PRD path (e.g. wiki/plans/archive/banner-hello-2026-04-30.md) and click Start run.
- The runner walks all 9 stages. The detail page shows live elapsed time, per-stage timing, the agent's stdout log per stage, and an artifact viewer (PRD body, architect's spec, test files, editor's diff, dossier).
- After stage 6 verifies the dossier, the run lands at awaiting_approval. Click Approve to queue gh pr merge --auto --squash; click Reject to record a reason in the row's notes.
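The Approve / Reject step can be pictured as a pure function over the run row. This is a sketch under assumptions: the RunRow shape, the Decision and Effect types, and applyDecision are hypothetical; only the awaiting_approval status, the notes field, and the gh pr merge --auto --squash command come from the text above.

```typescript
// Hypothetical row shape; the real table likely has more columns.
interface RunRow {
  id: string;
  status: string;
  notes: string | null;
}

type Decision = { kind: "approve" } | { kind: "reject"; reason: string };

type Effect =
  | { kind: "queue_merge"; command: string[] }
  | { kind: "record_rejection"; row: RunRow };

/** Translate a dashboard decision into the effect the runner would carry out. */
function applyDecision(row: RunRow, decision: Decision): Effect {
  if (row.status !== "awaiting_approval") {
    throw new Error(`run ${row.id} is not awaiting approval (status: ${row.status})`);
  }
  if (decision.kind === "approve") {
    // The command is returned, not executed, so the sketch stays side-effect free.
    return { kind: "queue_merge", command: ["gh", "pr", "merge", "--auto", "--squash"] };
  }
  // Reject: store the reason in the row's notes.
  return { kind: "record_rejection", row: { ...row, notes: decision.reason } };
}
```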
The runner enforces an auto-kill switch: MAX_CONCURRENT_PRD_RUNS=10 and MAX_RUNS_PER_DAY=30 (override in ~/.config/caire/env or the process env). Submitting past the cap returns HTTP 429.
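A minimal sketch of how such a cap check could work, assuming the runner knows how many runs are active and how many were started today. The function names, the fallback-on-invalid-value behaviour, and the return shape are assumptions; only the two variable names, their defaults, and the 429 status come from the text.

```typescript
interface RunCaps {
  maxConcurrent: number;
  maxPerDay: number;
}

/** Read the caps from an env-like record, falling back to the documented defaults. */
function capsFromEnv(env: Record<string, string | undefined>): RunCaps {
  const parse = (raw: string | undefined, fallback: number): number => {
    const n = Number.parseInt(raw ?? "", 10);
    // Assumption: unparsable or non-positive values fall back to the default.
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    maxConcurrent: parse(env.MAX_CONCURRENT_PRD_RUNS, 10),
    maxPerDay: parse(env.MAX_RUNS_PER_DAY, 30),
  };
}

type Admission =
  | { admitted: true }
  | { admitted: false; status: 429; reason: string };

/** Decide whether a new run may start given current load. */
function admitRun(activeRuns: number, runsToday: number, caps: RunCaps): Admission {
  if (activeRuns >= caps.maxConcurrent) {
    return { admitted: false, status: 429, reason: "concurrent run cap reached" };
  }
  if (runsToday >= caps.maxPerDay) {
    return { admitted: false, status: 429, reason: "daily run cap reached" };
  }
  return { admitted: true };
}
```

Keeping the check pure (counts in, decision out) makes the cap logic testable without a running server.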
L2 — Telegram via interface-agent (outstanding)
Described in darwin-as-orchestrator.md. interface-agent routes PRD-style messages to the same POST /api/prd-to-pr endpoint that L1 uses, posts the dossier screenshot back to Telegram, and merges when you reply "approve". It is a thin shell over the L1 backend: the routing rule and the Telegram handler are the only new code needed.
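Since L2 is not built yet, the following is only a sketch of what the routing rule might look like. Everything here is an assumption except the POST /api/prd-to-pr endpoint and the "approve" reply: the idea that a PRD-style message is one containing a wiki/plans path, the prdPath body field, and the routeMessage name are all hypothetical.

```typescript
type Route =
  | { kind: "start_run"; method: "POST"; path: "/api/prd-to-pr"; body: { prdPath: string } }
  | { kind: "approve" }
  | { kind: "ignore" };

// Assumption: a "PRD-style" message is one that mentions a plan file under wiki/plans/.
const PLAN_PATH = /wiki\/plans\/[\w./-]+\.md/;

/** Classify an incoming Telegram message text (hypothetical routing rule). */
function routeMessage(text: string): Route {
  if (text.trim().toLowerCase() === "approve") {
    return { kind: "approve" };
  }
  const match = PLAN_PATH.exec(text);
  if (match) {
    // The request is described, not sent; the body field name is a guess.
    return {
      kind: "start_run",
      method: "POST",
      path: "/api/prd-to-pr",
      body: { prdPath: match[0] },
    };
  }
  return { kind: "ignore" };
}
```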
The nine stages (as built)
The runner's PIPELINE constant in apps/server/src/pipeline/runner.ts walks these in order. Stage names match the dashboard's current_stage column.
0. intake (worktree creation, env replication, yarn install + generate + per-server db:generate)
↓
1. prd (PRD frontmatter + status callout validated)
↓
2. spec (Architect agent — Opus — emits wiki/specs/<feature>.md Gherkin)
↓
3. tests (Test-writer agent — Sonnet — emits failing vitest + Playwright; runner WIP-commits them)
↓
4. implement (Editor agent — Sonnet — iterates with transactional git stash; MAX_ITERATIONS=8)
↓
5. review (resolver-reviewer / dashboard-reviewer / perf-reviewer subagents in parallel)
↓ (P1 finding → re-enter implement; cycle bumps; capped MAX_REVIEW_CYCLES=3)
6. verify (yarn type-check + lint + Playwright; dossier bundler writes docs/dossiers/<feature>/)
↓ (Playwright failure → re-enter implement with verifyHint; same cycle cap)
7. reviewer_loop (gh push; poll Codex/CodeRabbit comments; re-enter implement on P1/P2 or argue down)
↓
8. merge (gh pr merge --auto --squash; awaits human Approve from dashboard)
Re-entry edges (5→4 and 6→4) bump review_cycle. Hitting the cap fails the run with a surfaced reason; the dashboard's stage timing table records every execution so you see "Implement (2× · 4m 50s)" when iteration looped.
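The walk with re-entry edges can be sketched as a small loop. This is not the code in runner.ts: walkPipeline, the StageOutcome type, and the exact point at which the cap trips are assumptions, and the stage-7 re-entry is left out because the text only says that 5→4 and 6→4 bump review_cycle.

```typescript
const PIPELINE = [
  "intake", "prd", "spec", "tests", "implement",
  "review", "verify", "reviewer_loop", "merge",
] as const;
type Stage = (typeof PIPELINE)[number];

const MAX_REVIEW_CYCLES = 3;

type StageOutcome = "ok" | "reenter_implement";

interface RunResult {
  status: "completed" | "failed";
  reason?: string;
  reviewCycle: number;
  executions: Stage[]; // every stage execution, in order, including repeats
}

/** Walk the stages in order; review and verify may send the run back to implement. */
function walkPipeline(execute: (stage: Stage, cycle: number) => StageOutcome): RunResult {
  const executions: Stage[] = [];
  const implementIndex = PIPELINE.indexOf("implement");
  let reviewCycle = 0;
  let i = 0;
  while (i < PIPELINE.length) {
    const stage = PIPELINE[i];
    executions.push(stage);
    const outcome = execute(stage, reviewCycle);
    const canReenter = stage === "review" || stage === "verify";
    if (outcome === "reenter_implement" && canReenter) {
      // Assumption: the run fails once the cap's worth of re-entries is used up.
      if (reviewCycle >= MAX_REVIEW_CYCLES) {
        const reason = `review cycle cap (${MAX_REVIEW_CYCLES}) reached at ${stage}`;
        return { status: "failed", reason, reviewCycle, executions };
      }
      reviewCycle += 1;
      i = implementIndex;
      continue;
    }
    i += 1;
  }
  return { status: "completed", reviewCycle, executions };
}
```

Counting how often "implement" appears in executions gives the repeat count that the dashboard's timing table displays.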
Pages in this section
- vision-and-mandate.md — the north star. The "CTO – AI Systems & Agent Workforce" job description in full. Every other page traces back to one of its four commitments.
- prd-to-pr-pipeline.md — what each stage produces, where artefacts live (wiki/plans/<feature>-YYYY-MM-DD.md, wiki/specs/<feature>.md, docs/dossiers/<feature>/).
- agent-roles-and-model-routing.md — Architect / Test-writer / Editor / Verifier / Reviewer roles, and which model each one runs.
- model-and-vendor-agnosticism.md — vision commitment (b). Routing matrix; rotation cadence; adapter shape.
- spec-as-contract.md — Thoughtworks SDD pattern: PRD compiles to spec; tests are generated from spec; spec is the coordination object.
- verification-and-evidence.md — failure-dossier pattern (Playwright Agents 1.56). The dossier IS the proof artefact attached to the PR.
- reviewer-feedback-loop.md — gh api .../pulls/<n>/comments polling; P1/P2 from Codex as failed tests; re-enter Editor.
- scale-or-kill.md — vision commitment (c). Auto-promote what works; auto-kill what regresses. Hands-free.
- throughput-and-business-signals.md — vision commitment (d). Features per second per token; revenue/cost/cash as system inputs; mathematician chooses model routing inside the cash budget.
- darwin-as-orchestrator.md — the path to replacing the human CTO with agent-CTO per the vision. Telegram → Darwin → 4-stage pipeline → PR with screenshot.
- skills/humanizer.md — reusable skill for stripping AI tells from public-facing prose. Mandatory for marketing agents; useful anywhere user-visible copy is generated.
What this section is NOT
- A how-to for using the existing engineering team day-to-day. For that see agents-engineering.md.
- An advertisement of what we have today. Some of this is built; some is sketched. Each page calls out its implementation_status.
- A model-recommendation guide. Models change; the agnosticism page deliberately routes through adapters so model choice is a config decision, not an architectural one.
Cursor “Build plan” (Composer plan) — editor workflow
When the user opens a phased plan under .cursor/plans/*.plan.md and chooses Build plan (or asks the agent to implement it), treat that file as the source of truth for scope and sequencing, not as automatic approval to rewrite unrelated code.
- Read the plan and the current code — confirm which phase is in scope (stop at explicit phase boundaries unless the user expands scope).
- Spec by tests first when the plan calls for behavior — add or extend Vitest (dashboard-server / dashboard) so the new semantics are reproducible without clicking the UI.
- Implement minimally — follow monorepo resolver/UI rules; regenerate GraphQL types only when schema or .graphql files change.
- Verify — yarn type-check, yarn lint, and targeted vitest for touched apps; browser check when the change is UI-visible.
- Document — if user-visible behavior or workflow expectations change, update the relevant wiki page or CLAUDE.md note in the same effort.
If the plan’s index or cross-links in wiki/ change materially, run yarn wiki:lint from the repo root.
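The Verify step and the wiki lint rule can be sketched as a function that assembles the command list for a change. This is an illustration under assumptions: verificationCommands and needsGraphqlCodegen are hypothetical names, and running Vitest through yarn workspace with the app name as the workspace name is a guess about the monorepo layout, not a documented command.

```typescript
interface ChangeSummary {
  touchedApps: string[]; // assumed to match yarn workspace names
  wikiLinksChanged: boolean; // the plan's index or cross-links in wiki/ changed
}

/** List the checks to run, as argv arrays, for a given change (nothing is executed). */
function verificationCommands(change: ChangeSummary): string[][] {
  const commands: string[][] = [
    ["yarn", "type-check"],
    ["yarn", "lint"],
  ];
  for (const app of change.touchedApps) {
    // Targeted tests only for the apps the change touched.
    commands.push(["yarn", "workspace", app, "vitest", "run"]);
  }
  if (change.wikiLinksChanged) {
    commands.push(["yarn", "wiki:lint"]);
  }
  return commands;
}

/**
 * Simplified trigger for regenerating GraphQL types: any changed .graphql file.
 * Schema changes made outside .graphql files are not detected by this sketch.
 */
function needsGraphqlCodegen(changedFiles: string[]): boolean {
  return changedFiles.some((file) => file.endsWith(".graphql"));
}
```

The browser check for UI-visible changes stays manual and is intentionally not modelled here.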
Cross-references
- Vision and mandate — the canonical north-star reference.
- Engineering agents, optimization agents, management agents, marketing agents — the agents that will populate this pipeline.
- Compound workflow — the existing nightly pipeline that's the closest current approximation.
- Wiki as agent substrate (roadmap) — the work needed to make the wiki agent-friendly enough to drive this pipeline.
- Agentic workflow platform page (PRD) — the public-facing positioning page at apps/dashboard/public/platform/en/agentic-workflow.html that markets this section to investors, prospects, and technical evaluators.
- PRD-to-PR pipeline runner (PRD) — the runner + dossier bundler that drives the eight stages end-to-end. Closes gap-list items #1–#5 from darwin-component-map.md §c.
- Hello-agentic banner (PRD) — dogfood end-to-end test for the runner: one tiny user-visible change, used to validate the eight-stage flow before any real PRD is fed to it.
- The public mirror of this section lives at app.caire.se/platform/{en,sv}/agentic-workflow/ (one shareable URL per guide). The mirror is regenerated from these files via yarn agentic-pages — see scripts/agentic-workflow-pages/build.ts.