This page is the ground truth for the PRD-to-PR pipeline: which Darwin components are wired in today, which are useful but parked in a separate vertical, and which are outstanding. The prioritized "outstanding" list in §(c) is the build queue.
The audit pass that produced this map ran 2026-04-30 across be-agent-service/agents/prompts/, be-agent-service/apps/server/, beta-appcaire/.claude/agents/, beta-appcaire/scripts/, beta-appcaire/.github/workflows/, and beta-appcaire/docs/dossiers/. Refreshed 2026-04-30 evening after the runner end-to-end validation.
(a) Used by PRD-to-PR pipeline (engineering vertical)
These components are invoked by the pipeline today. Components marked (2026-04-30) moved here from §(c) after shipping.
| Component | Location | Pipeline use |
|---|---|---|
| POST /api/prd-to-pr + 9-stage runner (2026-04-30) | be-agent-service/apps/server/{routes/prd-to-pr.ts,pipeline/runner.ts} | The state machine that walks every stage. Persists transitions to prd_to_pr_runs in agent-service.db. Honors MAX_REVIEW_CYCLES=3, MAX_CONCURRENT_PRD_RUNS=10, MAX_RUNS_PER_DAY=30. Re-entry edges 5→4 and 6→4. |
| Stage modules (2026-04-30) | be-agent-service/apps/server/src/pipeline/stages/{intake,prd,spec,tests,implement,review,verify,reviewerLoop,merge}.ts | One per stage. intake.ts does worktree creation + env replication + yarn install + yarn generate + per-server db:generate. tests.ts WIP-commits the architect's spec and the test-writer's tests so the editor's git stash can't destroy them. implement.ts runs the iteration loop with transactional stash. verify.ts runs the dossier bundler then re-enters implement on Playwright failure with a verifyHint. |
| Dossier bundler (2026-04-30) | be-agent-service/apps/server/src/pipeline/dossier.ts | Captures Playwright trace + final-state screenshot, filters console output, emits summary.json + a human-readable README.md, writes everything to docs/dossiers/<feature>/ and stages a commit. Stage 6 fails if any required artifact is missing. |
| Named agent prompts (2026-04-30) | be-agent-service/agents/prompts/engineering/{architect,test-writer,editor,verifier,reviewer-feedback}.md | Spawned via claude -p in their respective stages. architect.md is route-locked (must copy /, /home, etc. verbatim from PRD). editor.md carries the iteration contract. |
| L1 dashboard pages (2026-04-30) | be-agent-service/apps/dashboard/src/pages/PRDToPR{Page,DetailPage}.tsx | Run list, detail view with live-ticking elapsed clock, per-stage timing table fed from summary_json.stageHistory[], artifact viewer (PRD body / spec / tests / diff / dossier), Approve / Reject buttons. Stage log endpoint globs <stage>.*.log so every agent invocation surfaces. |
| Codex / CodeRabbit polling (2026-04-30) | be-agent-service/apps/server/src/pipeline/stages/reviewerLoop.ts | Runs git push then polls gh api repos/{org}/{repo}/pulls/<n>/comments. Parses P1/P2/P3 severity. Re-enters implement on unresolved P1/P2; argues down via PR description text where applicable. |
| Auto-kill switch + Max Plan OAuth (2026-04-30) | be-agent-service/apps/server/src/pipeline/env.ts | enforceAutoKillSwitch() blocks new runs past concurrency / daily caps; HTTP 429 surfaced. isMaxPlanPreferred() reads CLAUDE_USE_MAX_PLAN=1 from ~/.config/caire/env and strips ANTHROPIC_API_KEY from spawned subprocess env so claude -p falls through to the macOS Keychain OAuth credentials. |
| Three review subagents: resolver-reviewer, dashboard-reviewer, perf-reviewer | beta-appcaire/.claude/agents/*.md (tracked via a .gitignore negation rule that re-includes .claude/agents/**) | Stage 5. Wired and invoked in parallel before stage 6 verify. Routing decided by the architect's spec metadata. |
| engineering/orchestrator agent prompt | be-agent-service/agents/prompts/engineering/orchestrator.md | Coordinates the compound nightly workflow. Not invoked by the PRD-to-PR runner; the runner.ts state machine plays that role. |
| engineering/backend-specialist, frontend-specialist, db-architect-specialist, infrastructure-specialist, ux-designer-specialist | be-agent-service/agents/prompts/engineering/*.md | Available to Cursor Composer / Claude Code in a worktree (the L0 path). Not invoked by the PRD-to-PR runner; that path uses the named editor.md agent. |
| engineering/senior-code-reviewer, verification-specialist | be-agent-service/agents/prompts/engineering/*.md | Same: L0 path. |
| management/interface-agent | be-agent-service/agents/prompts/management/interface-agent.md | Target-state stage 0 / stage 8 Telegram bridge. Already brokers Telegram traffic for ad-hoc requests; needs a routing rule to recognise PRD-style messages (see §(c).1 below). |
| management/cpo-cto | be-agent-service/agents/prompts/management/cpo-cto.md | Target-state cross-team router and quarterly model-routing ratifier (per model-and-vendor-agnosticism.md). |
| Darwin runtime | be-agent-service/apps/server/src/index.ts:42 (port 3010) | HTTP host for /api/prd-to-pr/*. Also serves /api/repos, /api/agents, /api/workspace, /api/jobs, /api/schedules, /api/metrics. |
| SQLite state | .compound-state/agent-service.db | Persists prd_to_pr_runs rows + per-stage stageHistory[] in summary_json. Throughput log per throughput-and-business-signals.md. |
| Launchd job slots | be-agent-service/scripts/manage-darwin-dashboard-launchd.sh | Hosts the unified server in production. Same job slot also runs schedule research and compound nightly. |
| Compound nightly workflow | be-agent-service/scripts/compound/{auto-compound,daily-compound-review,loop,analyze-report}.sh | Separate vertical (priorities-driven nightly review). Pre-existing; not replaced by the PRD-to-PR runner. |
| Worktree scripts | beta-appcaire/scripts/git/worktree-{add,remove}.sh | Stage 0 worktree creation. Mandatory for PRD-driven work per the standing memory rule. |
| GitHub Merge Queue | beta-appcaire/.github/workflows/pr-checks.yml (merge_group: triggers, dashboard-server tests (required for merge) aggregator) | Stage 8 gate. Live; the pipeline targets it via gh pr merge --auto --squash. |
| docs/dossiers/<feature>/ | beta-appcaire/docs/dossiers/* | Stage 6 storage. Now populated automatically by the dossier bundler. Pre-existing hand-crafted dossiers (e.g. visit-chain-scope) remain as historical reference. |
| wiki/plans/<feature>-YYYY-MM-DD.md PRD convention | beta-appcaire/wiki/plans/* | Stage 1 input. Convention is in active use; humans write PRDs here. |
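
The re-entry edges and the review-cycle cap from the runner row can be pictured as a small transition function. This is a minimal sketch, not the actual runner.ts: nextStage, StageOutcome, and Step are hypothetical names, the stage order is assumed to follow the stage-module list (intake = 0 through merge = 8), and it assumes MAX_REVIEW_CYCLES bounds the total number of re-entries into stage 4.

```typescript
// Hypothetical sketch of the runner's transition rule; not the real runner.ts.
// Stage order is assumed to match the stage-module list (index 0..8).
const STAGES = [
  "intake", "prd", "spec", "tests", "implement",
  "review", "verify", "reviewerLoop", "merge",
] as const;

const IMPLEMENT = 4; // target of both re-entry edges
const REENTRY_SOURCES: readonly number[] = [5, 6]; // edges 5→4 and 6→4
const MAX_REVIEW_CYCLES = 3;

interface StageOutcome {
  ok: boolean;
}

interface Step {
  next: number | "done" | "failed";
  cycles: number; // re-entries into implement used so far
}

function nextStage(current: number, outcome: StageOutcome, cycles: number): Step {
  if (outcome.ok) {
    const isLast = current === STAGES.length - 1;
    return { next: isLast ? "done" : current + 1, cycles };
  }
  // Assumption: a failed review or verify re-enters implement until the cap is hit.
  if (REENTRY_SOURCES.indexOf(current) !== -1 && cycles < MAX_REVIEW_CYCLES) {
    return { next: IMPLEMENT, cycles: cycles + 1 };
  }
  return { next: "failed", cycles };
}
```

Under these assumptions, nextStage(5, { ok: false }, 0) routes back to stage 4 with one cycle consumed, while the same failure with three cycles already used ends the run as failed.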
(b) Parked — separate verticals, not used by PRD-to-PR
These components are part of Darwin but solve different problems. They should not be merged into the PRD-to-PR pipeline; mixing verticals produces ambiguous orchestration. Listed here so the reader can rule them out without re-doing the audit.
| Component | Location | Why it's not in the PRD-to-PR pipeline |
|---|---|---|
| Marketing team: jarvis-orchestrator, shuri-product-analyst, fury-customer-researcher, vision-seo-analyst, loki-content-writer, quill-social-media, pepper-email-marketing, wanda-designer, friday-developer, wong-notion-agent | be-agent-service/agents/prompts/marketing/*.md | Marketing/content vertical with its own pipeline (loki-content-writer.sh etc.) and its own humanizer skill. Different intake, different output. |
| Optimization team: optimization-mathematician, timefold-specialist | be-agent-service/agents/prompts/optimization/*.md | Schedule research vertical. Reads benchmark history, proposes constraint-weight changes. Consumes the throughput log produced by the PRD-to-PR pipeline (per throughput-and-business-signals.md) but is not part of it. |
| management/ceo, management/hr-agent-lead | be-agent-service/agents/prompts/management/*.md | Strategic / org concerns. Useful for budget allocation per vision commitment (d), but not invoked per-PR. |
| Hannes Dashboard | be-agent-service/scripts/manage-hannes-dashboard-launchd.sh | Separate launchd job for a separate stakeholder view. Distinct concern from the engineering Dashboard at localhost:3010. |
| Schedule research loop | be-agent-service/apps/server/src/routes/schedule-runs/* | Optimization vertical's runtime. Reuses Darwin's launchd slots and SQLite, but is not feature delivery. |
(c) Outstanding — the prioritized build queue
Items 1–5 from the original queue (pipeline runner, dossier bundler, Codex polling, named agent prompts, L1 dashboard UI) shipped between commits 92bc142 and 0270e90 and have moved to §(a). What remains:
1. Telegram → POST /api/prd-to-pr routing — ~1 day
- File: be-agent-service/agents/prompts/management/interface-agent.md plus a small router under be-agent-service/apps/server/src/routes/telegram/.
- What it does: detect PRD-style Telegram messages (URL to a wiki/plans/*.md file or an attached PRD), POST to /api/prd-to-pr, post the dossier README.md + screenshot.png back to the originating Telegram thread, accept "approve" / "reject" replies. This is the L2 human interface.
- Why first: thin shell over the now-shipped L1 backend. The hardest part is the message-classification rule; the round-trip plumbing is one HTTP POST and one Telegram callback.
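
The message-classification rule might start as something like the sketch below. Everything here is illustrative: classifyMessage, IncomingMessage, and Classification are hypothetical names, the pattern follows the wiki/plans/<feature>-YYYY-MM-DD.md convention, and treating any attached .md file as a PRD is an assumption that would need tightening.

```typescript
// Hypothetical classifier sketch; names and message shape are illustrative.
interface IncomingMessage {
  text: string;
  attachmentNames?: string[]; // assumed: file names of any attached documents
}

type Classification =
  | { kind: "prd"; source: "plan-url" | "attachment"; ref: string }
  | { kind: "other" };

// Matches the wiki/plans/<feature>-YYYY-MM-DD.md convention anywhere in the text.
const PLAN_PATH = /wiki\/plans\/[A-Za-z0-9._-]+-\d{4}-\d{2}-\d{2}\.md/;

function classifyMessage(msg: IncomingMessage): Classification {
  const match = PLAN_PATH.exec(msg.text);
  if (match) {
    return { kind: "prd", source: "plan-url", ref: match[0] };
  }
  // Assumption: an attached PRD is recognisable as a Markdown file.
  const attachment = (msg.attachmentNames ?? []).find((name) =>
    name.toLowerCase().endsWith(".md"),
  );
  if (attachment !== undefined) {
    return { kind: "prd", source: "attachment", ref: attachment };
  }
  return { kind: "other" };
}
```

A message classified as "prd" would then be forwarded to POST /api/prd-to-pr; anything else stays on the existing ad-hoc path.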
2. Runner orphan resilience — ~half a day
- Files: every stage module that spawns claude -p: pipeline/stages/{spec,tests,implement,review,reviewerLoop}.ts.
- What it does: add detached: true, stdio: ['ignore', 'pipe', 'pipe'] to the spawn() options so spawned claude -p children survive a tsx watch reload of the unified server. Also persist child PIDs in the row and add a "resume from last completed stage" code path in runner.ts.
- Why second: today, editing any server file mid-run kills the in-flight pipeline because tsx watch sends SIGTERM to the parent and the signal propagates to the children. This caused 3 lost runs during the dogfood validation. Easy fix; high return.
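
A minimal sketch of the spawn change, using Node's child_process API. The helper names detachedSpawnOptions and spawnAgent are hypothetical, and recordPid is a stand-in for whatever writes the PID to the run row.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";

// Options from item 2: on POSIX systems a detached child leads its own
// process group, and stdin is not inherited from the server.
function detachedSpawnOptions(cwd: string): SpawnOptions {
  return { cwd, detached: true, stdio: ["ignore", "pipe", "pipe"] };
}

// Hypothetical helper: recordPid stands in for persisting the PID on the run row.
function spawnAgent(
  command: string,
  args: string[],
  cwd: string,
  recordPid: (pid: number) => void,
): ChildProcess {
  const child = spawn(command, args, detachedSpawnOptions(cwd));
  if (child.pid !== undefined) {
    recordPid(child.pid);
  }
  return child;
}
```

With the PID on the row, a resume path could check whether the child is still alive before deciding to restart a stage. One thing worth testing: with 'pipe', the child's stdout and stderr remain connected to the server process, so a child that keeps writing after the reader has gone away may still fail.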
3. Per-iteration live telemetry inside stage 4 — ~half a day
- File: be-agent-service/apps/server/src/pipeline/stages/implement.ts.
- What it does: introduce a ctx.persist() callback (or import updatePrdRunStage directly with a small refactor to break the circular import) so the editor's iteration count, regression hint, and stash state land in the row mid-stage instead of only at stage exit. Today the dashboard's "Stage running: 4m 12s" indicator updates live via stage_started_at, but iteration_count only updates after the whole implement stage returns.
- Why third: observability gap, not a correctness one. Run #12 stage 4 ran for 4m 50s with iter=0 displayed the entire time; the dashboard's "stuck?" badge can't fire correctly without this.
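
One possible shape for the ctx.persist() callback is sketched below. The interfaces, the stash-state values, and runImplementLoop are all hypothetical; the loop is synchronous only to keep the sketch short, whereas a real editor invocation would be asynchronous.

```typescript
// Hypothetical shapes; the real row columns and context object may differ.
interface IterationTelemetry {
  iterationCount: number;
  regressionHint: string | null;
  stashState: "none" | "stashed" | "restored";
}

interface StageContext {
  // Writes a partial update to the run row immediately, mid-stage.
  persist(patch: Partial<IterationTelemetry>): void;
}

interface IterationResult {
  passed: boolean;
  regressionHint: string | null;
}

// Synchronous for brevity; a real editor invocation would be asynchronous.
function runImplementLoop(
  ctx: StageContext,
  maxIterations: number,
  runIteration: (iteration: number) => IterationResult,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    // Persist before the iteration runs so the dashboard never shows iter=0.
    ctx.persist({ iterationCount: i, stashState: "stashed" });
    const result = runIteration(i);
    ctx.persist({
      regressionHint: result.regressionHint,
      stashState: result.passed ? "none" : "restored",
    });
    if (result.passed) {
      return i;
    }
  }
  return maxIterations;
}
```

Passing the callback in through the context keeps implement.ts free of a direct import of the row-update function, which is one way to sidestep the circular import mentioned above.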
4. Token / cost accounting
- What it does: parse --output-format=json from claude -p invocations, sum input_tokens + output_tokens per stage, persist into tokens_total + per_role_breakdown on the row.
- Why deferred: Max Plan OAuth (the current default) is flat-rate billing, so per-call cost is meaningless and the dashboard correctly shows "—". Implementing token accounting is mostly useful when API-key billing is reactivated for a specific run or model. Add a CLI flag --meter that opts a single run into JSON-output parsing rather than instrumenting unconditionally.
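
The per-stage summation could look roughly like this. It is a sketch under a stated assumption: that each JSON result from claude -p carries a usage object with numeric input_tokens and output_tokens fields, which should be checked against real CLI output. Invocation, tokensFrom, and summarizeStage are hypothetical names.

```typescript
// Assumption: each claude -p JSON result carries usage.input_tokens and
// usage.output_tokens; verify against real CLI output before relying on it.
interface Invocation {
  role: string; // e.g. "editor"
  stdout: string; // raw JSON text captured from the subprocess
}

interface TokenSummary {
  tokens_total: number;
  per_role_breakdown: Record<string, number>;
}

function tokensFrom(stdout: string): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return 0; // non-JSON output contributes nothing
  }
  const usage = (parsed as { usage?: Record<string, unknown> } | null)?.usage;
  const input = usage?.["input_tokens"];
  const output = usage?.["output_tokens"];
  return (typeof input === "number" ? input : 0) +
    (typeof output === "number" ? output : 0);
}

// Sums tokens across all invocations belonging to one stage.
function summarizeStage(invocations: Invocation[]): TokenSummary {
  const summary: TokenSummary = { tokens_total: 0, per_role_breakdown: {} };
  for (const inv of invocations) {
    const tokens = tokensFrom(inv.stdout);
    summary.tokens_total += tokens;
    summary.per_role_breakdown[inv.role] =
      (summary.per_role_breakdown[inv.role] ?? 0) + tokens;
  }
  return summary;
}
```

Treating unparseable output as zero tokens keeps a metered run from failing on a malformed response, at the cost of undercounting silently; logging those cases would be a reasonable addition.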
5. (Defer) Model adapter + scale-or-kill / GrowthBook ramp
- The model adapter from model-and-vendor-agnosticism.md is a non-trivial migration of every existing Darwin agent. Don't block items 1–4 on it; introduce the adapter in parallel and migrate the cheapest / highest-volume role first (probably cheap via Ollama).
- The scale-or-kill agent from scale-or-kill.md is gated behind throughput data. The pipeline now produces real summary_json.stageHistory data; once a few real PRDs (not just banner-hello) have shipped, the agent has something to read.
How to use this map
- Building the next missing piece? Pick the lowest-numbered item in §(c) that isn't yet in flight, scope it, open a PRD under wiki/plans/, and run it through the L0 path.
- Adding a new agent to Darwin? Decide whether it's part of the engineering vertical (§a), a separate vertical (§b), or a missing piece of the PRD-to-PR pipeline (§c). Update this map in the same PR. The map is the audit; if the audit drifts, the wiki lies.
- Reading this for the first time? Skim §(a) to see what's real, skim §(c) to see what's next. §(b) exists so you can stop wondering whether loki-content-writer is somehow part of the pipeline (it isn't).
Cross-references
- README: orientation for the agentic-workflow section, including the L0 / L1 / L2 human interface tiers.
- PRD-to-PR pipeline: the eight-stage pipeline this map provisions for.
- Darwin as orchestrator: orchestrator-runtime decision and end-to-end flow.
- Agent roles and model routing: five named roles and their closest analogues today.
- Verification and evidence: dossier format the bundler in §(a) produces.
- Reviewer feedback loop: Codex polling specifics for the loop in §(a).
- Compound workflow: the precursor pipeline already running nightly.