The vision says humans define what; agents execute how. Darwin is the implementation: a single addressable entity (Telegram today, eventually a web UI) that takes a PRD and returns a merged PR with evidence. The human's only required action is reading the screenshot and replying approve or reject.
This is the path to replacing the human CTO with the agent-CTO described in vision-and-mandate.md.
End-to-end flow
1. Human posts PRD link to Telegram
↓
2. interface-agent receives, validates frontmatter, posts ack
↓
3. interface-agent → orchestrator (engineering team)
↓
4. orchestrator runs the 7-stage pipeline (PRD → PR with dossier)
↓
5. After merge, Darwin posts the dossier screenshot back to Telegram
↓
6. Human looks at the screenshot, replies "approve" or "reject"
↓
7a. approve → done. Scale-or-kill agent takes over for ramp.
7b. reject → re-open spec for clarification, return to stage 2.
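The human-facing flow above is a small state machine: six linear steps, then a branch on the human's reply, where a rejection loops back to step 2. The sketch below models only that shape. The step names and the `nextStep` function are hypothetical illustrations, not identifiers from `be-agent-service`.

```typescript
// Hypothetical model of the seven-step flow above. Step names are
// illustrative only; they are not identifiers from be-agent-service.
type FlowStep =
  | "1_prd_posted"
  | "2_validated_and_acked"
  | "3_handed_to_orchestrator"
  | "4_pipeline_running"
  | "5_dossier_posted"
  | "6_awaiting_verdict"
  | "7a_done";

type Verdict = "approve" | "reject";

// Steps 1-6 advance linearly; step 6 branches on the human's reply.
const LINEAR_STEPS: FlowStep[] = [
  "1_prd_posted",
  "2_validated_and_acked",
  "3_handed_to_orchestrator",
  "4_pipeline_running",
  "5_dossier_posted",
  "6_awaiting_verdict",
];

function nextStep(current: FlowStep, verdict?: Verdict): FlowStep {
  // 7a is terminal: the scale-or-kill agent owns the ramp from here.
  if (current === "7a_done") return current;
  if (current === "6_awaiting_verdict") {
    // No reply yet: keep waiting on the human.
    if (verdict === undefined) return current;
    // 7a: approve finishes. 7b: reject re-opens the spec and returns to step 2.
    return verdict === "approve" ? "7a_done" : "2_validated_and_acked";
  }
  return LINEAR_STEPS[LINEAR_STEPS.indexOf(current) + 1];
}

// Walk a PRD up to the verdict, then simulate a rejection.
let step: FlowStep = "1_prd_posted";
while (step !== "6_awaiting_verdict") step = nextStep(step);
console.log(nextStep(step, "reject")); // → 2_validated_and_acked
```

Keeping the verdict as an explicit input, rather than something the machine infers, matches the rule that the human's reply is the only required action.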
What Darwin is, today
Per darwin-dashboard.md:
- Unified API + UI at `localhost:3010` (`be-agent-service/apps/server/src/index.ts:42`).
- Endpoints: `/api/repos/<repo>/{status,priorities,logs}`.
- Backed by the `compound-state/agent-service.db` sqlite database.
- Hosts the schedule research loop, the marketing team, and the engineering compound runs.
What's missing for Darwin-as-orchestrator:
- A `/api/prd-to-pr/<feature>` endpoint that accepts a PRD path and starts the pipeline.
- Telegram → orchestrator routing for PRD-style messages (today, messages route to general-purpose subagents, not the pipeline).
- The pipeline itself (it is documented across this section but is not yet a single executable pass).
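The first two missing pieces are both entry-point checks: deciding whether a Telegram message is PRD-style, and validating a request to the new endpoint. A minimal sketch, assuming a message is PRD-style when it contains a link to a Markdown file and that the endpoint takes a JSON body with a `prdPath` field. The regex, the `prdPath` field name, the feature-slug rule, and the `.md` check are all hypothetical, not existing behavior.

```typescript
// Hypothetical sketch of the two missing entry points. The PRD-link regex,
// the `prdPath` body field, the feature-slug rule, and the ".md" check are
// assumptions, not existing be-agent-service behavior.

type Route = { kind: "pipeline"; prdUrl: string } | { kind: "general" };

// A Telegram message counts as PRD-style if it links to a Markdown file.
const PRD_LINK = /https?:\/\/\S+\.md\b/;

function routeMessage(text: string): Route {
  const match = text.match(PRD_LINK);
  return match ? { kind: "pipeline", prdUrl: match[0] } : { kind: "general" };
}

type StartRequest = { feature: string; prdPath: string };
type Parsed = { ok: true; request: StartRequest } | { ok: false; error: string };

// Validate POST /api/prd-to-pr/<feature> with a JSON body like { "prdPath": "..." }.
function parseStartRequest(urlPath: string, body: unknown): Parsed {
  const m = urlPath.match(/^\/api\/prd-to-pr\/([A-Za-z0-9_-]+)$/);
  if (!m) return { ok: false, error: "unknown route" };
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const prdPath = (body as { prdPath?: unknown }).prdPath;
  if (typeof prdPath !== "string" || !prdPath.endsWith(".md")) {
    return { ok: false, error: "prdPath must be a .md path" };
  }
  return { ok: true, request: { feature: m[1], prdPath } };
}

console.log(routeMessage("please ship https://wiki.example/prd/login.md").kind); // → pipeline
console.log(routeMessage("how is the build going?").kind); // → general
```

Anything that fails to match falls through to the existing general-purpose subagent path, so the new routing can land without changing how non-PRD messages behave.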
Decision: orchestrator runtime is a thin in-house Node loop
Recorded 2026-04-30 to stop the next builder from relitigating this. The pipeline runner lives inside be-agent-service as a thin (~200 LOC) Node state machine, not as a LangGraph (Python) sidecar or any external workflow engine.
Reasoning:
- Matches the existing `be-agent-service/apps/server/src/routes/*` style: no new language, no new process boundary.
- Persists state in `.compound-state/agent-service.db`, which already exists and is already read by the throughput logger.
- Reuses the launchd job slots that already host the schedule research loop and the compound nightly workflow.
- Each pipeline stage is a function that calls the model adapter (see model-and-vendor-agnosticism.md) or shells out (`gh`, `playwright`, `yarn`). No DAG library is needed for eight linear stages with one re-entry edge (stage 7 → stage 4).
- Revisit only if the state machine outgrows ~500 LOC or the re-entry topology becomes non-trivial.
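To make "eight linear stages with one re-entry edge" concrete, here is a minimal sketch of such a runner. The stage bodies, the `MemoryStore` class (an in-memory stand-in for the sqlite table), and the `MAX_REENTRIES` cap are hypothetical; the real stages, what they emit, and the persistence schema are defined elsewhere in this section and in the codebase.

```typescript
// Minimal sketch of the runner: eight linear stages plus one re-entry edge
// (stage 7 -> stage 4). Stage bodies, the store, and MAX_REENTRIES are
// hypothetical; the real store would be .compound-state/agent-service.db.

type StageResult = { next: "advance" } | { next: "reenter"; reason: string };
type Stage = (feature: string) => Promise<StageResult>;

interface RunState {
  feature: string;
  stage: number; // 1-based index of the stage to run next
  reentries: number;
  done: boolean;
}

// In-memory stand-in for the sqlite table so the sketch runs without deps.
class MemoryStore {
  private rows = new Map<string, RunState>();
  save(state: RunState): void {
    this.rows.set(state.feature, { ...state });
  }
  load(feature: string): RunState | undefined {
    const row = this.rows.get(feature);
    return row && { ...row };
  }
}

const STAGE_COUNT = 8;
const REENTRY_FROM = 7;
const REENTRY_TO = 4;
const MAX_REENTRIES = 3; // assumption: fail loudly instead of looping forever

async function runPipeline(feature: string, stages: Stage[], store: MemoryStore): Promise<RunState> {
  if (stages.length !== STAGE_COUNT) throw new Error(`expected ${STAGE_COUNT} stages`);
  // Resume from the persisted row if an earlier process died mid-run.
  const state = store.load(feature) ?? { feature, stage: 1, reentries: 0, done: false };
  while (!state.done) {
    const result = await stages[state.stage - 1](feature);
    if (result.next === "reenter") {
      // Only the single documented edge (7 -> 4) is allowed.
      if (state.stage !== REENTRY_FROM) throw new Error(`stage ${state.stage} cannot re-enter`);
      if (state.reentries >= MAX_REENTRIES) throw new Error(`re-entry cap hit: ${result.reason}`);
      state.reentries += 1;
      state.stage = REENTRY_TO;
    } else if (state.stage === STAGE_COUNT) {
      state.done = true;
    } else {
      state.stage += 1;
    }
    store.save(state); // persist every transition so a crash can resume
  }
  return state;
}

// Usage: stage 7 sends the work back once (e.g. reviewer feedback), then passes.
const ran: number[] = [];
let sentBack = false;
const demoStages: Stage[] = Array.from({ length: STAGE_COUNT }, (_, i): Stage => async () => {
  ran.push(i + 1);
  if (i + 1 === REENTRY_FROM && !sentBack) {
    sentBack = true;
    return { next: "reenter", reason: "reviewer feedback" };
  }
  return { next: "advance" };
});
runPipeline("demo", demoStages, new MemoryStore()).then((s) => console.log(ran.join(","), s.reentries));
// → 1,2,3,4,5,6,7,4,5,6,7,8 1
```

Rejecting re-entry from any stage other than 7 keeps the topology exactly as documented; if that check starts needing exceptions, that is the "re-entry topology becomes non-trivial" signal to revisit the decision.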
Why a single orchestrator
If we build N agents that all do "PRD-to-PR" with slight variations, we get N versions of the spec format, N reviewer-feedback loops, N dossier formats. The vision rule "humans define what" implies a single, well-defined interface — many orchestrators violate that.
Darwin is the chosen orchestrator because:
- It already owns the agent-service runtime.
- It already has the launchd job slots for long-lived loops.
- It already brokers the Telegram interface via `interface-agent`.
- It already has the sqlite state DB the throughput logger writes to.
What Darwin needs from this section
The pages in this section define the contracts Darwin's orchestrator must implement:
| Concern | Defined in |
|---|---|
| What stages exist, what they emit | prd-to-pr-pipeline.md |
| Which agent runs which stage | agent-roles-and-model-routing.md |
| How models are chosen | model-and-vendor-agnosticism.md |
| What "spec" means | spec-as-contract.md |
| What "evidence" means | verification-and-evidence.md |
| When to re-enter the editor | reviewer-feedback-loop.md |
| How to ramp / roll-back after merge | scale-or-kill.md |
| What budget constraints apply | throughput-and-business-signals.md |
Replacing the human CTO
Vision quote:
> The long-term bottleneck of Caire should be compute, not headcount.
The CTO replacement happens when:
- PRDs go straight to Darwin without going through a human engineering manager.
- Routing decisions across the four agent teams (engineering, optimization, marketing, management) come from `cpo-cto` (the agent), not the human.
- Scale-or-kill decisions happen daily without a human deciding "let's ramp this" or "let's kill that".
- Budget allocation happens at management-agent cadence (CEO + CPO/CTO) with the human only ratifying at quarterly checkpoints.
Each item is a concrete, testable transition. We're not at any of them yet. The order in which they should land:
1. PRD → Darwin → PR (this section's main deliverable). Easiest. Captures the highest-volume work.
2. Scale-or-kill automation. Next. Removes the daily ramp/roll-back overhead.
3. Budget allocation by management agents. Third. Requires finance integration + trust calibration.
4. Cross-team routing by `cpo-cto`. Last. Requires (1)–(3) to be solid; otherwise the cross-team router is making decisions with bad inputs.
Where the human still shows up
After this all lands:
- Writing PRDs.
- Reading the dossier screenshot before merge.
- Quarterly ratification of `cpo-cto`'s routing decisions.
- Strategic direction at the CEO-agent input level.
That's it. Everything else is software. That's the vision.
Cross-references
- Vision and mandate.
- Darwin Dashboard: what Darwin is today.
- Management agents: `interface-agent`, `cpo-cto`, `ceo`.
- Engineering agents: the team Darwin orchestrates.
- Compound workflow: current nightly approximation of the pipeline.
- Wiki as agent substrate (roadmap): concrete delivery slices.