Manual ramp-up and roll-back are repeating tasks. By the vision rule, "if humans repeat a task, the system is wrong." This page describes the mechanism that takes the human out of the scale / kill loop.
The two automations
Scale (auto-promote)
When a feature's post-merge metrics beat its PRD baseline for N consecutive days, the feature gate ramps automatically:
1% → 10% → 25% → 50% → 100% over 5 days, gated by daily metric check
If at any step the metric regresses, the ramp pauses and reverses (see "kill" below).
Kill (auto-rollback)
When a feature's post-merge metrics regress past a threshold for N consecutive days, the feature flag is disabled automatically:
flag.enabled = falsevia the gate provider's API.- Open a follow-up plan in
wiki/plans/describing the regression, withimplementation_status: plannedand the offending feature flag name in the title. - Notify the human channel via Telegram.
The code stays in main — only the flag flips. Re-enabling is a separate decision that re-enters the PRD pipeline.
Required for this to work
| Piece | Status |
|---|---|
| Feature flag service | Required — GrowthBook or equivalent. Not yet wired up in this repo. |
| Per-feature metric definitions | Required — each PRD must declare a primary metric and a regression threshold. |
| Notification path | Reuse interface-agent for Telegram on every ramp / roll-back decision. |
The PRD template needs two new fields that don't exist today:
primary_metric: "promotion_rate_pct" # what to watch
regression_threshold: -2.0 # rollback if metric drops by ≥2 points
ramp_window_days: 5 # full-rollout cadence
These need to land in SCHEMA.md as part of implementing this page; see the roadmap entry in wiki-as-agent-substrate-2026-04-29.md.
What the agent reads
inputs:
- flag-state from gate provider (per-feature, current ramp percentage)
- metrics warehouse: 7-day window for each feature's primary_metric
- PRD frontmatter: primary_metric, regression_threshold, ramp_window_days
- cash budget: from finance signal (see business-signals page)
outputs (per feature, per day):
- decision: ramp_up | hold | hold_with_warning | roll_back
- reason: machine-readable + human paragraph
- notification: posted to Telegram if decision != hold
Why this isn't a feature flag service
A feature flag service exposes a flag and lets a human ramp it. This page describes the agent that decides what to do with the service. The two are complementary. We need:
- Provider (e.g. GrowthBook) — for the flag plumbing.
- Decision agent (this page) — for the autonomy.
Without the agent, every "should we ramp?" decision falls back on a human; we've automated only the mechanics, not the work.
What "kill" doesn't mean
Kill flips the flag, not the code. The diff stays in main. The reasoning:
- Reverting code on roll-back creates a deployment storm (every dependent piece needs to also revert).
- Re-enabling later is one flag flip, not a re-merge.
- Audit trail is cleaner: PR landed, flag enabled, metric regressed, flag disabled. Each event is its own commit / log row.
If a feature consistently fails and the team decides to remove it, that's a separate cleanup PR — the agent flags this scenario but doesn't auto-merge a code revert.
Failure modes
- Metric pipeline outage → agent assumes "no signal" and holds. Never auto-rolls-back on missing data; missing data isn't regression.
- Cash budget exceeded → ramp is paused regardless of metrics. Existing rolled-out features don't roll back, but new ramps freeze.
- Two features compete for the same metric → the agent flags the conflict and asks the human. Don't auto-arbitrate.
Cross-references
- Vision and mandate — commitment (c).
- Throughput and business signals — the cash-budget input.
- PRD-to-PR pipeline — where
primary_metric/regression_thresholdget added to the PRD frontmatter. - Wiki as agent substrate (roadmap) — concrete steps to implement this.
scheduleskill (Claude first-party) — the cron primitive the daily agent runs on.