Scale or kill | Agentic Workflow

Manual ramp-up and roll-back are repeating tasks. By the vision rule, "if humans repeat a task, the system is wrong." This page describes the mechanism that takes the human out of the scale / kill loop.

The two automations

Scale (auto-promote)

When a feature's post-merge metrics beat its PRD baseline for N consecutive days, the feature gate ramps automatically:

1% → 10% → 25% → 50% → 100% over 5 days, gated by daily metric check

If at any step the metric regresses, the ramp pauses and reverses (see "kill" below).

Kill (auto-rollback)

When a feature's post-merge metrics regress past a threshold for N consecutive days, the feature flag is disabled automatically:

flag.enabled = false via the gate provider's API.
Open a follow-up plan in wiki/plans/ describing the regression, with implementation_status: planned and the offending feature flag name in the title.
Notify the human channel via Telegram.

The code stays in main — only the flag flips. Re-enabling is a separate decision that re-enters the PRD pipeline.

Required for this to work

Piece	Status
Feature flag service	Required — GrowthBook or equivalent. Not yet wired up in this repo.
Per-feature metric definitions	Required — each PRD must declare a primary metric and a regression threshold.
Notification path	Reuse `interface-agent` for Telegram on every ramp / roll-back decision.

The PRD template needs two new fields that don't exist today:

primary_metric: "promotion_rate_pct" # what to watch
regression_threshold: -2.0 # rollback if metric drops by ≥2 points
ramp_window_days: 5 # full-rollout cadence

These need to land in SCHEMA.md as part of implementing this page; see the roadmap entry in wiki-as-agent-substrate-2026-04-29.md.

What the agent reads

inputs:
  - flag-state from gate provider (per-feature, current ramp percentage)
  - metrics warehouse: 7-day window for each feature's primary_metric
  - PRD frontmatter: primary_metric, regression_threshold, ramp_window_days
  - cash budget: from finance signal (see business-signals page)

outputs (per feature, per day):
  - decision: ramp_up | hold | hold_with_warning | roll_back
  - reason: machine-readable + human paragraph
  - notification: posted to Telegram if decision != hold

Why this isn't a feature flag service

A feature flag service exposes a flag and lets a human ramp it. This page describes the agent that decides what to do with the service. The two are complementary. We need:

Provider (e.g. GrowthBook) — for the flag plumbing.
Decision agent (this page) — for the autonomy.

Without the agent, every "should we ramp?" decision falls back on a human; we've automated only the mechanics, not the work.

What "kill" doesn't mean

Kill flips the flag, not the code. The diff stays in main. The reasoning:

Reverting code on roll-back creates a deployment storm (every dependent piece needs to also revert).
Re-enabling later is one flag flip, not a re-merge.
Audit trail is cleaner: PR landed, flag enabled, metric regressed, flag disabled. Each event is its own commit / log row.

If a feature consistently fails and the team decides to remove it, that's a separate cleanup PR — the agent flags this scenario but doesn't auto-merge a code revert.

Failure modes

Metric pipeline outage → agent assumes "no signal" and holds. Never auto-rolls-back on missing data; missing data isn't regression.
Cash budget exceeded → ramp is paused regardless of metrics. Existing rolled-out features don't roll back, but new ramps freeze.
Two features compete for the same metric → the agent flags the conflict and asks the human. Don't auto-arbitrate.

Cross-references

Vision and mandate — commitment (c).
Throughput and business signals — the cash-budget input.
PRD-to-PR pipeline — where primary_metric / regression_threshold get added to the PRD frontmatter.
Wiki as agent substrate (roadmap) — concrete steps to implement this.
schedule skill (Claude first-party) — the cron primitive the daily agent runs on.