Spec as contract | Agentic Workflow

Plain language is too imprecise to coordinate multiple agents. PRDs are too long to keep all of in working memory at every stage. The spec is the in-between artefact: precise enough to compile to tests, short enough to fit in every agent's context.

What "spec" means here

A wiki/specs/<feature>.md file with two sections:

Acceptance criteria in Gherkin-style scenarios.
Coordination metadata — files expected to be touched, model role expected at each stage, any explicit non-goals.

Example skeleton:

# Spec: Lab promotion banner

## Acceptance criteria

Feature: Best-trial banner on lab leaderboard
Scenario: Pinned trial is shown above the leaderboard
Given a service area with at least one completed lab trial
And one trial is pinned in ServiceAreaBestLabScenario
When the user opens /admin/optimization-lab/<id>
Then the BestTrialBanner is visible
And the banner shows the pinned trial's metrics

Scenario: Promote button is disabled for in-progress trials
Given a trial whose solution.status is solving_active
Then the Promote button is disabled
And hovering the button shows the tooltip "Trial still solving"

## Files expected to be touched

- apps/dashboard/src/pages/admin/optimization-lab/BestTrialBanner.tsx (new)
- apps/dashboard/src/pages/admin/optimization-lab/LeaderboardSection.tsx (modify)
- apps/dashboard/src/pages/admin/optimization-lab/**tests**/BestTrialBanner.test.tsx (new)
- e2e/lab-promotion-banner.spec.ts (new)

## Non-goals

- No backend changes (the resolver already returns the SA-best trial id).
- No copy changes outside the banner itself.
- No analytics events for banner views.

## Stage routing (overrides)

- editor: default
- verifier: default + perf-reviewer NOT required (no list resolver / loop touched)

Why Gherkin

Three reasons:

Tests are mechanical from it. The Test-writer agent compiles each scenario to one or more failing test cases. The mapping is direct enough that a regression in the test-writer is easy to spot.
Reviewers can grep it. "Find every scenario that mentions BestTrialBanner" is one grep away. Plain prose specs require reading.
Drift is auditable. When the PRD says one thing and the spec says another, the diff between PRD and spec is the single discussion to have. Don't bury the disagreement under prose.

The single-source-of-truth rule

Once the spec is written, the PRD is no longer the source of truth for what to build. The spec is.

Reasoning: agents downstream see the spec, not the PRD. If the PRD changes mid-implementation and the spec doesn't, the editor and verifier will keep building toward the spec — and that's correct, because the spec is what they verified against.

If the PRD changes, regenerate the spec. Treat it as a new feature slice, on a fresh or explicitly resumed worktree, with a fresh test pass. Don't silently amend mid-flight.

Failure mode: spec drift

The most common failure: the architect produces a spec that mentions a behaviour the PRD didn't specify, and the editor cheerfully implements it. Mitigations:

The architect must list ## Non-goals explicitly. If "X is out of scope" is written down, the editor and reviewer agents have a hook to push back when X starts creeping in.
The verifier diffs the spec against the PRD frontmatter sources: and surfaces any acceptance criterion that doesn't trace back. P2 finding.
Humans look at one specific commit on review: the "spec" commit. That commit is the new contract; everything after it is implementation noise.

Intake from Cursor plans

Cursor plans under .cursor/plans/ are draft planning artifacts, not executable contracts. Before implementation begins, the orchestrator must:

Convert the accepted plan into wiki/plans/<feature>-YYYY-MM-DD.md.
Preserve the Cursor plan path in the PRD sources: frontmatter.
Generate wiki/specs/<feature>.md from the PRD.
Use the spec, not the Cursor plan, as the source of truth for tests and implementation.

Why not skip the spec and just write tests?

Tested, doesn't work. Pure test-first runs into:

Tests that pass for the wrong reason (over-mocked, tautological).
Tests that don't capture acceptance because the test author didn't fully understand the PRD.
Acceptance scenarios buried inside test files where reviewers don't read them.

Putting the acceptance criteria in a Gherkin block gives the test-writer a checklist to compile from, and gives reviewers something to grep against without diving into test internals.

Cross-references

PRD-to-PR pipeline — where the spec sits in the pipeline (stage 2).
Agent roles and model routing — Architect agent owns spec authorship; Test-writer compiles from it.
Verification and evidence — the dossier proves the spec's scenarios actually pass in the deployed UI.
Vision and mandate — commitment (a): humans define what; agents execute how. The spec is the boundary.