If we lock to one vendor, we lose the right to rotate the moment a better/cheaper option ships. Frontier-model leadership has flipped multiple times in the last 18 months; betting on stasis is the wrong shape of bet.
The rule
No agent prompt, skill, rule, or hook may hard-code a single provider.
Every model call goes through an adapter that takes a role (architect / test-writer / editor / verifier / reviewer) and dispatches to whichever model the routing config currently maps to. Swapping a provider is then a config edit, not a code edit.
Routing matrix (current default — refresh quarterly)
| Role | Current default | Why this slot |
|---|---|---|
| Architect | Anthropic Opus 4.x | Strongest spec/plan quality at acceptable latency for our problem size. |
| Test-writer | Anthropic Sonnet 4.x | Mechanical from a good spec; reliable structure on test files. |
| Editor | Anthropic Sonnet 4.x | Cost-driver; cheap-and-good is more important than best-and-expensive. |
| Verifier | Anthropic Sonnet 4.x | Needs reliability over depth; cheap call dominated by tool-execution time. |
| Reviewer loop | Anthropic Sonnet 4.x (or local) | Stateless per pass; could route to Ollama for first triage to save tokens. |
| Cheap routing | Ollama (phi or similar) | Triage / classify / convert. Free, local, fast. |
Refresh cadence: quarterly review. The CMO/CSO and CPO/CTO management agents own the next refresh. Inputs to the review:
- Public benchmarks: SWE-bench Verified, polyglot, agentic suites — current pass@1 numbers per model.
- Internal cost-per-merged-PR per role over the prior quarter (logged in `.compound-state/agent-service.db`).
- Reliability data: tool-call failure rate, timeout rate, refusal rate per model.
- New entrants: anything that wasn't in the previous matrix (new OSS models, new providers).
Adapter shape
```typescript
// Conceptual: the actual implementation lives in be-agent-service.

// Placeholders so the sketch type-checks; the real tool shapes are defined by the implementation.
type ToolSpec = unknown;
type ToolCall = unknown;

type ModelRole =
  | "architect"
  | "test-writer"
  | "editor"
  | "verifier"
  | "reviewer"
  | "cheap";

interface ModelCall {
  role: ModelRole;
  systemPrompt: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
  tools?: ToolSpec[];
  budgetTokens?: number;
  budgetUsd?: number;
}

interface ModelResponse {
  text: string;
  toolCalls?: ToolCall[];
  usage: {
    promptTokens: number;
    completionTokens: number;
    usd: number;
    latencyMs: number;
  };
  modelUsed: string; // for logging: the actual model that handled the call
}

// Signature only; `declare` keeps the body-less declaration valid TypeScript.
declare function callModel(call: ModelCall): Promise<ModelResponse>;
```
The routing config is a single JSON/YAML map from role → providerSlug. Provider implementations sit behind the same interface (Anthropic SDK, OpenAI SDK, Google AI SDK, Ollama HTTP, vLLM, etc.).
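As a minimal sketch of how the role-to-provider map and the shared provider interface could fit together (this is not the be-agent-service implementation): `Route`, `Provider`, `stubProvider`, `resolveRoute`, and `callByRole` are hypothetical names, the provider is a stub that makes no network calls, and the model strings are placeholders rather than real model IDs.

```typescript
// Hypothetical sketch; only the role names come from the adapter shape above.
type ModelRole = "architect" | "test-writer" | "editor" | "verifier" | "reviewer" | "cheap";

// One routing entry per role. Slugs and model names are placeholders.
interface Route {
  providerSlug: string;
  model: string;
}

const routing: Record<ModelRole, Route> = {
  architect: { providerSlug: "anthropic", model: "opus-placeholder" },
  "test-writer": { providerSlug: "anthropic", model: "sonnet-placeholder" },
  editor: { providerSlug: "anthropic", model: "sonnet-placeholder" },
  verifier: { providerSlug: "anthropic", model: "sonnet-placeholder" },
  reviewer: { providerSlug: "anthropic", model: "sonnet-placeholder" },
  cheap: { providerSlug: "ollama", model: "phi-placeholder" },
};

// Every provider implementation sits behind the same minimal interface.
interface Provider {
  complete(model: string, systemPrompt: string, userText: string): Promise<string>;
}

// Stub provider so the sketch runs offline; a real one would wrap a vendor SDK.
function stubProvider(slug: string): Provider {
  return {
    complete: async (model, _systemPrompt, userText) => `[${slug}/${model}] ${userText}`,
  };
}

const providers: Record<string, Provider> = {
  anthropic: stubProvider("anthropic"),
  ollama: stubProvider("ollama"),
};

// Pure lookup from role to route; callers never name a vendor.
function resolveRoute(role: ModelRole, config: Record<ModelRole, Route> = routing): Route {
  return config[role];
}

// Dispatch a call by role and report which model actually handled it.
async function callByRole(
  role: ModelRole,
  systemPrompt: string,
  userText: string,
): Promise<{ text: string; modelUsed: string }> {
  const route = resolveRoute(role);
  const provider = providers[route.providerSlug];
  if (!provider) {
    throw new Error(`No provider registered for "${route.providerSlug}"`);
  }
  const text = await provider.complete(route.model, systemPrompt, userText);
  return { text, modelUsed: `${route.providerSlug}/${route.model}` };
}
```

Under this shape, moving the editor role to another provider means changing one entry in `routing`; callers that ask for the editor role are untouched.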
What this rules out
- Skills / prompts that say "Use Claude Sonnet 4.6 for X." → Replace with "Use the editor role."
- Hooks that hard-code an Anthropic API endpoint. → Route through the adapter.
- Cost dashboards keyed off `claude-*` model names only. → Aggregate by `role`, then break down by `modelUsed`.
- Tool definitions that depend on Anthropic-specific tool-use features. → Use the lowest-common-denominator tool shape; opt in to vendor extensions only inside the adapter, never leaking out.
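The role-first cost aggregation could look roughly like the sketch below. `UsageRow`, `CostBreakdown`, and `aggregateCost` are hypothetical names, and the row shape is an assumption: it takes `usd` and `modelUsed` as logged per call, plus the role that was requested.

```typescript
// Hypothetical log row: one entry per model call.
interface UsageRow {
  role: string;
  modelUsed: string;
  usd: number;
}

// Per role: the total spend, then the same spend split by the model that served it.
type CostBreakdown = Map<string, { totalUsd: number; byModel: Map<string, number> }>;

function aggregateCost(rows: UsageRow[]): CostBreakdown {
  const out: CostBreakdown = new Map();
  for (const row of rows) {
    let entry = out.get(row.role);
    if (!entry) {
      entry = { totalUsd: 0, byModel: new Map<string, number>() };
      out.set(row.role, entry);
    }
    entry.totalUsd += row.usd;
    entry.byModel.set(row.modelUsed, (entry.byModel.get(row.modelUsed) ?? 0) + row.usd);
  }
  return out;
}
```

Because the top-level key is the role, the dashboard keeps working when a role is rotated to a model whose name does not start with `claude-`.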
What this allows
- A/B testing: route 10% of editor calls to a different model for a week, compare pass-rate + cost.
- Auto-fallback: if the primary provider 5xxs, the adapter retries against the secondary route.
- Cost ceilings: per-call `budgetUsd`; the adapter rejects the call before sending if the running total for this PR would exceed the cap (ties into scale-or-kill.md and throughput-and-business-signals.md).
- Vendor-specific optimisations behind feature flags inside the adapter (prompt caching, context caching, batch APIs), invisible to callers.
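Three of the capabilities above (A/B splits, auto-fallback, and cost ceilings) can be sketched as small adapter-internal helpers. Everything here is an assumption made for illustration: the names `pickArm`, `isServerError`, `withFallback`, and `checkBudget`, the convention that provider errors carry a numeric `status` field, and the existence of a pre-flight cost estimate.

```typescript
// Hypothetical helpers; none of these names come from be-agent-service.

// A/B testing: hash the call ID so the same call always lands in the same arm.
function pickArm(callId: string, experimentPercent: number): "control" | "experiment" {
  let hash = 0;
  for (let i = 0; i < callId.length; i++) {
    hash = (hash * 31 + callId.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % 100 < experimentPercent ? "experiment" : "control";
}

// Auto-fallback: treat any error carrying a 5xx `status` field as retryable.
function isServerError(err: unknown): boolean {
  const status = (err as { status?: unknown } | null | undefined)?.status;
  return typeof status === "number" && status >= 500 && status < 600;
}

async function withFallback<T>(
  primary: () => Promise<T>,
  secondary: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    if (!isServerError(err)) throw err; // anything other than a 5xx is not masked
    return secondary();
  }
}

// Cost ceiling: reject before sending if this call would push the PR over its cap.
// `estimatedUsd` is an assumed pre-flight estimate; computing it is out of scope here.
function checkBudget(spentOnPrUsd: number, estimatedUsd: number, budgetUsd?: number): void {
  if (budgetUsd !== undefined && spentOnPrUsd + estimatedUsd > budgetUsd) {
    throw new Error(`Budget exceeded: ${spentOnPrUsd} + ${estimatedUsd} > ${budgetUsd} USD`);
  }
}
```

For the 10% editor experiment, the adapter would call something like `pickArm(callId, 10)` and send the "experiment" arm to the alternate route. Deterministic bucketing is one option among several; it is chosen here so that retries of the same call stay in the same arm.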
Migration path from today
Today many agents call the `claude` CLI directly (`be-agent-service/agents/<name>.sh`). Migration is incremental:
- Introduce the adapter as a new module.
- Migrate the cheapest, highest-volume role first (probably `cheap` via Ollama). High-volume = biggest cost win on rotation.
- Migrate Editor next.
- Migrate Architect last, since spec quality is most sensitive to the underlying model.
- Keep the `claude` CLI shells as a temporary fallback until every role is adapter-routed; then delete.
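One way to track the incremental cutover is a per-role flag that decides whether a call goes through the adapter or still falls back to the legacy shell. The sketch below assumes that design; `ALL_ROLES`, `pathFor`, and `canDeleteLegacyShells` are hypothetical names.

```typescript
type ModelRole = "architect" | "test-writer" | "editor" | "verifier" | "reviewer" | "cheap";

const ALL_ROLES: ModelRole[] = ["architect", "test-writer", "editor", "verifier", "reviewer", "cheap"];

// Roles not yet migrated keep using the legacy CLI shell.
function pathFor(role: ModelRole, migrated: ReadonlySet<ModelRole>): "adapter" | "legacy-cli" {
  return migrated.has(role) ? "adapter" : "legacy-cli";
}

// The shells can be deleted only once every role is adapter-routed.
function canDeleteLegacyShells(migrated: ReadonlySet<ModelRole>): boolean {
  return ALL_ROLES.every((role) => migrated.has(role));
}
```

With `cheap` and `editor` migrated, `pathFor("architect", migrated)` still returns `"legacy-cli"`, which matches the intent of moving Architect last.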
Cross-references
- Vision and mandate — commitment (b).
- Agent roles and model routing — the role definitions this routes against.
- Throughput and business signals — how cost data flows into model-routing decisions.
- Optimization mathematician — the agent that consumes routing telemetry to propose rotations.