Model and vendor agnosticism | Agentic Workflow

If we lock to one vendor, we lose the right to rotate the moment a better/cheaper option ships. Frontier-model leadership has flipped multiple times in the last 18 months; betting on stasis is the wrong shape of bet.

The rule

No agent prompt, skill, rule, or hook may hard-code a single provider.

Every model call goes through an adapter that takes a role (architect / test-writer / editor / verifier / reviewer / cheap) and dispatches to whichever model the routing config currently maps that role to. Swapping a provider is then a config edit, not a code edit.

Routing matrix (current default — refresh quarterly)

Role	Current default	Why this slot
Architect	Anthropic Opus 4.x	Strongest spec/plan quality at acceptable latency for our problem size.
Test-writer	Anthropic Sonnet 4.x	Mechanical from a good spec; reliable structure on test files.
Editor	Anthropic Sonnet 4.x	Cost-driver; cheap-and-good is more important than best-and-expensive.
Verifier	Anthropic Sonnet 4.x	Needs reliability over depth; cheap call dominated by tool-execution time.
Reviewer loop	Anthropic Sonnet 4.x (or local)	Stateless per pass; could route to Ollama for first triage to save tokens.
Cheap routing	Ollama (phi or similar)	Triage / classify / convert. Free, local, fast.

Refresh cadence: quarterly review. The CMO/CSO and CPO/CTO management agents own the next refresh. Inputs to the review:

Public benchmarks: SWE-bench Verified, polyglot, agentic suites — current pass@1 numbers per model.
Internal cost-per-merged-PR per role over the prior quarter (logged in .compound-state/agent-service.db).
Reliability data: tool-call failure rate, timeout rate, refusal rate per model.
New entrants: anything that wasn't in the previous matrix (new OSS models, new providers).
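
As a rough illustration of the cost-per-merged-PR input, the sketch below divides each role's total spend by the number of distinct merged PRs that role contributed to. The UsageRow shape and the sample values are assumptions made for the example; the actual schema of .compound-state/agent-service.db is not shown here.

```typescript
// Hypothetical row shape; the real table layout in agent-service.db may differ.
interface UsageRow {
  role: string;    // e.g. "editor"
  prId: string;    // PR the model call was made for
  usd: number;     // cost of this single call
  merged: boolean; // whether that PR was eventually merged
}

// Spend on unmerged PRs stays in the numerator: it is money paid
// on the way to the merges that did land.
function costPerMergedPr(rows: UsageRow[]): Record<string, number> {
  const spend: Record<string, number> = {};
  const mergedPrs: Record<string, Record<string, true>> = {};
  for (const row of rows) {
    spend[row.role] = (spend[row.role] ?? 0) + row.usd;
    mergedPrs[row.role] = mergedPrs[row.role] ?? {};
    if (row.merged) mergedPrs[row.role][row.prId] = true;
  }
  const result: Record<string, number> = {};
  for (const role of Object.keys(spend)) {
    const mergedCount = Object.keys(mergedPrs[role]).length;
    // A role with spend but no merged PRs has no finite cost per merge.
    result[role] = mergedCount === 0 ? Infinity : spend[role] / mergedCount;
  }
  return result;
}

// Made-up sample data, for illustration only.
const sampleRows: UsageRow[] = [
  { role: "editor", prId: "pr-1", usd: 0.5, merged: true },
  { role: "editor", prId: "pr-1", usd: 0.25, merged: true },
  { role: "editor", prId: "pr-2", usd: 0.25, merged: false },
  { role: "architect", prId: "pr-1", usd: 2, merged: true },
];
console.log(costPerMergedPr(sampleRows));
```

Counting distinct PR ids rather than rows keeps a role that makes many calls per PR from looking artificially cheap.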

Adapter shape

// Conceptual; the actual implementation lives in be-agent-service.
// ToolSpec and ToolCall are not defined in this excerpt.

type ModelRole =
  | "architect"
  | "test-writer"
  | "editor"
  | "verifier"
  | "reviewer"
  | "cheap";

interface ModelCall {
  role: ModelRole;
  systemPrompt: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
  tools?: ToolSpec[];
  budgetTokens?: number;
  budgetUsd?: number;
}

interface ModelResponse {
  text: string;
  toolCalls?: ToolCall[];
  usage: {
    promptTokens: number;
    completionTokens: number;
    usd: number;
    latencyMs: number;
  };
  modelUsed: string; // for logging — the actual model that handled the call
}

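// Declared without `async`: TypeScript rejects `async` on a function that has no body.
// The Promise return type already says the call is asynchronous.
declare function callModel(call: ModelCall): Promise<ModelResponse>;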

The routing config is a single JSON/YAML map from role → providerSlug. Provider implementations sit behind the same interface (Anthropic SDK, OpenAI SDK, Google AI SDK, Ollama HTTP, vLLM, etc.).
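
A minimal sketch of how the role → providerSlug map and the shared provider interface might fit together. The slugs, the stub providers, and the trimmed-down Call and Reply types are all assumptions for illustration; the real config and provider wrappers live in be-agent-service and are not shown here.

```typescript
// Reduced copies of the types above so the sketch stands alone.
type ModelRole = "architect" | "test-writer" | "editor" | "verifier" | "reviewer" | "cheap";
interface Call {
  role: ModelRole;
  systemPrompt: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}
interface Reply {
  text: string;
  modelUsed: string;
}

// Every provider wrapper exposes the same function shape.
type Provider = (call: Call) => Promise<Reply>;

// Hypothetical slugs. In practice this map would be loaded from the JSON/YAML config.
const routing: Record<ModelRole, string> = {
  architect: "anthropic-opus",
  "test-writer": "anthropic-sonnet",
  editor: "anthropic-sonnet",
  verifier: "anthropic-sonnet",
  reviewer: "anthropic-sonnet",
  cheap: "ollama-local",
};

// Stub standing in for a real SDK wrapper; it only records which slug handled the call.
const stub = (slug: string): Provider => async (call) => ({
  text: `stub reply to ${call.messages.length} message(s)`,
  modelUsed: slug,
});

const providers: Record<string, Provider> = {
  "anthropic-opus": stub("anthropic-opus"),
  "anthropic-sonnet": stub("anthropic-sonnet"),
  "ollama-local": stub("ollama-local"),
};

// Callers name a role; only this function knows which provider that means today.
async function callModel(call: Call, config: Record<ModelRole, string> = routing): Promise<Reply> {
  const slug = config[call.role];
  const provider = providers[slug];
  if (!provider) {
    throw new Error(`No provider registered for slug "${slug}" (role "${call.role}")`);
  }
  return provider(call);
}
```

Under these assumptions, moving the editor role to another provider means changing one value in the routing map; no caller of callModel changes.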

What this rules out

Skills / prompts that say "Use Claude Sonnet 4.6 for X." → Replace with "Use the editor role."
Hooks that hard-code an Anthropic API endpoint. → Route through the adapter.
Cost dashboards keyed off claude-* model names only. → Aggregate by role, then break down by modelUsed.
Tool definitions that depend on Anthropic-specific tool-use features. → Use the lowest-common-denominator tool shape; opt in to vendor extensions only inside the adapter, never leaking out.
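
To make the lowest-common-denominator tool shape concrete, here is a sketch in which callers describe a tool once and the adapter translates it per vendor. NeutralTool and the read_file tool are hypothetical stand-ins, since the real ToolSpec is not shown in this document. The two target shapes follow the publicly documented Anthropic Messages and OpenAI Chat Completions tool formats as I understand them; check them against current vendor docs before relying on them.

```typescript
// Vendor-neutral tool description that callers would use.
// Hypothetical shape: the real ToolSpec in be-agent-service is not shown here.
interface NeutralTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the tool arguments
}

// Anthropic's Messages API takes the schema under `input_schema`.
function toAnthropicTool(tool: NeutralTool) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.parameters,
  };
}

// OpenAI's Chat Completions API wraps the same fields in a `function` object.
function toOpenAiTool(tool: NeutralTool) {
  return {
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    },
  };
}

// Illustrative tool definition, not taken from the real codebase.
const readFileTool: NeutralTool = {
  name: "read_file",
  description: "Read a UTF-8 text file from the workspace.",
  parameters: {
    type: "object",
    properties: { path: { type: "string" } },
    required: ["path"],
  },
};

console.log(JSON.stringify(toAnthropicTool(readFileTool)));
console.log(JSON.stringify(toOpenAiTool(readFileTool)));
```

Both translators are private to the adapter in this design, so a skill or prompt never sees a vendor-specific field name.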

What this allows

A/B testing: route 10% of editor calls to a different model for a week, compare pass-rate + cost.
Auto-fallback: if the primary provider returns a 5xx, the adapter retries against the secondary route.
Cost ceilings: per-call budgetUsd; the adapter rejects the call before sending if the running total for this PR would exceed the cap (ties into scale-or-kill.md and throughput-and-business-signals.md).
Vendor-specific optimisations behind feature flags inside the adapter (prompt caching, context caching, batch APIs) — invisible to callers.
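
The first three capabilities above can be sketched in a few lines each. Everything here is an assumption for illustration: the Route shape, the in-memory spend ledger, the convention that provider errors carry a numeric status field, and the hash used for the A/B split. None of it is the actual be-agent-service implementation.

```typescript
interface Reply { text: string; usd: number; modelUsed: string }
type Provider = (prompt: string) => Promise<Reply>;
interface Route { primary: string; secondary?: string }

// Running spend per PR, kept in memory for the sketch only.
const spentUsdByPr: Record<string, number> = {};

async function callWithGuards(
  prId: string,
  prompt: string,
  route: Route,
  providers: Record<string, Provider>,
  budgetUsd: number, // per-call budget
  prCapUsd: number,  // ceiling for the whole PR
): Promise<Reply> {
  // Cost ceiling: reject before sending if this call's budget would break the cap.
  const spent = spentUsdByPr[prId] ?? 0;
  if (spent + budgetUsd > prCapUsd) {
    throw new Error(`PR ${prId}: ${spent} + ${budgetUsd} USD would exceed cap ${prCapUsd}`);
  }
  let reply: Reply;
  try {
    reply = await providers[route.primary](prompt);
  } catch (err) {
    // Auto-fallback: only server-side (5xx) failures go to the secondary route.
    const status = (err as { status?: number }).status;
    const is5xx = status !== undefined && status >= 500 && status < 600;
    if (!is5xx || route.secondary === undefined) throw err;
    reply = await providers[route.secondary](prompt);
  }
  spentUsdByPr[prId] = spent + reply.usd;
  return reply;
}

// A/B split: hash the call id into a bucket in [0, 100) so the same id
// always lands on the same side, then send the low buckets to the candidate.
function pickVariant(callId: string, candidatePercent: number): "control" | "candidate" {
  let hash = 0;
  for (let i = 0; i < callId.length; i++) {
    hash = (hash * 31 + callId.charCodeAt(i)) >>> 0;
  }
  return hash % 100 < candidatePercent ? "candidate" : "control";
}

// Stub providers: one that always fails with a 503, one that always succeeds.
const flaky: Provider = async () => {
  const err = new Error("service unavailable") as Error & { status?: number };
  err.status = 503;
  throw err;
};
const backup: Provider = async (prompt) => ({
  text: `echo: ${prompt}`,
  usd: 0.01,
  modelUsed: "backup",
});
```

With these stubs, a route of `{ primary: "flaky", secondary: "backup" }` resolves through backup, and `pickVariant(callId, 10)` gives the stable 10% split described in the A/B item. A real adapter would persist the spend ledger rather than hold it in memory.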

Migration path from today

Today many agents call the claude CLI directly (be-agent-service/agents/<name>.sh). Migration is incremental:

Introduce the adapter as a new module.
Migrate the cheapest, highest-volume role first (probably the cheap role via Ollama). High volume means the biggest cost win on rotation.
Migrate Editor next.
Migrate Architect last, since spec quality is most sensitive to the underlying model.
Keep the claude CLI shells as a temporary fallback until every role is adapter-routed; then delete.

Cross-references

Vision and mandate — commitment (b).
Agent roles and model routing — the role definitions this routes against.
Throughput and business signals — how cost data flows into model-routing decisions.
Optimization mathematician — the agent that consumes routing telemetry to propose rotations.