Agentic Workflow

Model and vendor agnosticism

Models change weekly; the architecture doesn't. Why CAIRE routes through adapters instead of vendor SDKs.

If we lock to one vendor, we lose the right to rotate the moment a better/cheaper option ships. Frontier-model leadership has flipped multiple times in the last 18 months; betting on stasis is the wrong shape of bet.

The rule

No agent prompt, skill, rule, or hook may hard-code a single provider.

Every model call goes through an adapter that takes a role (architect / test-writer / editor / verifier / reviewer) and dispatches to whichever model the routing config currently maps to. Swapping a provider is then a config edit, not a code edit.

Routing matrix (current default — refresh quarterly)

Role Current default Why this slot
Architect Anthropic Opus 4.x Strongest spec/plan quality at acceptable latency for our problem size.
Test-writer Anthropic Sonnet 4.x Mechanical from a good spec; reliable structure on test files.
Editor Anthropic Sonnet 4.x Cost-driver; cheap-and-good is more important than best-and-expensive.
Verifier Anthropic Sonnet 4.x Needs reliability over depth; cheap call dominated by tool-execution time.
Reviewer loop Anthropic Sonnet 4.x (or local) Stateless per pass; could route to Ollama for first triage to save tokens.
Cheap routing Ollama (phi or similar) Triage / classify / convert. Free, local, fast.

Refresh cadence: quarterly review. The CMO/CSO and CPO/CTO management agents own the next refresh. Inputs to the review:

  1. Public benchmarks: SWE-bench Verified, polyglot, agentic suites — current pass@1 numbers per model.
  2. Internal cost-per-merged-PR per role over the prior quarter (logged in .compound-state/agent-service.db).
  3. Reliability data: tool-call failure rate, timeout rate, refusal rate per model.
  4. New entrants: anything that wasn't in the previous matrix (new OSS models, new providers).

Adapter shape

// Conceptual — the actual implementation lives in be-agent-service.

type ModelRole =
  | "architect"
  | "test-writer"
  | "editor"
  | "verifier"
  | "reviewer"
  | "cheap";

interface ModelCall {
  role: ModelRole;
  systemPrompt: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
  tools?: ToolSpec[];
  budgetTokens?: number;
  budgetUsd?: number;
}

interface ModelResponse {
  text: string;
  toolCalls?: ToolCall[];
  usage: {
    promptTokens: number;
    completionTokens: number;
    usd: number;
    latencyMs: number;
  };
  modelUsed: string; // for logging — the actual model that handled the call
}

async function callModel(call: ModelCall): Promise<ModelResponse>;

The routing config is a single JSON/YAML map from role → providerSlug. Provider implementations sit behind the same interface (Anthropic SDK, OpenAI SDK, Google AI SDK, Ollama HTTP, vLLM, etc.).

What this rules out

What this allows

Migration path from today

Today many agents call the claude CLI directly (be-agent-service/agents/<name>.sh). Migration is incremental:

  1. Introduce the adapter as a new module.
  2. Migrate the cheapest, highest-volume role first (probably cheap via Ollama). High-volume = biggest cost win on rotation.
  3. Migrate Editor next.
  4. Migrate Architect last, since spec quality is most sensitive to the underlying model.
  5. Keep the claude CLI shells as a temporary fallback until every role is adapter-routed; then delete.

Cross-references