
LLM Integration

Provider abstraction, prompt caching, model routing

Minara Agent talks to multiple LLM providers through a single internal interface, so you can switch between Anthropic, OpenAI, OpenRouter, or local models without touching the agent loop. This page explains the provider layer, how prompt caching is structured, and how model routing works.

Why a custom router instead of calling vendor SDKs directly? Each feature (agent loop, backtesting, reflection, sub-agents) has different cost and latency tradeoffs. Minara's router lets us send expensive, cache-stable work to Sonnet and cheap batch evaluation to Haiku from the same codebase, and fall back gracefully when a provider rate-limits. Calling a vendor SDK directly couples you to that provider's pricing model.

Providers are configured through environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY); see the Env Vars reference.

The provider interface

All providers implement the LLMClient interface, defined in apps/agent/src/core/agent-loop.ts:

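```typescript
// SystemPrompt, MessageParam, ToolDefinition, LLMResponse, LLMStreamEvent,
// LLMVisionCall, and LLMVisionResult are declared elsewhere (not shown).
interface LLMClient {
  createMessage(params: {
    model: string;
    system: SystemPrompt;           // string OR cacheable blocks
    messages: MessageParam[];
    tools?: ToolDefinition[];
    maxTokens?: number;
    stream?: boolean;
  }): Promise<LLMResponse>;

  // Optional streaming variant (parameter type elided here).
  streamMessage?(params: ...): AsyncIterable<LLMStreamEvent>;

  // Optional: providers without vision support omit this method.
  vision?(call: LLMVisionCall): Promise<LLMVisionResult>;
}
```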
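To show why "everything downstream speaks LLMClient" makes providers swappable, here is a minimal sketch using hypothetical, simplified types (plain-string messages instead of structured tool_result blocks, no streaming or vision). ScriptedClient and runTurn are illustrative names, not Minara's agent loop; the point is that the loop never imports a provider.

```typescript
// Hypothetical, simplified stand-ins for the real types in agent-loop.ts.
type StopReason = "end_turn" | "tool_use" | "max_tokens";
interface ToolCall { id: string; name: string; input: unknown }
interface LLMResponse { text: string; toolCalls: ToolCall[]; stopReason: StopReason }
interface MessageParam { role: "user" | "assistant"; content: string }

interface LLMClient {
  createMessage(params: {
    model: string;
    system: string;
    messages: MessageParam[];
    maxTokens?: number;
  }): Promise<LLMResponse>;
}

// A scripted fake provider: returns canned responses in order, ignoring its input.
class ScriptedClient implements LLMClient {
  calls = 0;
  private readonly script: LLMResponse[];
  constructor(script: LLMResponse[]) {
    this.script = script;
  }
  async createMessage(): Promise<LLMResponse> {
    const next = this.script[this.calls++];
    if (!next) throw new Error("script exhausted");
    return next;
  }
}

// A toy loop that depends only on LLMClient: keep calling the model while it
// asks for tools, feeding each tool result back as a message.
async function runTurn(
  client: LLMClient,
  userText: string,
  runTool: (call: ToolCall) => Promise<string>,
  maxSteps = 8,
): Promise<string> {
  const messages: MessageParam[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const res = await client.createMessage({ model: "placeholder-model", system: "You are an agent.", messages });
    if (res.stopReason !== "tool_use") return res.text;
    messages.push({ role: "assistant", content: res.text });
    for (const call of res.toolCalls) {
      messages.push({ role: "user", content: await runTool(call) });
    }
  }
  throw new Error("step limit reached");
}
```

Swapping ScriptedClient for an Anthropic- or OpenAI-backed client would not change runTurn at all.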

Concrete implementations live under apps/agent/src/llm/:

File                   Provider                Auth
anthropic-api-key.ts   Anthropic               ANTHROPIC_API_KEY
anthropic-oauth.ts     Anthropic (Claude.ai)   OAuth refresh token
anthropic-wire.ts      Shared wire protocol    n/a
openai-wire.ts         OpenAI / OpenRouter     OPENAI_API_KEY / OPENROUTER_API_KEY
openrouter.ts          OpenRouter router       OPENROUTER_API_KEY

select-provider.ts picks the concrete client at boot based on env vars, in priority order:

  1. Anthropic OAuth (if CLAUDE_CODE_OAUTH_TOKEN is set)
  2. Anthropic API key (if ANTHROPIC_API_KEY is set)
  3. OpenRouter (if OPENROUTER_API_KEY is set)
  4. OpenAI (if OPENAI_API_KEY is set)

If none of the four is set, the process refuses to start with a clear error. The selection is a one-liner in app.ts; everything downstream speaks LLMClient.
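The priority order above can be sketched as a first-match scan over environment variables. This is a hypothetical reconstruction, not the contents of select-provider.ts: ProviderKind and selectProviderKind are illustrative names, and treating blank values as unset is an assumption.

```typescript
// Hypothetical sketch of the selection order described above.
type ProviderKind = "anthropic-oauth" | "anthropic-api-key" | "openrouter" | "openai";

// Highest priority first.
const PRIORITY: ReadonlyArray<[envVar: string, kind: ProviderKind]> = [
  ["CLAUDE_CODE_OAUTH_TOKEN", "anthropic-oauth"],
  ["ANTHROPIC_API_KEY", "anthropic-api-key"],
  ["OPENROUTER_API_KEY", "openrouter"],
  ["OPENAI_API_KEY", "openai"],
];

function selectProviderKind(env: Record<string, string | undefined>): ProviderKind {
  for (const [envVar, kind] of PRIORITY) {
    // Assumption: empty or whitespace-only values count as unset.
    if (env[envVar]?.trim()) return kind;
  }
  // Refuse to start, naming every variable that would have worked.
  throw new Error(
    "No LLM provider configured: set one of " + PRIORITY.map(([v]) => v).join(", "),
  );
}
```

At boot this would be called once with process.env, and the result used to construct the matching client.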

Prompt caching: not optional

Anthropic's prompt cache has a 5-minute TTL (refreshed on every hit). A cache hit costs 10% of the normal input-token price, while writing a new cache entry costs 25% more than normal input. For an agent that makes many similar calls per turn (the identity prompt and skill catalog are resent on every tool-call round trip), cache hits are the difference between "acceptable" and "expensive."
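To make the pricing concrete, here is a back-of-envelope estimate assuming a hypothetical 20,000-token stable prefix (identity plus catalog) resent on 10 round trips, all within the TTL. The first call pays the 1.25× write price and the other nine pay the 0.1× read price; costs are counted in base-price token-equivalents, and the uncached tail of each request is ignored.

```typescript
// Multipliers relative to the base input-token price (5-minute TTL).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Cost of sending the same prefix `calls` times, in base-price token-equivalents.
// Assumes every call after the first lands within the TTL and hits the cache.
function prefixCost(prefixTokens: number, calls: number, cached: boolean): number {
  if (calls <= 0) return 0;
  if (!cached) return prefixTokens * calls;
  return prefixTokens * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1));
}

const uncached = prefixCost(20_000, 10, false); // 200,000
const withCache = prefixCost(20_000, 10, true); // about 43,000
console.log(`cached prefix costs ${((withCache / uncached) * 100).toFixed(1)}% of uncached`);
```

Under these assumptions the cached prefix costs roughly a fifth of the uncached one, and the gap widens as more round trips share the prefix.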

The system prompt is therefore split into explicit blocks:

```typescript
// Stable, cacheable blocks first; per-turn content after the cache boundary.
const system: SystemPromptBlock[] = [
  { type: "text", text: identityPrompt,      cache: true  },
  { type: "text", text: skillCatalog,        cache: true  },
  { type: "text", text: activeSkillPrompts,  cache: false },
  { type: "text", text: signalContextBlock,  cache: false },
  { type: "text", text: pendingConfirmation, cache: false },
];
```

Invariants the prompt builder enforces:

  1. Cacheable blocks come first. A block placed after a non-cacheable block can never hit the cache, because the cache key is a strict prefix match over everything before it.
  2. Cacheable blocks are stable. Identity and catalog only change when the skill registry or base prompt changes, never per turn.
  3. Dynamic content is at the tail. Active skill fragments, signal context, pending confirmations, and the conversation history all go after the cache boundary.
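Invariants 1 and 2 lend themselves to cheap runtime checks. The sketch below assumes a SystemPromptBlock shape matching the array above; assertCacheablePrefix and cachePrefixText are hypothetical helper names, not functions from the prompt builder.

```typescript
// Assumed shape, mirroring the system array above.
interface SystemPromptBlock { type: "text"; text: string; cache: boolean }

// Invariant 1: every cacheable block must precede every dynamic block.
function assertCacheablePrefix(blocks: readonly SystemPromptBlock[]): void {
  let seenDynamic = false;
  blocks.forEach((block, i) => {
    if (!block.cache) {
      seenDynamic = true;
    } else if (seenDynamic) {
      throw new Error(`block ${i} is marked cacheable but follows a dynamic block`);
    }
  });
}

// Invariant 2: the text covered by the cache prefix. If this string changes
// between turns, every request in the new turn starts with a cache miss.
function cachePrefixText(blocks: readonly SystemPromptBlock[]): string {
  const prefix: string[] = [];
  for (const block of blocks) {
    if (!block.cache) break;
    prefix.push(block.text);
  }
  return prefix.join("\n\n");
}
```

Comparing cachePrefixText across consecutive turns (or a hash of it) is one way to catch a per-turn value leaking into the identity or catalog blocks.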

anthropic-wire.ts maps our SystemPromptBlock[] onto Anthropic's cache_control: {type: "ephemeral"} markers and reads the cache counters from each response's usage metadata. Those counters surface in structured logs as llm.cache_read_input_tokens and llm.cache_creation_input_tokens, from which the cache hit rate can be computed.
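A minimal sketch of that mapping, assuming the SystemPromptBlock shape above. It uses the real shapes from Anthropic's Messages API (text blocks with an optional cache_control, and the cache_*_input_tokens usage fields), but toAnthropicSystem and cacheLogFields are hypothetical names. Because a cache breakpoint caches the entire prefix up to and including its block, one marker on the last block of the leading cacheable run is enough.

```typescript
interface SystemPromptBlock { type: "text"; text: string; cache: boolean }

// A text block in the Anthropic Messages API `system` array.
interface AnthropicTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function toAnthropicSystem(blocks: readonly SystemPromptBlock[]): AnthropicTextBlock[] {
  // Index of the last block in the leading run of cacheable blocks (-1 if none).
  let lastCacheable = -1;
  while (lastCacheable + 1 < blocks.length && blocks[lastCacheable + 1].cache) lastCacheable++;
  return blocks.map((b, i): AnthropicTextBlock =>
    i === lastCacheable
      ? { type: "text", text: b.text, cache_control: { type: "ephemeral" } }
      : { type: "text", text: b.text },
  );
}

// Subset of the `usage` object on a Messages API response.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Flattens the cache counters into the structured-log field names used above.
function cacheLogFields(usage: AnthropicUsage): Record<string, number> {
  return {
    "llm.cache_read_input_tokens": usage.cache_read_input_tokens ?? 0,
    "llm.cache_creation_input_tokens": usage.cache_creation_input_tokens ?? 0,
  };
}
```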

For providers without explicit cache markers (OpenAI as of this writing, which instead caches long prompt prefixes automatically), the wire layer silently concatenates the blocks, in order, into a single system string. There is no caller-facing difference and no behavioral divergence.
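The fallback can be sketched in a few lines. The "\n\n" separator and the function names are assumptions; the { role: "system", content } message shape is the standard OpenAI Chat Completions format. Keeping block order means the stable identity and catalog text still lead the prompt.

```typescript
interface SystemPromptBlock { type: "text"; text: string; cache: boolean }

// SystemPrompt is "string OR cacheable blocks" (see the LLMClient interface).
type SystemPrompt = string | readonly SystemPromptBlock[];

// Drops the cache flags and joins the blocks in their original order.
// Assumption: blocks are separated by a blank line; empty blocks are skipped.
function flattenSystemPrompt(system: SystemPrompt): string {
  if (typeof system === "string") return system;
  return system
    .map((b) => b.text)
    .filter((text) => text.length > 0)
    .join("\n\n");
}

// OpenAI Chat Completions takes the system prompt as a single message.
function toOpenAISystemMessage(system: SystemPrompt): { role: "system"; content: string } {
  return { role: "system", content: flattenSystemPrompt(system) };
}
```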

Model selection

The model used per turn is decided by apps/agent/src/learning/llm-router.ts (confusingly unrelated to the skill router; this one routes LLM calls to models). The default policy:

  • Main agent turn. Claude Sonnet 4.6 with 1M context.
  • Deep research sub-agent. Claude Opus 4.6 (high reasoning).
  • Vision calls. Routed to a provider that supports vision.
  • Embeddings. Configured separately through the EMBEDDING_PROVIDER env var.

Every call records {provider, model, reason} in the audit log, so you can answer "why did this turn cost X" with a SQL query rather than stack-walking through code.

Overrides:

  • AGENT_MODEL pins the main agent-loop model. Also settable interactively via minara model use <id> or the /model slash command, both of which persist to $MINARA_DATA_DIR/env.
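A minimal sketch of how the default policy, the AGENT_MODEL pin, and the audit record could fit together. It covers only the two model-selection cases (main turn and deep research); routeCall, CallKind, and the model IDs are hypothetical placeholders, and the in-memory audit array stands in for the real audit log.

```typescript
type CallKind = "main_turn" | "deep_research";

// The {provider, model, reason} triple recorded for every call.
interface RouteDecision { provider: string; model: string; reason: string }

// Placeholder IDs; the real defaults live in llm-router.ts.
const DEFAULT_MODELS: Record<CallKind, string> = {
  main_turn: "sonnet-placeholder",
  deep_research: "opus-placeholder",
};

function routeCall(
  kind: CallKind,
  provider: string, // the client chosen at boot
  env: Record<string, string | undefined>,
  audit: RouteDecision[],
): RouteDecision {
  // AGENT_MODEL only pins the main agent-loop model.
  const pinned = kind === "main_turn" ? env.AGENT_MODEL?.trim() : undefined;
  const decision: RouteDecision = pinned
    ? { provider, model: pinned, reason: "AGENT_MODEL pin" }
    : { provider, model: DEFAULT_MODELS[kind], reason: `default policy: ${kind}` };
  audit.push(decision); // stand-in for the audit-log write
  return decision;
}
```

Recording the reason alongside the model is what makes "why did this turn cost X" answerable from the log alone.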

Tool-use shape

Anthropic and OpenAI both support tool use, but their wire formats differ. The agent loop speaks Anthropic's shape internally, and openai-wire.ts (which also serves OpenRouter) translates in both directions. Consequences:

  • Tool schemas are defined once as JSON Schema in ToolEntry.schema.parameters. Both wire layers consume the same schema.
  • Stop reasons are normalized. Anthropic's "end_turn" / "tool_use" / "max_tokens" and OpenAI's "stop" / "tool_calls" / "length" get mapped to a common enum the loop consumes.
  • Streaming events are normalized into a common LLMStreamEvent union so /chat/stream doesn't care about the upstream provider.
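The first two points can be sketched as pure functions. The tool shapes are the public formats (Anthropic's { name, description, input_schema } and OpenAI's { type: "function", function: { name, description, parameters } }, whose tool-call arguments arrive as a JSON string). The StopReason enum and all function names are hypothetical; unmapped raw values such as "stop_sequence" or "content_filter" fall into an assumed "other" bucket.

```typescript
// Tool parameters as JSON Schema, as stored in ToolEntry.schema.parameters.
type JSONSchema = Record<string, unknown>;
interface ToolSpec { name: string; description: string; parameters: JSONSchema }

// The same schema object feeds both wire formats.
function toAnthropicTool(t: ToolSpec) {
  return { name: t.name, description: t.description, input_schema: t.parameters };
}
function toOpenAITool(t: ToolSpec) {
  return { type: "function" as const, function: { name: t.name, description: t.description, parameters: t.parameters } };
}

// Hypothetical common enum consumed by the loop.
type StopReason = "end_turn" | "tool_use" | "max_tokens" | "other";

const ANTHROPIC_STOP = new Map<string, StopReason>([
  ["end_turn", "end_turn"], ["tool_use", "tool_use"], ["max_tokens", "max_tokens"],
]);
const OPENAI_STOP = new Map<string, StopReason>([
  ["stop", "end_turn"], ["tool_calls", "tool_use"], ["length", "max_tokens"],
]);

function normalizeStopReason(provider: "anthropic" | "openai", raw: string | null): StopReason {
  const table = provider === "anthropic" ? ANTHROPIC_STOP : OPENAI_STOP;
  return (raw !== null ? table.get(raw) : undefined) ?? "other";
}

// OpenAI sends tool arguments as a JSON string; Anthropic's tool_use block
// carries a parsed object, so the translation parses it.
interface OpenAIToolCall { id: string; type: "function"; function: { name: string; arguments: string } }

function toAnthropicToolUse(call: OpenAIToolCall) {
  return {
    type: "tool_use" as const,
    id: call.id,
    name: call.function.name,
    input: JSON.parse(call.function.arguments) as unknown,
  };
}
```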

Vision

Vision calls go through a separate LLMClient.vision() method because not all providers support vision inline with tool calls. The vision_analyze tool calls it explicitly, and tools that need to read a screenshot (the browser.* set) encode the image as base64 before calling.
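A sketch of that path under simplified, hypothetical types (the real LLMVisionCall and LLMVisionResult are not shown on this page). It base64-encodes screenshot bytes with Node's Buffer, fails early when the configured client has no vision() method, and shows how an Anthropic wire layer could express the image as a base64 image content block, which is the public Messages API shape.

```typescript
// Hypothetical, simplified shapes.
interface LLMVisionCall { prompt: string; image: { mediaType: "image/png" | "image/jpeg"; base64: string } }
interface LLMVisionResult { text: string }
interface VisionCapable { vision?(call: LLMVisionCall): Promise<LLMVisionResult> }

// Encode raw screenshot bytes as base64 before calling the client.
function screenshotToVisionCall(png: Uint8Array, prompt: string): LLMVisionCall {
  return { prompt, image: { mediaType: "image/png", base64: Buffer.from(png).toString("base64") } };
}

// vision() is optional, so callers must handle providers that lack it.
async function analyzeScreenshot(client: VisionCapable, png: Uint8Array, prompt: string): Promise<string> {
  if (!client.vision) throw new Error("configured provider does not support vision");
  const result = await client.vision(screenshotToVisionCall(png, prompt));
  return result.text;
}

// Anthropic Messages API image block built from the same call.
function toAnthropicImageBlock(call: LLMVisionCall) {
  return {
    type: "image" as const,
    source: { type: "base64" as const, media_type: call.image.mediaType, data: call.image.base64 },
  };
}
```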

OAuth flow for Anthropic

Anthropic OAuth lets users sign in with their Claude.ai account rather than providing an API key. The flow lives in apps/agent/src/llm/oauth/:

  1. /auth/anthropic/init starts the PKCE flow and returns an auth URL.
  2. User visits, approves, is redirected back.
  3. /auth/anthropic/exchange exchanges the code for a refresh token.
  4. The refresh token is encrypted with a local key and stored in SQLite.
  5. Every LLM call transparently refreshes the access token if needed.
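Step 1 relies on PKCE (RFC 7636). The sketch below shows the standard S256 construction with Node's crypto module; the function names are illustrative, and how Minara stores the verifier between /auth/anthropic/init and /auth/anthropic/exchange is not covered here.

```typescript
import { createHash, randomBytes } from "node:crypto";

// The verifier stays on the server; only its S256 challenge goes into the auth URL.
function createCodeVerifier(): string {
  // 32 random bytes -> 43 base64url characters (RFC 7636 allows 43 to 128).
  return randomBytes(32).toString("base64url");
}

// code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

const verifier = createCodeVerifier();
const challenge = codeChallengeS256(verifier);
// The init step would send `challenge` with code_challenge_method=S256 and keep
// `verifier` to present during the code exchange.
```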

The same pattern applies to OpenAI and OpenRouter. See the Auth endpoints reference for the exact routes.
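Step 5 of the flow (transparent refresh) can be sketched as a small token manager. Everything here is hypothetical: RefreshFn stands in for the provider's token-endpoint call, the 60-second expiry skew is an assumed safety margin, and refresh-token rotation and the encrypted SQLite storage from step 4 are omitted. Concurrent callers share one in-flight refresh instead of each hitting the token endpoint.

```typescript
interface AccessToken { value: string; expiresAt: number } // expiresAt in epoch ms

// Hypothetical stand-in for the provider's refresh-token exchange.
type RefreshFn = (refreshToken: string) => Promise<AccessToken>;

class TokenManager {
  private current: AccessToken | null = null;
  private inFlight: Promise<AccessToken> | null = null;
  private readonly refreshToken: string;
  private readonly refresh: RefreshFn;
  private readonly skewMs: number;
  private readonly now: () => number;

  constructor(refreshToken: string, refresh: RefreshFn, skewMs = 60_000, now: () => number = () => Date.now()) {
    this.refreshToken = refreshToken;
    this.refresh = refresh;
    this.skewMs = skewMs;
    this.now = now;
  }

  // Returns a usable access token, refreshing first if it is missing or about to expire.
  async getAccessToken(): Promise<string> {
    if (this.current && this.current.expiresAt - this.skewMs > this.now()) {
      return this.current.value;
    }
    if (!this.inFlight) {
      // Share one refresh among concurrent callers; clear it once settled.
      this.inFlight = this.refresh(this.refreshToken).finally(() => {
        this.inFlight = null;
      });
    }
    this.current = await this.inFlight;
    return this.current.value;
  }
}
```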

Adding a provider

To support a new LLM provider:

  1. Implement LLMClient in apps/agent/src/llm/<name>-wire.ts. Prompt caching support is optional but strongly preferred.
  2. Add a selection branch in select-provider.ts.
  3. Add the env var to .env.example and the env-vars inventory.
  4. Write integration tests under tests/integration/llm/ using the wire-level fakes in tests/fakes/llm/.

Do not reach into the agent loop to add a provider-specific code path. If a provider has a quirk, hide it in the wire layer.
