LLM Integration
Provider abstraction, prompt caching, model routing
Minara Agent talks to multiple LLM providers through a single internal interface, so you can switch between Anthropic, OpenAI, OpenRouter, or local models without touching the agent loop. This page explains the provider layer, how prompt caching is structured, and how model routing works.
Why a custom router instead of vendor SDKs directly? Every feature (agent loop, backtesting, reflection, sub-agents) has different cost / latency tradeoffs. Minara's router lets us route expensive cache-stable work to Sonnet and cheap batch evaluation to Haiku from the same codebase — and fall back gracefully when a provider rate-limits. A direct vendor SDK couples you to one provider's pricing model.
See this in use: configure providers via Env Vars
(ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY).
The provider interface
All providers implement LLMClient, defined in
apps/agent/src/core/agent-loop.ts:

```typescript
interface LLMClient {
  // Required: a single request/response call.
  createMessage(params: {
    model: string;
    system: SystemPrompt; // string OR cacheable blocks
    messages: MessageParam[];
    tools?: ToolDefinition[];
    maxTokens?: number;
    stream?: boolean;
  }): Promise<LLMResponse>;
  // Optional: streaming variant.
  streamMessage?(params: ...): AsyncIterable<LLMStreamEvent>;
  // Optional: not every provider supports vision.
  vision?(call: LLMVisionCall): Promise<LLMVisionResult>;
}
```

Concrete implementations live under
apps/agent/src/llm/:

| File | Provider | Auth |
|---|---|---|
| anthropic-api-key.ts | Anthropic | ANTHROPIC_API_KEY |
| anthropic-oauth.ts | Anthropic (Claude.ai) | OAuth refresh token |
| anthropic-wire.ts | Shared wire protocol | |
| openai-wire.ts | OpenAI / OpenRouter | OPENAI_API_KEY / OPENROUTER_API_KEY |
| openrouter.ts | OpenRouter router | OPENROUTER_API_KEY |

select-provider.ts picks the concrete client at boot based on env vars, in priority order:
- Anthropic OAuth (if CLAUDE_CODE_OAUTH_TOKEN is set)
- Anthropic API key (if ANTHROPIC_API_KEY is set)
- OpenRouter (if OPENROUTER_API_KEY is set)
- OpenAI (if OPENAI_API_KEY is set)

If none of the four is set, the process refuses to start and reports a clear
error. The selection is a one-liner in app.ts; everything downstream
speaks LLMClient.
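
The priority order above can be sketched as a small lookup table. This is a minimal illustration, not the contents of select-provider.ts: the names ProviderKind, Env, and pickProvider are hypothetical, and treating an empty string as "unset" is an assumption.

```typescript
// Hypothetical sketch of priority-ordered provider selection.
// ProviderKind, Env and pickProvider are illustrative names, not real exports.
type ProviderKind = "anthropic-oauth" | "anthropic-api-key" | "openrouter" | "openai";
type Env = Record<string, string | undefined>;

// Ordered from highest to lowest priority, matching the list above.
const PRIORITY: ReadonlyArray<{ envVar: string; kind: ProviderKind }> = [
  { envVar: "CLAUDE_CODE_OAUTH_TOKEN", kind: "anthropic-oauth" },
  { envVar: "ANTHROPIC_API_KEY", kind: "anthropic-api-key" },
  { envVar: "OPENROUTER_API_KEY", kind: "openrouter" },
  { envVar: "OPENAI_API_KEY", kind: "openai" },
];

function pickProvider(env: Env): ProviderKind {
  for (const { envVar, kind } of PRIORITY) {
    const value = env[envVar];
    // Assumption: empty or whitespace-only values count as unset.
    if (value !== undefined && value.trim() !== "") return kind;
  }
  // Refuse to continue, naming every variable that would have worked.
  const names = PRIORITY.map((p) => p.envVar).join(", ");
  throw new Error(`No LLM provider configured. Set one of: ${names}`);
}
```

Passing the environment in as an argument (for example `pickProvider(process.env)`) keeps the function easy to test without mutating global state.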
Prompt caching: not optional
Anthropic's prompt cache has a 5-minute default TTL, and input tokens read from the cache are billed at 10% of the normal input token price. For an agent that resends the same prefix many times per turn (catalog plus identity on every tool-call roundtrip), cache hits are the difference between "acceptable" and "expensive."
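
To make the arithmetic concrete, here is a rough sketch that expresses a call's input cost in full-price token equivalents. The function names are hypothetical, and the model deliberately ignores the surcharge Anthropic applies when a cache entry is first written, so it only describes calls that hit an already-warm cache.

```typescript
// Input tokens billed for one call, in full-price token equivalents.
// The 0.1 default reflects the 10% cache-hit price mentioned above.
function effectiveInputTokens(
  cachedTokens: number,
  uncachedTokens: number,
  cacheReadMultiplier = 0.1,
): number {
  return uncachedTokens + cachedTokens * cacheReadMultiplier;
}

// Fraction of input cost saved compared with sending everything uncached.
function cacheSavings(
  cachedTokens: number,
  uncachedTokens: number,
  cacheReadMultiplier = 0.1,
): number {
  const total = cachedTokens + uncachedTokens;
  if (total === 0) return 0; // nothing sent, nothing saved
  return 1 - effectiveInputTokens(cachedTokens, uncachedTokens, cacheReadMultiplier) / total;
}
```

For a 12,000-token prompt where 10,000 tokens hit the cache, the call is billed like roughly 3,000 full-price tokens, a saving of about 75% on input.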
The system prompt is therefore split into explicit blocks:

```typescript
system: [
  { type: "text", text: identityPrompt,      cache: true  }, // stable: cached
  { type: "text", text: skillCatalog,        cache: true  }, // stable: cached
  { type: "text", text: activeSkillPrompts,  cache: false }, // dynamic: per turn
  { type: "text", text: signalContextBlock,  cache: false }, // dynamic: per turn
  { type: "text", text: pendingConfirmation, cache: false }, // dynamic: per turn
]
```

Invariants the prompt builder enforces:
- Cacheable blocks come first. Anything after a non-cacheable block cannot be cached itself, because the cache key is a strict prefix match.
- Cacheable blocks are stable. Identity and catalog only change when the skill registry or base prompt changes, never per turn.
- Dynamic content is at the tail. Active skill fragments, signal context, pending confirmations, and the conversation history all go after the cache boundary.
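
The first invariant is easy to check mechanically. The sketch below is an assumption about how such a check could look, not the prompt builder's actual code; cacheBoundary is a hypothetical name, and the SystemPromptBlock shape is inferred from the example above.

```typescript
// Shape inferred from the system prompt example; the real type may differ.
interface SystemPromptBlock {
  type: "text";
  text: string;
  cache: boolean;
}

// Returns the index of the first dynamic block (the cache boundary).
// Throws if a cacheable block appears after a dynamic one, because that
// block could never be served from a strict-prefix cache.
function cacheBoundary(blocks: readonly SystemPromptBlock[]): number {
  let boundary = -1; // -1 means no dynamic block seen yet
  for (let i = 0; i < blocks.length; i++) {
    if (!blocks[i].cache) {
      if (boundary === -1) boundary = i;
    } else if (boundary !== -1) {
      throw new Error(`Block ${i} is cacheable but follows dynamic block ${boundary}`);
    }
  }
  // If every block is cacheable, the boundary sits at the end of the array.
  return boundary === -1 ? blocks.length : boundary;
}
```

Running a check like this when the prompt is assembled turns an ordering mistake into an immediate error instead of a silent drop in cache hit rate.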
anthropic-wire.ts maps our SystemPromptBlock[] into Anthropic's cache_control: {type: "ephemeral"} markers and records the cache token counts from the response metadata, which surface in structured logs as
llm.cache_read_input_tokens and llm.cache_creation_input_tokens.
For providers that don't accept explicit cache markers (OpenAI as of this writing, which caches prompt prefixes automatically on its side), the wire layer silently concatenates blocks into a single system string. No caller-facing difference, no behavioral divergence.
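
A minimal sketch of both mappings follows. The function names, the SystemPromptBlock shape, and the blank-line separator are assumptions; the cache_control field and the usage field names follow Anthropic's Messages API. Anthropic limits how many cache breakpoints one request may carry, so a real implementation might mark only the last cacheable block rather than each one.

```typescript
// Shape inferred from this page; the real type may differ.
interface SystemPromptBlock {
  type: "text";
  text: string;
  cache: boolean;
}

// Text block in the Anthropic Messages API `system` array.
interface AnthropicTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Anthropic path: cacheable blocks get an ephemeral cache marker.
function toAnthropicSystem(blocks: readonly SystemPromptBlock[]): AnthropicTextBlock[] {
  return blocks.map((b): AnthropicTextBlock =>
    b.cache
      ? { type: "text", text: b.text, cache_control: { type: "ephemeral" } }
      : { type: "text", text: b.text },
  );
}

// Fallback path: one plain string. The "\n\n" separator is an assumption.
function toPlainSystem(blocks: readonly SystemPromptBlock[]): string {
  return blocks
    .map((b) => b.text)
    .filter((text) => text.length > 0)
    .join("\n\n");
}

// Cache counters as reported in the Anthropic response `usage` object.
interface AnthropicUsage {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Flattens usage into the structured-log field names used on this page.
function cacheLogFields(usage: AnthropicUsage): Record<string, number> {
  return {
    "llm.cache_read_input_tokens": usage.cache_read_input_tokens ?? 0,
    "llm.cache_creation_input_tokens": usage.cache_creation_input_tokens ?? 0,
  };
}
```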
Model selection
The model used per turn is decided by
apps/agent/src/learning/llm-router.ts
(confusingly unrelated to the skill router; this one routes LLM
calls to models). The default policy:
- Main agent turn. Claude Sonnet 4.6 with 1M context.
- Deep research sub-agent. Claude Opus 4.6 (high reasoning).
- Vision calls. Whatever provider supports vision.
- Embeddings. Whatever provider the dedicated EMBEDDING_PROVIDER env var names.
Every call records {provider, model, reason} in the audit log, so
you can answer "why did this turn cost X" with a SQL query rather
than stack-walking through code.
Overrides:
- AGENT_MODEL pins the main agent-loop model. Also settable interactively via minara model use <id> or the /model slash command, both of which persist to $MINARA_DATA_DIR/env.
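
The sketch below shows one way a router could return its decision together with the reason that ends up in the audit log. It is illustrative only: routeModel, CallKind, and RouteDecision are hypothetical names, and the model IDs are placeholders because this page does not list the exact identifiers.

```typescript
// Hypothetical sketch; not the contents of llm-router.ts.
type CallKind = "main" | "deep-research";

interface RouteDecision {
  provider: string;
  model: string;
  reason: string; // recorded in the audit log alongside provider and model
}

// Placeholder IDs: the real model identifiers are not given on this page.
const DEFAULT_MODELS: Record<CallKind, string> = {
  main: "<sonnet-model-id>",
  "deep-research": "<opus-model-id>",
};

function routeModel(
  kind: CallKind,
  provider: string,
  env: Record<string, string | undefined>,
): RouteDecision {
  const pinned = env["AGENT_MODEL"];
  // AGENT_MODEL only pins the main agent loop, not sub-agents.
  if (kind === "main" && pinned !== undefined && pinned !== "") {
    return { provider, model: pinned, reason: "AGENT_MODEL override" };
  }
  return { provider, model: DEFAULT_MODELS[kind], reason: `default policy for ${kind}` };
}
```

Returning the reason with the decision, rather than logging it separately, means every call site records the same three fields without extra plumbing.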
Tool-use shape
Both wire protocols (Anthropic and OpenAI) support tool use, but their
formats differ. The agent loop speaks Anthropic's shape internally
and openai-wire.ts translates
in both directions. Consequences:
- Tool schemas are defined once as JSON Schema in ToolEntry.schema.parameters. Both wire layers consume the same schema.
- Stop reasons are normalized. Anthropic's "end_turn" / "tool_use" / "max_tokens" and OpenAI's "stop" / "tool_calls" / "length" get mapped to a common enum the loop consumes.
- Streaming events are normalized into a common LLMStreamEvent union so /chat/stream doesn't care about the upstream provider.
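
As a rough sketch of the first two points: one tool entry can be rendered into either wire format, and stop reasons can be folded into one enum. The ToolEntry fields other than schema.parameters, the function names, the choice to reuse Anthropic's names for the common enum, and the choice to throw on unmapped values are all assumptions. The two tool shapes follow the Anthropic Messages API and the OpenAI Chat Completions API.

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical: only ToolEntry.schema.parameters is named on this page.
interface ToolEntry {
  name: string;
  description: string;
  schema: { parameters: JsonSchema };
}

interface AnthropicTool {
  name: string;
  description: string;
  input_schema: JsonSchema;
}

interface OpenAITool {
  type: "function";
  function: { name: string; description: string; parameters: JsonSchema };
}

function toAnthropicTool(tool: ToolEntry): AnthropicTool {
  return { name: tool.name, description: tool.description, input_schema: tool.schema.parameters };
}

function toOpenAITool(tool: ToolEntry): OpenAITool {
  return {
    type: "function",
    function: { name: tool.name, description: tool.description, parameters: tool.schema.parameters },
  };
}

// Common enum; reusing Anthropic's names is an assumption.
type StopReason = "end_turn" | "tool_use" | "max_tokens";

const ANTHROPIC_STOP_REASONS = new Map<string, StopReason>([
  ["end_turn", "end_turn"],
  ["tool_use", "tool_use"],
  ["max_tokens", "max_tokens"],
]);

const OPENAI_STOP_REASONS = new Map<string, StopReason>([
  ["stop", "end_turn"],
  ["tool_calls", "tool_use"],
  ["length", "max_tokens"],
]);

function normalizeStopReason(wire: "anthropic" | "openai", raw: string): StopReason {
  const table = wire === "anthropic" ? ANTHROPIC_STOP_REASONS : OPENAI_STOP_REASONS;
  const mapped = table.get(raw);
  // Assumption: unknown values fail loudly instead of being guessed at.
  if (mapped === undefined) throw new Error(`Unmapped ${wire} stop reason: ${raw}`);
  return mapped;
}
```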
Vision
Vision calls go through a separate LLMClient.vision() method
because not all providers support vision inline with tool calls.
The vision_analyze tool calls it explicitly, and tools that need
to read a screenshot (the browser.* set) encode the image as
base64 before calling.
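
For illustration, this is roughly what the base64 step looks like in Node. The LLMVisionCall shape is not shown on this page, so the sketch targets the image content block of the Anthropic Messages API instead; toImageBlock and ImageMediaType are hypothetical names.

```typescript
import { Buffer } from "node:buffer";

type ImageMediaType = "image/png" | "image/jpeg";

// Image content block as accepted by the Anthropic Messages API.
interface AnthropicImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

// Encodes raw screenshot bytes as base64 and wraps them in an image block.
function toImageBlock(bytes: Uint8Array, mediaType: ImageMediaType): AnthropicImageBlock {
  const data = Buffer.from(bytes).toString("base64");
  return { type: "image", source: { type: "base64", media_type: mediaType, data } };
}
```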
OAuth flow for Anthropic
Anthropic OAuth lets users sign in with their Claude.ai account
rather than providing an API key. The flow lives in
apps/agent/src/llm/oauth/:
- /auth/anthropic/init starts the PKCE flow and returns an auth URL.
- User visits, approves, is redirected back.
- /auth/anthropic/exchange exchanges the code for a refresh token.
- The refresh token is encrypted with a local key and stored in SQLite.
- Every LLM call transparently refreshes the access token if needed.
The same pattern applies to OpenAI and OpenRouter. See the Auth endpoints reference for the exact routes.
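
The PKCE part of the init step can be sketched with Node's crypto module. This follows RFC 7636 (S256 method) and is not taken from the code in apps/agent/src/llm/oauth/; createPkcePair and challengeFor are hypothetical names.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface PkcePair {
  verifier: string;  // kept server-side until the exchange step
  challenge: string; // sent in the auth URL
  method: "S256";
}

// challenge = BASE64URL(SHA-256(ASCII(verifier))), per RFC 7636 section 4.2.
function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

function createPkcePair(): PkcePair {
  // 32 random bytes encode to 43 base64url characters, the RFC minimum length.
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: challengeFor(verifier), method: "S256" };
}
```

The verifier never leaves the server until the exchange call, which is what lets the token endpoint confirm that the party redeeming the code is the one that started the flow.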
Adding a provider
To support a new LLM provider:
- Implement LLMClient in apps/agent/src/llm/<name>-wire.ts. Prompt caching support is optional but strongly preferred.
- Add a selection branch in select-provider.ts.
- Add the env var to .env.example and the env-vars inventory.
- Write integration tests under tests/integration/llm/ using the wire-level fakes in tests/fakes/llm/.
Do not reach into the agent loop to add a provider-specific code path. If a provider has a quirk, hide it in the wire layer.