Testing

Minara Agent uses vitest for all testing. Tests live under tests/ in three tiers, with shared fakes for the external dependencies (the LLM, the Minara client, the filesystem). This page explains the conventions so new tests fit in.

Tiers

```text
tests/
├── unit/          — pure logic, no IO, < 50ms each
├── integration/   — real SQLite, real tool dispatch, fake LLM/network
├── e2e/           — spin up the real gateway, hit it with HTTP, real DB
├── fakes/         — shared test doubles (LLM, Minara client, filesystem)
└── setup.ts       — vitest global setup (env vars, temp dirs)
```

Run them:

```sh
npm test                    # everything
npm run test:unit           # unit only — fast feedback loop
npm run test:e2e            # slowest, last
npm run test -- --watch     # watch mode on a subset
```

Unit tests are the default. If you're adding a new pure function, a new reducer, or a new router scoring rule, it belongs in tests/unit/. Integration tests exist for anything that touches the SQLite schema, the tool registry dispatch, or the agent loop end to end. E2E tests exercise the HTTP gateway as a black box.

Fakes over mocks

tests/fakes/ holds hand-written doubles rather than jest-style mocks. The two that matter most:

`FakeLLMClient`

Implements LLMClient with a scripted response queue:

```typescript
const llm = new FakeLLMClient();
llm.enqueue({
  text: "I'll check the price.",
  tool_calls: [{ name: "price", args: { symbol: "BTC" } }],
  stop_reason: "tool_use",
});
llm.enqueue({
  text: "BTC is $65,000.",
  stop_reason: "end_turn",
});

const result = await agentLoop.run({ ..., llm });
expect(llm.callCount).toBe(2);
expect(result.finalText).toContain("65,000");
```
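A fake with this behavior needs very little code. The sketch below is one possible shape, not the class in tests/fakes/: the `LLMClient` interface, its `complete()` method, and the request type are assumptions; only `enqueue`, `callCount`, and the response fields come from the example above.

```typescript
// Hypothetical shapes modeled on the usage example; the real LLMClient
// interface in Minara Agent may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LLMResponse {
  text: string;
  tool_calls?: ToolCall[];
  stop_reason: "end_turn" | "tool_use";
}

interface LLMRequest {
  system: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

interface LLMClient {
  complete(request: LLMRequest): Promise<LLMResponse>;
}

// Scripted double: each call to complete() records the request and pops the next queued response.
class FakeLLMClient implements LLMClient {
  private queue: LLMResponse[] = [];
  readonly requests: LLMRequest[] = [];

  enqueue(response: LLMResponse): void {
    this.queue.push(response);
  }

  get callCount(): number {
    return this.requests.length;
  }

  async complete(request: LLMRequest): Promise<LLMResponse> {
    this.requests.push(request);
    const next = this.queue.shift();
    if (next === undefined) {
      // Fail loudly: an unscripted extra turn is a bug in the loop under test.
      throw new Error(`FakeLLMClient: no scripted response for call #${this.requests.length}`);
    }
    return next;
  }
}
```

Throwing on an empty queue, instead of returning a default reply, makes an unexpected extra turn fail at the call that caused it.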

FakeLLMClient involves no real HTTP traffic, no token counting, and no cache semantics. Tests that specifically cover cache behavior use RecordingLLMClient instead, which asserts on the cache_control markers in the system-prompt blocks without contacting an upstream provider.
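A recording client can be equally small. This sketch assumes Anthropic-style system blocks, where a prompt-cache breakpoint is a `cache_control: { type: "ephemeral" }` field on a block; the class body and the `cacheMarkedBlocks` / `assertCacheMarkers` helpers are hypothetical, not the real RecordingLLMClient API.

```typescript
// Anthropic-style system block; cache_control marks a prompt-cache breakpoint.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface CachedRequest {
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

interface LLMResponse {
  text: string;
  stop_reason: "end_turn" | "tool_use";
}

// Hypothetical recorder: keeps every request and answers with a fixed reply, never calling an upstream.
class RecordingLLMClient {
  readonly requests: CachedRequest[] = [];
  private readonly reply: LLMResponse;

  constructor(reply: LLMResponse = { text: "ok", stop_reason: "end_turn" }) {
    this.reply = reply;
  }

  async complete(request: CachedRequest): Promise<LLMResponse> {
    // Deep-copy so later mutation by the caller can't rewrite what was recorded.
    this.requests.push(JSON.parse(JSON.stringify(request)) as CachedRequest);
    return this.reply;
  }

  // Indexes of system blocks carrying a cache_control marker in the nth request.
  cacheMarkedBlocks(callIndex: number): number[] {
    const req = this.requests[callIndex];
    if (!req) throw new Error(`RecordingLLMClient: no request #${callIndex}`);
    return req.system.flatMap((block, i) => (block.cache_control ? [i] : []));
  }

  assertCacheMarkers(callIndex: number, expected: number[]): void {
    const actual = this.cacheMarkedBlocks(callIndex);
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      throw new Error(`expected cache markers at ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
    }
  }
}
```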

`FakeMinaraClient`

Serves canned responses for the price, balance, swap, and perps endpoints, and keeps mutable state so integration tests can exercise round-trip flows:

```typescript
const minara = new FakeMinaraClient();
minara.setPrice("BTC", 65000);
minara.setBalance("BTC", 0.5);
// run a tool that calls minara.swap(...)
minara.assertSwapHappened({ from: "USDC", to: "BTC", amount_usd: 1000 });
```
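A sketch of how such a fake could hold state, assuming hypothetical `getPrice`, `getBalance`, and `swap` methods (only `setPrice`, `setBalance`, and `assertSwapHappened` appear above). To stay short, this `swap` prices only the destination asset and credits its balance; it does not debit the source.

```typescript
interface SwapRequest {
  from: string;
  to: string;
  amount_usd: number;
}

// Hypothetical in-memory stand-in for the Minara API client.
class FakeMinaraClient {
  private prices = new Map<string, number>();
  private balances = new Map<string, number>();
  readonly swaps: SwapRequest[] = [];

  setPrice(symbol: string, usd: number): void {
    this.prices.set(symbol, usd);
  }

  setBalance(symbol: string, amount: number): void {
    this.balances.set(symbol, amount);
  }

  async getPrice(symbol: string): Promise<number> {
    const price = this.prices.get(symbol);
    // An unset price is a test-setup mistake, so surface it immediately.
    if (price === undefined) throw new Error(`FakeMinaraClient: no price set for ${symbol}`);
    return price;
  }

  async getBalance(symbol: string): Promise<number> {
    return this.balances.get(symbol) ?? 0;
  }

  // Records the swap and credits the destination at the configured price.
  async swap(req: SwapRequest): Promise<{ received: number }> {
    const received = req.amount_usd / (await this.getPrice(req.to));
    this.balances.set(req.to, (await this.getBalance(req.to)) + received);
    this.swaps.push({ ...req });
    return { received };
  }

  assertSwapHappened(expected: SwapRequest): void {
    const hit = this.swaps.some(
      (s) => s.from === expected.from && s.to === expected.to && s.amount_usd === expected.amount_usd,
    );
    if (!hit) {
      throw new Error(`expected swap ${JSON.stringify(expected)}, saw ${JSON.stringify(this.swaps)}`);
    }
  }
}
```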

`tempDataDir`

tests/setup.ts gives every test file a fresh $dataDir under os.tmpdir() and cleans it up on exit. Integration tests can open a real SQLite file without polluting the repo or each other.
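A minimal sketch of the per-file directory logic using Node's `fs`, `os`, and `path` modules. `createTempDataDir` is a hypothetical name, and how setup.ts hands the path to the code under test isn't shown; in a vitest setup file the create and cleanup halves would typically sit in `beforeAll` and `afterAll`.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: creates an isolated data directory under the OS temp dir.
export function createTempDataDir(prefix = "minara-test-"): { dir: string; cleanup: () => void } {
  // mkdtempSync appends six random characters to the prefix, so parallel test files don't collide.
  const dir = mkdtempSync(join(tmpdir(), prefix));
  return {
    dir,
    // Safe to call more than once.
    cleanup: () => {
      if (existsSync(dir)) rmSync(dir, { recursive: true });
    },
  };
}
```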

What to test

Accumulated conventions from the existing suite:

- Every BeforeToolCallHook gets its own unit test. Hooks are small, pure, and easy to test in isolation. The whole safety model rides on them. Write at least:
  - happy path (allowed call passes through)
  - block path (expected reason string)
  - context propagation (hook reads the right ToolCallContext fields)
- Every new tool factory gets a smoke test. At minimum: given required env vars, the factory returns a non-empty tool list; given missing env vars, it returns [] without throwing.
- Every new skill gets a registration test. Verify the skill registers, verify its requires_env actually gates it, verify the router scores it for an obvious query. Keep it one test file per skill.
- Router changes get table-driven tests. The scoring rules are small and deterministic. A single describe.each table is usually the right shape.
- Agent loop changes get integration tests. The loop has a lot of state. Test it with a real registry and a scripted fake LLM rather than a unit test over internal helpers.
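As an illustration of the hook convention, here is a hypothetical swap-size hook with the three required cases written as plain assertions. `BeforeToolCallHook`, `ToolCallContext`, `maxSwapUsd`, and the decision shape are stand-ins, not the project's real types.

```typescript
// Hypothetical shapes: the real BeforeToolCallHook / ToolCallContext types may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ToolCallContext {
  maxSwapUsd: number;
}

type HookDecision = { allow: true } | { allow: false; reason: string };
type BeforeToolCallHook = (call: ToolCall, ctx: ToolCallContext) => HookDecision;

// Example hook: blocks swaps larger than the limit carried in the context.
const swapLimitHook: BeforeToolCallHook = (call, ctx) => {
  if (call.name !== "swap") return { allow: true };
  const amount = Number(call.args["amount_usd"]);
  if (!Number.isFinite(amount) || amount > ctx.maxSwapUsd) {
    return { allow: false, reason: `swap of $${amount} exceeds limit of $${ctx.maxSwapUsd}` };
  }
  return { allow: true };
};

// In the vitest suite each case below would be its own it(...) inside describe("swapLimitHook").
function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(msg);
}

const ctx: ToolCallContext = { maxSwapUsd: 1000 };

// Happy path: an allowed call passes through.
check(swapLimitHook({ name: "swap", args: { amount_usd: 500 } }, ctx).allow, "small swap should pass");

// Block path: assert on the reason string, not just the boolean.
const blocked = swapLimitHook({ name: "swap", args: { amount_usd: 5000 } }, ctx);
check(!blocked.allow && blocked.reason.includes("exceeds limit of $1000"), "large swap should be blocked");

// Context propagation: the decision must follow ctx.maxSwapUsd, not a hard-coded limit.
check(
  swapLimitHook({ name: "swap", args: { amount_usd: 5000 } }, { ...ctx, maxSwapUsd: 10_000 }).allow,
  "limit should come from context",
);
```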

What NOT to test

- Don't test Anthropic / OpenAI round trips in CI. The wire layer has dedicated tests that hit a recorded fixture; the rest of the suite uses FakeLLMClient.
- Don't test tool schemas for "the right shape." The schema is the source of truth; a test that duplicates it is a maintenance tax that fires on every legitimate change.
- Don't use vi.mock() on internal modules. If a module is hard to test without mocking, it's probably badly factored. Fix the factoring instead.
- Don't hit real external APIs (Minara backend, Glassnode, OpenRouter, …). Every one of those has a corresponding fake. New providers must ship a fake alongside the production adapter.
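One factoring that makes vi.mock() unnecessary is to pass env and clients into tool factories. In this sketch, `createPriceTools`, `PriceClient`, `Tool`, and the `MINARA_API_KEY` variable are all hypothetical; the assertions at the end follow the factory smoke-test rule from What to test.

```typescript
// Hypothetical names throughout; this only illustrates the dependency-injection shape.
interface PriceClient {
  getPrice(symbol: string): Promise<number>;
}

interface Tool {
  name: string;
  run(args: Record<string, unknown>): Promise<unknown>;
}

type Env = Record<string, string | undefined>;

// The factory receives env and a client constructor instead of importing them,
// so a test can pass a fake directly rather than reaching for vi.mock().
function createPriceTools(env: Env, makeClient: (apiKey: string) => PriceClient): Tool[] {
  const apiKey = env["MINARA_API_KEY"];
  if (!apiKey) return []; // missing config disables the tools; it never throws
  const client = makeClient(apiKey);
  return [
    {
      name: "price",
      run: async (args) => ({
        symbol: args["symbol"],
        usd: await client.getPrice(String(args["symbol"])),
      }),
    },
  ];
}

// Smoke test as plain assertions: env present -> tools, env missing -> [] without throwing.
const fakeClient: PriceClient = { getPrice: async () => 65000 };
if (createPriceTools({ MINARA_API_KEY: "test-key" }, () => fakeClient).length === 0) {
  throw new Error("expected a non-empty tool list");
}
if (createPriceTools({}, () => fakeClient).length !== 0) {
  throw new Error("expected [] when MINARA_API_KEY is missing");
}
```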

Coverage goals

Coverage numbers aren't a target, but the shape matters:

- Hooks and router scoring. 100%, period. These are where safety lives.
- Tool handlers. Enough to cover happy path plus one error path. The handlers are usually thin wrappers over typed clients, so exhaustive coverage has diminishing returns.
- Skill registration. One test per skill proving it loads and routes correctly.
- Agent loop. Integration tests for every branch in the turn machine (end_turn, tool_use, max_iterations, hook block, pending confirmation).
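Router scoring sits in the 100% bucket, and What to test asks for table-driven tests there. The sketch below shows that shape with a made-up keyword-count rule standing in for the real scorer; `scoreSkill`, `SkillMeta`, and the `perps` keywords are hypothetical.

```typescript
// Hypothetical scoring rule: one point per skill keyword that appears as a whole word.
interface SkillMeta {
  name: string;
  keywords: string[];
}

function scoreSkill(query: string, skill: SkillMeta): number {
  const words = new Set(query.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return skill.keywords.filter((k) => words.has(k.toLowerCase())).length;
}

const perps: SkillMeta = { name: "perps", keywords: ["perp", "perps", "leverage", "long", "short"] };

// [query, expected score]. In vitest this table would feed describe.each / it.each
// so each row reports as its own test.
const table: Array<[string, number]> = [
  ["open a 5x long on ETH perps", 2],
  ["what is the price of BTC", 0],
  ["Short BTC with leverage", 2],
  ["longing", 0], // whole-word match only
];

for (const [query, expected] of table) {
  const actual = scoreSkill(query, perps);
  if (actual !== expected) {
    throw new Error(`scoreSkill(${JSON.stringify(query)}) = ${actual}, expected ${expected}`);
  }
}
```

Keeping the cases as data means a scoring change shows up as a one-row diff in the table rather than a rewritten test.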

Debugging a failing test

- Run a single file: `npx vitest run tests/unit/foo.test.ts`.
- Add `test.only` to isolate the one case.
- `DEBUG=1 npx vitest run ...` enables verbose logger output.
- For integration test failures, the test's tempDataDir is printed in the failure message. You can open the SQLite file post-mortem with `sqlite3 /tmp/minara-test-xxx/minara.db`.

Adding a new test tier

Don't. Three tiers is already more than most projects need. If you feel you need a fourth (contract tests? property tests?), that's a prompt to revisit whether the unit/integration line is drawn in the right place rather than a prompt to add directories.
