Testing

How tests are structured and what to write

Minara Agent uses Vitest for all testing. Tests live under tests/, split into three tiers, with shared fakes for the external dependencies (LLM, Minara client, filesystem). This page explains the conventions so new tests fit in.

Tiers

tests/
├── unit/          — pure logic, no IO, < 50ms each
├── integration/   — real SQLite, real tool dispatch, fake LLM/network
├── e2e/           — spin up the real gateway, hit it with HTTP, real DB
├── fakes/         — shared test doubles (LLM, Minara client, filesystem)
└── setup.ts       — vitest global setup (env vars, temp dirs)

Run them:

npm test                    # everything
npm run test:unit           # unit only — fast feedback loop
npm run test:e2e            # slowest, last
npm test -- --watch         # watch mode; reruns only the tests affected by a change

Unit tests are the default. If you're adding a new pure function, a new reducer, or a new router scoring rule, it belongs in tests/unit/. Integration tests exist for anything that touches the SQLite schema, the tool registry dispatch, or the agent loop end to end. E2E tests exercise the HTTP gateway as a black box.

Fakes over mocks

tests/fakes/ holds hand-written doubles rather than Jest-style mocks. The two that matter most are described below, followed by the temp-directory helper from tests/setup.ts:

FakeLLMClient

Implements LLMClient with a scripted response queue:

const llm = new FakeLLMClient();
llm.enqueue({
  text: "I'll check the price.",
  tool_calls: [{ name: "price", args: { symbol: "BTC" } }],
  stop_reason: "tool_use",
});
llm.enqueue({
  text: "BTC is $65,000.",
  stop_reason: "end_turn",
});

const result = await agentLoop.run({ ... , llm });
expect(llm.callCount).toBe(2);
expect(result.finalText).toContain("65,000");
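
To make the scripted-queue idea concrete, here is a minimal sketch of how such a fake could be built. The LLMResponse and LLMClient shapes and the complete() method name are assumptions for illustration; the real interface in the codebase may differ.

```typescript
// Hypothetical shapes: the real LLMClient interface may carry more fields.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LLMResponse {
  text: string;
  tool_calls?: ToolCall[];
  stop_reason: "tool_use" | "end_turn";
}

interface LLMClient {
  complete(prompt: string): Promise<LLMResponse>;
}

class FakeLLMClient implements LLMClient {
  private queue: LLMResponse[] = [];
  callCount = 0;

  // Script the next response the fake will hand back.
  enqueue(response: LLMResponse): void {
    this.queue.push(response);
  }

  // Return scripted responses in FIFO order and fail loudly when the
  // script runs dry, so an unexpected extra LLM call breaks the test.
  async complete(_prompt: string): Promise<LLMResponse> {
    this.callCount += 1;
    const next = this.queue.shift();
    if (next === undefined) {
      throw new Error(`FakeLLMClient: no scripted response for call #${this.callCount}`);
    }
    return next;
  }
}
```

Throwing on an empty queue is a deliberate choice in this sketch: a loop that calls the model more often than the test scripted is usually the bug being hunted.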

FakeLLMClient makes no HTTP requests and models neither token counting nor cache semantics. Tests that specifically cover cache behavior use RecordingLLMClient instead, which asserts on the cache_control markers in the system prompt blocks without contacting an upstream provider.

FakeMinaraClient

Returns canned responses for the price, balance, swap, and perps endpoints. Its state is mutable, so integration tests can exercise round-trip flows:

const minara = new FakeMinaraClient();
minara.setPrice("BTC", 65000);
minara.setBalance("BTC", 0.5);
// run a tool that calls minara.swap(...)
minara.assertSwapHappened({ from: "USDC", to: "BTC", amount_usd: 1000 });
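
A stateful fake along these lines might look like the sketch below. Only setPrice, setBalance, swap, and assertSwapHappened appear in the example above; getPrice, getBalance, the SwapRecord type, and the balance arithmetic are assumptions made for illustration.

```typescript
// Hypothetical record of a swap; field names mirror the assertion above.
interface SwapRecord {
  from: string;
  to: string;
  amount_usd: number;
}

class FakeMinaraClient {
  private prices = new Map<string, number>();
  private balances = new Map<string, number>();
  private swaps: SwapRecord[] = [];

  setPrice(symbol: string, usd: number): void {
    this.prices.set(symbol, usd);
  }

  setBalance(symbol: string, amount: number): void {
    this.balances.set(symbol, amount);
  }

  getPrice(symbol: string): number {
    const price = this.prices.get(symbol);
    if (price === undefined) throw new Error(`FakeMinaraClient: no price set for ${symbol}`);
    return price;
  }

  getBalance(symbol: string): number {
    return this.balances.get(symbol) ?? 0;
  }

  // Record the swap and credit the destination balance at the canned price.
  // Debiting the source asset is left out to keep the sketch short.
  swap(req: SwapRecord): void {
    const received = req.amount_usd / this.getPrice(req.to);
    this.balances.set(req.to, this.getBalance(req.to) + received);
    this.swaps.push({ ...req });
  }

  // Throws unless a swap with exactly these fields was recorded.
  assertSwapHappened(expected: SwapRecord): void {
    const found = this.swaps.some(
      (s) => s.from === expected.from && s.to === expected.to && s.amount_usd === expected.amount_usd,
    );
    if (!found) {
      throw new Error(`expected swap ${JSON.stringify(expected)}, saw ${JSON.stringify(this.swaps)}`);
    }
  }
}
```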

tempDataDir

tests/setup.ts gives every test file a fresh $dataDir under os.tmpdir() and cleans it up on exit. Integration tests can open a real SQLite file without polluting the repo or each other.
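
As a rough illustration of the mechanism, a per-file temp directory can be created and removed with Node's standard library. The helper name makeTempDataDir and the "minara-test-" prefix are assumptions; the real tests/setup.ts may wire this through Vitest hooks in a different way.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper, not the project's actual setup code.
function makeTempDataDir(prefix = "minara-test-"): { dataDir: string; cleanup: () => void } {
  // mkdtempSync appends six random characters to the prefix and creates the directory.
  const dataDir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));

  // Remove the directory and everything in it; force avoids an error if it is already gone.
  const cleanup = (): void => {
    fs.rmSync(dataDir, { recursive: true, force: true });
  };

  return { dataDir, cleanup };
}
```

In a test file, dataDir would typically be created in a beforeAll hook and cleanup called from afterAll.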

What to test

Accumulated conventions from the existing suite:

  1. Every BeforeToolCallHook gets its own unit test. Hooks are small, pure, and easy to test in isolation. The whole safety model rides on them. Write at least:
    • happy path (allowed call passes through)
    • block path (expected reason string)
    • context propagation (hook reads the right ToolCallContext fields)
  2. Every new tool factory gets a smoke test. At minimum: given required env vars, the factory returns a non-empty tool list; given missing env vars, it returns [] without throwing.
  3. Every new skill gets a registration test. Verify the skill registers, verify its requires_env actually gates it, verify the router scores it for an obvious query. Keep it one test file per skill.
  4. Router changes get table-driven tests. The scoring rules are small and deterministic. A single describe.each table is usually the right shape.
  5. Agent loop changes get integration tests. The loop has a lot of state. Test it with a real registry and a scripted fake LLM rather than a unit test over internal helpers.
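
The three hook cases from item 1 can be sketched as follows. The hook itself (maxSwapUsdHook), the ToolCallContext fields, and the HookResult shape are all hypothetical; they stand in for whatever the real hook signature is.

```typescript
// Hypothetical shapes; the real ToolCallContext and hook signature may differ.
interface ToolCallContext {
  toolName: string;
  args: Record<string, unknown>;
  userId: string;
}

type HookResult = { allowed: true } | { allowed: false; reason: string };
type BeforeToolCallHook = (ctx: ToolCallContext) => HookResult;

// Example hook under test: blocks swaps above a USD ceiling.
function maxSwapUsdHook(limitUsd: number): BeforeToolCallHook {
  return (ctx) => {
    if (ctx.toolName !== "swap") return { allowed: true };
    const amount = Number(ctx.args.amount_usd);
    // The negated comparison also blocks NaN, i.e. a missing or malformed amount.
    if (!(amount <= limitUsd)) {
      return { allowed: false, reason: `swap of $${amount} exceeds limit of $${limitUsd}` };
    }
    return { allowed: true };
  };
}

const hook = maxSwapUsdHook(500);
const base: ToolCallContext = { toolName: "swap", args: { amount_usd: 100 }, userId: "u1" };

// Happy path: an allowed call passes through.
const happyPath = hook(base);

// Block path: an oversized swap is rejected with a specific reason string.
const blockPath = hook({ ...base, args: { amount_usd: 1000 } });

// Context propagation: the hook reads toolName, so other tools are untouched.
const otherTool = hook({ ...base, toolName: "price", args: { amount_usd: 1000 } });
```

In a Vitest file each of the three results would be wrapped in its own test and checked with expect(...).toEqual(...), including the exact reason string on the block path.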

What NOT to test

  • Don't test Anthropic / OpenAI round trips in CI. The wire layer has dedicated tests that hit a recorded fixture; the rest of the suite uses FakeLLMClient.
  • Don't test tool schemas for "the right shape." The schema is the source of truth; a test that duplicates it is a maintenance tax that fires on every legitimate change.
  • Don't use vi.mock() on internal modules. If a module is hard to test without mocking, it's probably badly factored. Fix the factoring instead.
  • Don't hit real external APIs (Minara backend, Glassnode, OpenRouter, …). Every one of those has a corresponding fake. New providers must ship a fake alongside the production adapter.
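
One way to avoid vi.mock() and still keep external APIs out of tests is to pass the dependency in as a parameter and ship a fake that implements the same interface. The sketch below assumes a hypothetical PriceProvider interface and portfolioValueUsd function; neither name comes from the codebase.

```typescript
// Hypothetical provider interface; illustrative only.
interface PriceProvider {
  getPrice(symbol: string): Promise<number>;
}

// The fake that would ship alongside the production adapter.
class FakePriceProvider implements PriceProvider {
  private prices = new Map<string, number>();

  set(symbol: string, usd: number): void {
    this.prices.set(symbol, usd);
  }

  async getPrice(symbol: string): Promise<number> {
    const price = this.prices.get(symbol);
    if (price === undefined) throw new Error(`FakePriceProvider: no price for ${symbol}`);
    return price;
  }
}

// Code under test receives the provider as an argument, so a test can hand it
// the fake directly instead of patching a module import.
async function portfolioValueUsd(
  holdings: Record<string, number>,
  provider: PriceProvider,
): Promise<number> {
  let total = 0;
  for (const [symbol, amount] of Object.entries(holdings)) {
    total += amount * (await provider.getPrice(symbol));
  }
  return total;
}
```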

Coverage goals

Coverage numbers aren't a target, but the shape matters:

  • Hooks and router scoring. 100%, period. These are where safety lives.
  • Tool handlers. Enough to cover happy path plus one error path. The handlers are usually thin wrappers over typed clients, so exhaustive coverage has diminishing returns.
  • Skill registration. One test per skill proving it loads and routes correctly.
  • Agent loop. Integration tests for every branch in the turn machine (end_turn, tool_use, max_iterations, hook block, pending confirmation).
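
Full coverage of router scoring is easiest to reach with a table of cases. The sketch below uses an invented scoring rule (count of matching keywords) and an invented Skill shape purely to show the table layout; the real router rules are not reproduced here.

```typescript
// Hypothetical skill shape and scoring rule, for illustration only.
interface Skill {
  name: string;
  keywords: string[];
}

// Score = number of the skill's keywords that appear as whole words in the query.
function scoreSkill(skill: Skill, query: string): number {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return skill.keywords.filter((k) => words.has(k)).length;
}

const priceSkill: Skill = { name: "price", keywords: ["price", "worth", "cost"] };

// Each row is [query, expected score]. Adding a rule means adding rows.
const cases: Array<[string, number]> = [
  ["what is the price of BTC", 1],
  ["how much is ETH worth, what does it cost", 2],
  ["swap USDC for BTC", 0],
  ["", 0],
];

// Rows whose actual score differs from the expected one.
const failures = cases.filter(([query, expected]) => scoreSkill(priceSkill, query) !== expected);
```

In the suite, the same cases array would be handed to describe.each or test.each, with one expect(scoreSkill(...)).toBe(expected) per row, so a failing row is reported by its query string.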

Debugging a failing test

  1. Run the single file: npx vitest run tests/unit/foo.test.ts
  2. Add test.only to isolate the one case.
  3. DEBUG=1 npx vitest run ... enables verbose logger output.
  4. For integration test failures, the test's tempDataDir is printed in the failure message. You can open the SQLite file post-mortem with sqlite3 /tmp/minara-test-xxx/minara.db.

Adding a new test tier

Don't. Three tiers is already more than most projects need. If you feel you need a fourth (contract tests? property tests?), that's a prompt to revisit whether the unit/integration line is drawn in the right place rather than a prompt to add directories.
