Observability
Logging, auditing, and debugging a running agent
When an agent does the wrong thing, you need to know why. Minara Agent emits three distinct streams of information: structured logs, the audit log, and tier events. Each has a purpose. This page explains what goes where and how to investigate common failure modes.
Why three streams instead of one big log? Agent misbehavior shows up in three shapes: the infrastructure broke (LLM call failed, DB locked), the agent took the wrong action (called a tool with bad args), or permission logic blocked something unexpectedly. One stream that tries to cover all three becomes unreadable. Splitting them lets you walk up to the right table immediately: behavior → audit, permissions → tier_events, infra → structured logs. All three share a trace_id, so you can stitch a story back together when you need the full picture.
See this in use: the 6-stage safety stack
writes a row to tier_events for every fund-moving call — this
is how you prove, after the fact, that a trade passed every gate
before executing.
The three streams
| Stream | Lives in | Purpose |
|---|---|---|
| Structured logs | $dataDir/logs/*.ndjson plus stdout | Operational visibility, startup state, LLM calls |
| Audit log | SQLite audit table | "What tools did the agent call and what did they do" |
| Tier events | SQLite tier_events table | "Why was this call allowed or blocked" |
The rule of thumb: if you're debugging behavior, start with the audit log. If you're debugging infrastructure, start with structured logs. If you're debugging permissions, start with tier events.
Structured logs
apps/agent/src/core/logger.ts exposes a
minimal JSON logger:
logger.info("skills/registry", "skill_registered", { id });
logger.warn("agent/loop", "max_iterations_reached", { session_id });
logger.error("llm/anthropic-wire", "cache_miss", { reason });

Each line is a JSON object with:
{
"ts": "2026-04-14T12:34:56.789Z",
"level": "info",
"category": "skills/registry",
"event": "skill_registered",
"correlation_id": "turn_abc123",
"trace_id": "t_xyz",
"data": { "id": "minara.core" }
}

Correlation IDs come from withCorrelation(id, fn) wrappers around
turn execution. Every log line produced inside a turn inherits the
same id via AsyncLocalStorage, so grep turn_abc123 $dataDir/logs/*.ndjson gets the full story of one turn.
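The inheritance described above can be sketched with Node's AsyncLocalStorage. This is a minimal illustration, not the code in logger.ts: the TurnContext shape and the formatLine helper are hypothetical names, and only withCorrelation(id, fn) comes from the text.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical store shape; the real logger may carry more fields.
interface TurnContext {
  correlationId: string;
}

const turnContext = new AsyncLocalStorage<TurnContext>();

// Runs fn with the correlation id visible to everything it calls,
// including code that resumes after an await.
function withCorrelation<T>(id: string, fn: () => T): T {
  return turnContext.run({ correlationId: id }, fn);
}

// Hypothetical helper: builds one NDJSON line and picks up the
// correlation id from the ambient store, if there is one.
function formatLine(
  level: "debug" | "info" | "warn" | "error",
  category: string,
  event: string,
  data: Record<string, unknown>,
): string {
  return JSON.stringify({
    ts: new Date().toISOString(),
    level,
    category,
    event,
    correlation_id: turnContext.getStore()?.correlationId,
    data,
  });
}

// Inside a turn the id is attached without being passed explicitly.
const insideTurn = withCorrelation("turn_abc123", () =>
  formatLine("info", "skills/registry", "skill_registered", { id: "minara.core" }),
);

// Outside any turn there is no store, so the field is omitted.
const outsideTurn = formatLine("info", "core/boot", "started", {});
```

The point of the design is that call sites never thread an id through their arguments; the wrapper around turn execution is the only place that knows it.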
Log rotation
Logs are written with daily rotation under
$dataDir/logs/. Configure the level threshold with:
LOG_LEVEL (debug / info / warn / error, default info)
Setting LOG_LEVEL=debug is safe: the structured format
means you can jq out the noise. It produces roughly 4× the log
volume.
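A level threshold of this kind usually reduces to an ordered comparison. The sketch below assumes the four documented levels in ascending severity and the documented default of info; parseLevel and shouldLog are hypothetical names, and the case-insensitive parsing and fallback on unknown values are assumptions rather than documented behavior.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Ascending severity; a higher number means a more severe level.
const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Parses a LOG_LEVEL value. Unknown or missing values fall back to "info"
// (the fallback for unknown values is an assumption).
function parseLevel(raw: string | undefined): Level {
  const value = (raw ?? "").toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(LEVEL_ORDER, value);
  return known ? (value as Level) : "info";
}

// A line is emitted when its level is at or above the threshold.
function shouldLog(level: Level, threshold: Level): boolean {
  return LEVEL_ORDER[level] >= LEVEL_ORDER[threshold];
}
```

With the default threshold, shouldLog("debug", "info") is false and shouldLog("warn", "info") is true, which is why switching to debug increases volume without changing the format.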
Audit log: the source of truth
Every tool call goes through auditLogHook in
apps/agent/src/core/audit-log-hook.ts.
The table schema:
CREATE TABLE audit (
id TEXT PRIMARY KEY,
session_id TEXT,
trace_id TEXT,
tool_name TEXT,
tool_set TEXT,
args_json TEXT, -- redacted
result_json TEXT,
blocked INTEGER,
block_reason TEXT,
permission_tier INTEGER,
source TEXT, -- user | cron | autopilot | delegation
duration_ms INTEGER,
created_at INTEGER
);
CREATE VIRTUAL TABLE audit_fts USING fts5(
tool_name, block_reason, args_json, content='audit'
);

FTS5 means you can grep the whole history:
-- MATCH runs against the FTS index; join back to audit for the other columns.
SELECT a.created_at, a.tool_name, a.blocked, a.block_reason
FROM audit_fts
JOIN audit a ON a.rowid = audit_fts.rowid
WHERE audit_fts MATCH 'withdraw'
ORDER BY a.created_at DESC
LIMIT 50;

Every investigation starts here. When a user reports "the agent did something weird," the first move is to pull their session's audit rows in order. The LLM's reasoning text, the tool arguments, the raw tool output, and the timestamps are all there.
What's redacted
The redactor in tools/_shared/result.ts
masks known sensitive keys (api_key, secret, password,
token, private_key, mnemonic, seed) with *** before
anything reaches the audit log. Combined with the rule that
secrets are never tool arguments to begin with (they come from
process.env at factory time), the audit log is safe to share
with support or drop into a bug report with minimal screening.
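As a rough illustration of key-based masking, the sketch below walks a value recursively and replaces the values of the keys listed above with ***. It is not the redactor in tools/_shared/result.ts: the redact name, the case-insensitive exact key match, and the handling of nested arrays are all assumptions.

```typescript
// Key list taken from the text above; the matching rules are an assumption.
const SENSITIVE_KEYS = new Set([
  "api_key", "secret", "password", "token", "private_key", "mnemonic", "seed",
]);

// Returns a deep copy of value with every sensitive key's value replaced by "***".
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "***" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const redactedArgs = redact({
  venue: "example",
  auth: { api_key: "sk-123", note: "ok" },
  legs: [{ password: "hunter2", amount: 5 }],
});
```

Masking by key name only catches secrets stored under a known key, which is why the rule that secrets never appear as tool arguments matters as the second layer.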
Tier events: why things were blocked
core/permission-tier-hook.ts
emits a row to tier_events for every decision, whether allowed or
blocked:
CREATE TABLE tier_events (
id TEXT PRIMARY KEY,
tool_name TEXT,
source TEXT,
tier INTEGER,
allow_ceiling INTEGER,
decision TEXT, -- allow | block | pending_confirmation
reason TEXT,
trace_id TEXT,
created_at INTEGER
);

The audit log tells you what happened. Tier events tell you why
the permission system made that call. The two join on trace_id
and are often queried together:
SELECT a.tool_name, a.blocked, te.decision, te.reason
FROM audit a
LEFT JOIN tier_events te ON te.trace_id = a.trace_id
AND te.tool_name = a.tool_name
WHERE a.session_id = ?
ORDER BY a.created_at;

Correlation and trace IDs
There are two distinct ids in play:
correlation_id is per-turn. Generated at the start of each turn by the agent loop. Appears in structured logs and audit rows.
trace_id is per-workflow-run or per-signal. Propagated from a SignalContext or a WorkflowInstance into every tool call downstream, including sub-agent delegations.
A user chat turn usually has a fresh correlation_id and no
trace_id. A cron fire has both. trace_id lets you query "what
did the 14:03 BTC alert do across its whole lifetime, including
any sub-agents it spawned."
Common investigations
"Why did the agent refuse to call X?"
SELECT tool_name, decision, reason, created_at
FROM tier_events
WHERE tool_name = 'swap'
AND decision = 'block'
ORDER BY created_at DESC
LIMIT 10;

Check reason. The most common values:
tier_exceeds_ceiling. Skill wasn't activated, or the turn's allowRiskTier is lower than the tool's tier.
analysis_to_trade_boundary. The turn already made analysis calls; trade calls must be a separate user message.
daily_cap_exceeded. daily_spend plus this call's notional would exceed MINARA_DAILY_CAP_USD.
kill_switch_active. Someone (or the agent) called kill.
tool_set_not_allowed. The turn's allowedToolSets excluded this tool.
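When one tool is blocked repeatedly, it can help to see which reason dominates before reading individual rows. The sketch below tallies block reasons over rows already fetched from tier_events. TierEventRow covers only the columns used here, tallyBlockReasons is a hypothetical helper, and treating reason as nullable is an assumption.

```typescript
// Subset of the tier_events columns; reason being nullable is an assumption.
interface TierEventRow {
  tool_name: string;
  decision: "allow" | "block" | "pending_confirmation";
  reason: string | null;
}

// Counts blocked decisions per reason, most frequent first.
function tallyBlockReasons(rows: TierEventRow[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    if (row.decision !== "block") continue; // only blocks are of interest
    const reason = row.reason ?? "(none)";
    counts.set(reason, (counts.get(reason) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative rows, not real data.
const sampleRows: TierEventRow[] = [
  { tool_name: "swap", decision: "block", reason: "kill_switch_active" },
  { tool_name: "swap", decision: "block", reason: "daily_cap_exceeded" },
  { tool_name: "swap", decision: "allow", reason: null },
  { tool_name: "swap", decision: "block", reason: "daily_cap_exceeded" },
];
const blockTally = tallyBlockReasons(sampleRows);
```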
"Why is this turn slow?"
SELECT tool_name, AVG(duration_ms) AS avg_ms, COUNT(*) AS n
FROM audit
WHERE session_id = ?
GROUP BY tool_name
ORDER BY avg_ms DESC;

Combine with structured logs grepped by correlation_id to see LLM
call durations and cache hit ratios.
"Did the prompt cache work?"
Grep the logs for llm.cache_read_input_tokens. If the value is 0
across a session, the cacheable prompt blocks aren't stable (you're
probably regenerating them per turn, which defeats caching). See
LLM Integration.
"What did autopilot do overnight?"
SELECT created_at, tool_name, blocked, substr(result_json, 1, 200)
FROM audit
WHERE source = 'autopilot'
AND created_at > ?
ORDER BY created_at;

Health endpoints
The HTTP gateway exposes:
GET /healthz is a liveness probe that returns 200 if the process is up and the DB is openable.
GET /status returns readiness detail: DB stats, skill count, active triggers, last-successful LLM call timestamp.
See the API reference for the exact schema.
Exporting for external tooling
If you want logs in Loki / Datadog / Grafana Cloud, pipe stdout:
docker run minara 2>&1 | vector --config vector.toml

All stdout lines are valid NDJSON. There is no separate "structured log" export path: stdout is canonical.
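Because every line is a standalone JSON object, a downstream consumer can parse the stream one line at a time. The sketch below assumes the field names from the example log line earlier on this page; LogLine, parseNdjson, and byTrace are hypothetical names, and the sample text is illustrative rather than real output.

```typescript
// Field names follow the example log line; optional fields are an assumption.
interface LogLine {
  ts: string;
  level: string;
  category: string;
  event: string;
  correlation_id?: string;
  trace_id?: string;
  data?: Record<string, unknown>;
}

// Parses NDJSON text into log lines, skipping blank lines.
function parseNdjson(text: string): LogLine[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogLine);
}

// Keeps only the lines that belong to one trace.
function byTrace(lines: LogLine[], traceId: string): LogLine[] {
  return lines.filter((line) => line.trace_id === traceId);
}

// Illustrative input, not real output.
const sampleStdout = [
  '{"ts":"2026-04-14T12:34:56.789Z","level":"info","category":"agent/loop","event":"turn_started","trace_id":"t_xyz"}',
  '{"ts":"2026-04-14T12:34:57.001Z","level":"warn","category":"agent/loop","event":"max_iterations_reached","trace_id":"t_other"}',
  "",
].join("\n");

const traceLines = byTrace(parseNdjson(sampleStdout), "t_xyz");
```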
Observability is boring on purpose. Three tables, one log format, one correlation field, one trace field. When something goes wrong, you read rows. When nothing goes wrong, you ignore it. That is the entire design.