Self-Improving Agent

The agent reflects on past decisions, grades its own reasoning, and graduates rules that actually work

🔴 Advanced: four independent learning loops. Methodology injection and strict-playbook enforcement are on by default and can be turned off with DISABLE_* switches. The learning and backtest loops still require explicit opt-in via SCENARIO_LEARNING=1, BACKTEST_ENABLED=1, and LEARNING_RECORD_USAGE=1. See System Design → Memory for the details.
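
As a rough illustration of the opt-in model, the sketch below checks which of the three flags named above are switched on. It is not the agent's own configuration code: the helper name enabled_opt_ins is hypothetical, and treating only the literal value "1" as "on" is an assumption based on the flag values shown above.

```python
import os

# Opt-in flags named in this page; assumed here to be "on" only when set to "1".
OPT_IN_FLAGS = ("SCENARIO_LEARNING", "BACKTEST_ENABLED", "LEARNING_RECORD_USAGE")


def enabled_opt_ins(env=None):
    """Return the opt-in flags that are switched on in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in OPT_IN_FLAGS if env.get(name) == "1"]


if __name__ == "__main__":
    # With no flags exported, this prints an empty list: nothing is opted in.
    print(enabled_opt_ins())
```

Running it before starting the agent is a quick way to confirm which loops you have actually opted into.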

Minara Agent doesn't just execute — it grades itself. Past decisions get reconciled with real outcomes, rules earn trust through observed correctness, and recurring question patterns surface as new analysis flows you can approve.

What you can do

Reflect on past trades — ask the agent to review last week's positions; it classifies each as logic error, missing data, exogenous event, or variance, and records actionable lessons.
See which rules earned trust — quantitative rules ("RSI > 70 → overbought") start quarantined and graduate only after ≥10 uses with ≥55% observed correctness. You can inspect which ones the agent currently trusts.
Approve new scenarios it discovered — when recurring query patterns aren't matched by builtins, the agent proposes a new scenario and asks you to approve.
Watch preferences evolve — ambient rules mined from conversation move through proposed → active only with your explicit approval.
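
The quarantine-then-graduate rule for quantitative rules can be sketched as a small counter. This is a minimal illustration, not the agent's implementation: the RuleStats class and its field names are hypothetical, and only the two thresholds (at least 10 uses, at least 55% observed correctness) come from the description above.

```python
from dataclasses import dataclass

MIN_USES = 10            # from the text: at least 10 uses
MIN_CORRECT_RATE = 0.55  # from the text: at least 55% observed correctness


@dataclass
class RuleStats:
    """Hypothetical usage counter for one quantitative rule."""
    uses: int = 0
    correct: int = 0

    def record(self, was_correct: bool) -> None:
        """Count one use of the rule and whether the outcome confirmed it."""
        self.uses += 1
        if was_correct:
            self.correct += 1

    @property
    def status(self) -> str:
        """'trusted' once both thresholds hold, otherwise 'quarantined'."""
        if self.uses >= MIN_USES and self.correct / self.uses >= MIN_CORRECT_RATE:
            return "trusted"
        return "quarantined"


if __name__ == "__main__":
    rule = RuleStats()
    for outcome in [True] * 9 + [False]:
        rule.record(outcome)
    print(rule.status)
```

Because the status is recomputed from the counters on every read, a rule that stops being correct drops back to quarantined without any separate demotion step.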

Example prompts

reflect on my last 5 BTC trades
which methodologies has the agent learned this week?
show pending scenario proposals
approve scenario SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042
what preferences are currently pending for me?
why did the agent quarantine the "MVRV below 1 = buy" rule?

Example output

Prompt: show pending scenario proposals

Agent response:

📋 3 scenarios pending approval:

1. SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042

Clustered from 12 past misses with phrases like "on-chain flows," "whale accumulation," "exchange reserves"

Confidence (Wilson LB): 0.62

Proposed playbook (summary): Fetch exchange reserve deltas + whale wallet aging + stable outflows; cross-check with funding rates

Times used in trial mode: 14 (9 correct)

Reply /scenarios approve SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042 to graduate, or /scenarios reject <id> to dismiss.

Commands

REPL: /scenarios pending, /scenarios approve <id>, /preferences pending, /preferences approve <id>, /reflect <role-id>
CLI: minara preferences
Methodology lint: npm run lint:methodology

Safety posture

All four learning loops are off by default; only methodology injection and strict-playbook enforcement run without opt-in.
Graduation requires statistical evidence (Wilson lower bound).
Demotion is automatic (bad rules are re-quarantined without human intervention).
You are always the last gate for scenario and preference graduation.
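
For readers unfamiliar with the Wilson lower bound, the sketch below computes the standard Wilson score interval's lower end for a count of correct outcomes. The formula is the textbook one; the function name and the default z = 1.96 (roughly a 95% interval) are assumptions, since this page does not state which confidence level the agent uses.

```python
import math


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for correct/total (z is an assumed default)."""
    if total == 0:
        return 0.0  # no evidence yet, so no trust
    p = correct / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


if __name__ == "__main__":
    # Same 60% raw hit rate, different amounts of evidence.
    print(round(wilson_lower_bound(6, 10), 2))
    print(round(wilson_lower_bound(60, 100), 2))
```

With these inputs the bound is about 0.31 for 6 of 10 and about 0.50 for 60 of 100: the raw rate is identical, but the bound rewards the larger sample. The same property drives automatic demotion, because adding incorrect outcomes pulls the bound back down.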

How it's built

Four independent loops, each with its own graduation gate: scenario learning, methodology Wilson graduation with backtest feedback, role-scoped reflection, and preference mining. The design docs live in System Design's Memory subsection:

Learning System — tool-sequence learnings + methodology store + backtest runner.
Role Memory — per-role decision reflection, two-stage LLM classifier.
Scenario Classifier — Phase 2 self-learning pipeline (propose → bootstrap → graduate).

4a. Preference bridge (M4)

When a preference transitions into active (via the card approve flow, a manual /preferences approve, or the strong-signal auto-activation path), the preference bridge writes it through to the store that actually consumes it at prompt-build, tool-call, or classifier time:

personal_style → a memories mirror tagged with metadata.layer="style" and metadata.preference_id. Style hints surface in memory snapshots and search.
hard_constraint → when the preference names a valid 11-dimension user_tags target AND its structured payload carries a valid enum value, the bridge writes the tag with source="learned_preference". That source cannot override a source="user" tag — explicit user settings always win.
behavioral_preference targeting a scenario → the bridge records a scenario_preference_boosts entry. The classifier multiplies its keyword score by the per-scenario boost at scoring time, so a preferred scenario moves ahead of a tied competitor.
behavioral_preference targeting a methodology → the bridge sets a preference_boost multiplier on the methodology row. retrieve() applies it in the ORDER BY so a preferred methodology floats to the top within its tier.
behavioral_preference targeting a skill → deferred. The user_tags schema currently rejects arbitrary tag names, so skill-routing preferences can't be mirrored into tags today. This is the M4.1 path.
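
Two of the bridge behaviours in the list above can be sketched in a few lines: the rule that a learned tag never overrides a user-set tag, and the boost multiplier that breaks ties between scenarios. This is an illustration under assumptions, not the bridge's code: the function names, the dict shapes, and the example dimension and scenario ids are all hypothetical. Only the source values "user" and "learned_preference" and the multiply-by-boost behaviour come from the text.

```python
def write_learned_tag(tags: dict, dimension: str, value: str) -> bool:
    """Write a learned tag unless the user already set that dimension explicitly."""
    existing = tags.get(dimension)
    if existing is not None and existing["source"] == "user":
        return False  # explicit user settings always win
    tags[dimension] = {"value": value, "source": "learned_preference"}
    return True


def rank_scenarios(keyword_scores: dict, boosts: dict) -> list:
    """Order scenario ids by keyword score multiplied by the per-scenario boost."""
    boosted = {sid: score * boosts.get(sid, 1.0) for sid, score in keyword_scores.items()}
    return sorted(boosted, key=boosted.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical dimension name and values, for illustration only.
    tags = {"risk_tolerance": {"value": "low", "source": "user"}}
    print(write_learned_tag(tags, "risk_tolerance", "high"))
    # Two scenarios tie on keyword score; the preferred one carries a boost.
    print(rank_scenarios({"scenario_a": 2.0, "scenario_b": 2.0}, {"scenario_b": 1.2}))
```

The first call is refused because the existing tag has source="user"; the second puts scenario_b ahead of scenario_a because its boosted score is higher. Scenarios without a boost entry default to a multiplier of 1.0, so unboosted ranking is unchanged.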

Every transition OUT of active (deprecate / reject) reverses every bridge write. Bridge writes are best-effort: failures surface through the learning/preference-bridge logger and NEVER block or unwind the state transition — the preference row is the source of truth; bridge rows are a cached view.
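
The best-effort contract described above can be sketched as follows. The preference row is updated first, and a failing bridge write is logged but never propagated. The transition function, the dict-based preference row, and the bridge_write callback are hypothetical stand-ins; only the logger name and the never-block, never-unwind behaviour are taken from the text.

```python
import logging

# Logger name taken from the text; everything else in this sketch is illustrative.
logger = logging.getLogger("learning/preference-bridge")


def transition(preference: dict, new_state: str, bridge_write) -> dict:
    """Apply a state transition, then attempt the bridge write on a best-effort basis."""
    preference["state"] = new_state  # the preference row is the source of truth
    try:
        bridge_write(preference)  # cached view; may fail independently
    except Exception:
        # Surface the failure in the log, but never block or unwind the transition.
        logger.exception("bridge write failed for preference %s", preference.get("id"))
    return preference


if __name__ == "__main__":
    def failing_bridge(pref: dict) -> None:
        raise RuntimeError("store unavailable")

    result = transition({"id": "pref-1", "state": "proposed"}, "active", failing_bridge)
    print(result["state"])
```

Even though the bridge write raises, the preference ends up in the active state. The cost of this design is that the cached bridge rows can lag behind the preference row until a later write succeeds.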

Safety rails

For the user-facing "what it remembers about you" angle, see Memory & Personalization.
