Self-Improving Agent
The agent reflects on past decisions, grades its own reasoning, and graduates rules that actually work
🔴 Advanced — four independent learning loops. Methodology injection and strict-playbook enforcement are default-on with
DISABLE_* off switches; learning/backtest loops still require explicit opt-in via SCENARIO_LEARNING=1, BACKTEST_ENABLED=1, LEARNING_RECORD_USAGE=1. See System Design → Memory for the details.
Minara Agent doesn't just execute — it grades itself. Past decisions get reconciled with real outcomes, rules earn trust through observed correctness, and recurring question patterns surface as new analysis flows you can approve.
What you can do
- Reflect on past trades — ask the agent to review last week's positions; it classifies each as logic error, missing data, exogenous event, or variance, and records actionable lessons.
- See which rules earned trust — quantitative rules ("RSI > 70 → overbought") start quarantined and graduate only after ≥10 uses with ≥55% observed correctness. You can inspect which ones the agent currently trusts.
- Approve new scenarios it discovered — when recurring query patterns aren't matched by builtins, the agent proposes a new scenario and asks you to approve.
- Watch preferences evolve — ambient rules mined from
conversation move through proposed → active only with your explicit approval.
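The graduation thresholds for quantitative rules listed above can be sketched as a small gate. This is a minimal illustration under stated assumptions, not the project's implementation: `RuleStats`, `is_trusted`, and the constant names are hypothetical, and only the two thresholds (at least 10 uses, at least 55% observed correctness) come from the text.

```python
from dataclasses import dataclass

MIN_USES = 10            # from the text: at least 10 uses
MIN_CORRECT_RATE = 0.55  # from the text: at least 55% observed correctness


@dataclass
class RuleStats:
    """Hypothetical per-rule usage counters."""
    uses: int = 0
    correct: int = 0

    def record(self, was_correct: bool) -> None:
        """Count one use of the rule and whether the outcome confirmed it."""
        self.uses += 1
        if was_correct:
            self.correct += 1


def is_trusted(stats: RuleStats) -> bool:
    """A rule stays quarantined until both thresholds are met."""
    if stats.uses < MIN_USES:
        return False
    return stats.correct / stats.uses >= MIN_CORRECT_RATE
```

A rule with 9 correct outcomes in 9 uses is still quarantined under this sketch, because the minimum-use threshold is checked before the correctness rate.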
Example prompts
reflect on my last 5 BTC trades
which methodologies has the agent learned this week?
show pending scenario proposals
approve scenario SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042
what preferences are currently pending for me?
why did the agent quarantine the "MVRV below 1 = buy" rule?
Example output
Prompt: show pending scenario proposals
Agent response:
📋 3 scenarios pending approval:
1. SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042
Clustered from 12 past misses with phrases like "on-chain flows," "whale accumulation," "exchange reserves"
- Confidence (Wilson LB): 0.62
- Proposed playbook (summary): Fetch exchange reserve deltas + whale wallet aging + stable outflows; cross-check with funding rates
- Times used in trial mode: 14 (9 correct)
Reply /scenarios approve SCENARIO_LEARNED_ONCHAIN_DEEPDIVE_042 to graduate, or /scenarios reject <id> to dismiss.
Commands
- REPL: /scenarios pending, /scenarios approve <id>, /preferences pending, /preferences approve <id>, /reflect <role-id>
- CLI: minara preferences
- Methodology lint: npm run lint:methodology
Safety posture
- All four learning loops are off by default.
- Graduation requires statistical evidence (Wilson lower bound).
- Demotion is automatic (bad rules re-quarantined without human intervention).
- You are always the last gate for scenario + preference graduation.
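The Wilson lower bound and automatic demotion mentioned in this list can be illustrated with a short sketch. The formula is the standard Wilson score interval; everything else is an assumption made for illustration: the 95% z-value of 1.96, the reuse of 0.55 as the bound's threshold, the state names, and the function names `wilson_lower_bound` and `next_state` are not taken from the project.

```python
import math


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success rate.

    z = 1.96 (about 95% confidence) is an assumed default.
    """
    if total == 0:
        return 0.0
    p = correct / total
    denom = 1.0 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def next_state(state: str, correct: int, total: int,
               threshold: float = 0.55) -> str:
    """Hypothetical gate: graduate or re-quarantine a rule from its bound."""
    lb = wilson_lower_bound(correct, total)
    if state == "quarantined" and lb >= threshold:
        return "trusted"
    if state == "trusted" and lb < threshold:
        return "quarantined"  # demotion needs no human step
    return state
```

Using a lower bound rather than the raw success rate means a rule with few observations needs a noticeably higher hit rate to clear the same threshold, which is why small samples do not graduate on luck alone.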
How it's built
Four independent loops with their own graduation gates — scenario learning, methodology Wilson graduation + backtest feedback, role-scoped reflection, preference mining. Each has a dedicated design doc in System Design's Memory subsection:
- Learning System — tool-sequence learnings + methodology store + backtest runner.
- Role Memory — per-role decision reflection, two-stage LLM classifier.
- Scenario Classifier — Phase 2 self-learning pipeline (propose → bootstrap → graduate).
4a. Preference bridge (M4)
When a preference transitions into active (via the card approve flow, a manual /preferences approve, or the strong-signal auto-activation path), the preference bridge writes it through to the store that actually consumes it at prompt-build / tool-call / classifier time:
- personal_style → a memories mirror tagged with metadata.layer="style" and metadata.preference_id. Style hints surface in memory snapshots and search.
- hard_constraint → when the preference names a valid 11-dimension user_tags target AND its structured payload carries a valid enum value, the bridge writes the tag with source="learned_preference". That source cannot override a source="user" tag; explicit user settings always win.
- behavioral_preference → scenario → the bridge records a scenario_preference_boosts entry. The classifier multiplies its keyword score by the per-scenario boost at scoring time, so a preferred scenario moves ahead of a tied competitor.
- behavioral_preference → methodology → the bridge sets a preference_boost multiplier on the methodology row. retrieve() applies it in the ORDER BY so a preferred methodology floats to the top within its tier.
- behavioral_preference → skill → deferred. The user_tags schema currently rejects arbitrary tag names, so skill-routing preferences can't be mirrored into tags today. M4.1 path.
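Two of the bridge behaviours above, source precedence for tags and boost-multiplied scenario scoring, can be sketched as follows. This is an illustration only: the dictionaries are hypothetical in-memory stand-ins for the user_tags and scenario_preference_boosts stores, and `write_learned_tag` and `rank_scenarios` are invented names, not project APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a user_tags row: (value, source).
Tag = Tuple[str, str]


def write_learned_tag(tags: Dict[str, Tag], name: str, value: str) -> bool:
    """Write a tag with source="learned_preference" unless the user set it."""
    existing = tags.get(name)
    if existing is not None and existing[1] == "user":
        return False  # explicit user settings always win
    tags[name] = (value, "learned_preference")
    return True


def rank_scenarios(keyword_scores: Dict[str, float],
                   boosts: Dict[str, float]) -> List[str]:
    """Multiply each keyword score by its per-scenario boost, best first.

    Scenarios without a boost entry use a neutral multiplier of 1.0.
    """
    boosted = {name: score * boosts.get(name, 1.0)
               for name, score in keyword_scores.items()}
    return sorted(boosted, key=lambda name: boosted[name], reverse=True)
```

With equal keyword scores, any boost above 1.0 is enough to move the preferred scenario ahead of its tied competitor, while unboosted ties keep their original order.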
Every transition OUT of active (deprecate / reject) reverses every bridge write. Bridge writes are best-effort: failures surface through the learning/preference-bridge logger and NEVER block or unwind the state transition. The preference row is the source of truth; bridge rows are a cached view.
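The best-effort contract described here can be sketched in a few lines. Only the logger name comes from the text; the `transition` function, the dictionary-shaped preference row, and the `bridge_write` callback are hypothetical stand-ins chosen for illustration.

```python
import logging
from typing import Callable, Dict

# Logger name taken from the text; everything else is a hypothetical stand-in.
logger = logging.getLogger("learning/preference-bridge")


def transition(pref: Dict[str, str], new_state: str,
               bridge_write: Callable[[Dict[str, str]], None]) -> Dict[str, str]:
    """Update the preference row first, then attempt the bridge write."""
    pref["state"] = new_state  # the preference row is the source of truth
    try:
        bridge_write(pref)  # cached view; allowed to fail
    except Exception:
        # Surface the failure, but never block or unwind the transition.
        logger.exception("bridge write failed for preference %s", pref.get("id"))
    return pref
```

Because the row is updated before the bridge is called and the exception is swallowed after logging, a failing store leaves the preference in its new state and the cached view can be rebuilt later.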
Safety rails
For the user-facing "what it remembers about you" angle, see Memory & Personalization.