MINARA

Role Memory

Per-role decision reflection learning (distinct from personas and general learning)

Role memory is Minara Agent's decision reflection system. When a skill makes a prediction (a trade recommendation, a valuation call, a momentum signal), role memory records it, waits for the outcome, then runs a two-stage LLM pass to classify whether the decision was right or wrong and extract an actionable lesson if it's learnable. Each lesson is scoped to a specific role that a skill declared, so you get targeted feedback rather than generic noise.

Role vs. Skill: why split them? A skill is a capability bundle (prompt + tools + permission tier). A role is a named decision context inside a skill. One skill can declare multiple roles, each with its own outcome signal and reflection prompt. Example: analysis.valuation might own target_price (checked by price-delta after 7 days) and peer_comp (checked by relative performance vs. named peers after 30 days). Without roles, every lesson gets lumped under a single skill and becomes noise. With roles, you learn which kinds of calls the skill gets right and which it gets wrong, which is a far more actionable signal.

See this in use: Features → Self-Improving is the user-facing side of the reflection loop. It shows how a trade recommendation turns into a lesson the agent applies next time.

role-memory diagram

Three things to disambiguate up front. Role memory is NOT:

  • Persona files (SOUL.md / AGENTS.md / IDENTITY.md from Workspace Files). Those are hand-edited identity configuration. Role memory is runtime learning.
  • Personalization (financial_profile + user_tags from Personalization). That's inferred user profile. Role memory is about the agent's decisions, unrelated to the user's profile.
  • The general learning loop (Learning System's review-engine / skill-manager / evaluation-loop). That one records successful procedures. Role memory runs parallel to it and recovers lessons from decisions with measurable outcomes (e.g. the price moved after your recommendation).


The four concepts

Role. A named decision context a skill owns. A skill can declare multiple roles. Example: analysis.valuation might own roles like analysis.valuation.target_price and analysis.valuation.peer_comp. Each role has its own failure modes, its own reflection prompt, and its own outcome signal.

Role memory entry. One row written every time a role makes a decision. Stores the inputs, the decision text, the timestamp, and an initially null reflection slot. Lives in the role_memory table.
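To make the row concrete, here is a minimal sketch of what one entry might look like in code. The field names, the id format, and the newRoleMemoryEntry helper are assumptions for illustration only; the actual role_memory schema is not shown on this page.

```typescript
// Hypothetical row shape for illustration; column names are assumptions,
// not the actual role_memory schema.
interface RoleMemoryEntry {
  id: string;
  roleId: string;                   // e.g. "analysis.valuation.target_price"
  inputs: Record<string, unknown>;  // what the role saw at decision time
  decision: string;                 // the decision text
  createdAtMs: number;              // decision timestamp (epoch milliseconds)
  // Stays null until the reflector has processed the entry.
  reflection: { type: string; lesson: string | null } | null;
}

// Builds a fresh entry with an empty reflection slot.
function newRoleMemoryEntry(
  roleId: string,
  inputs: Record<string, unknown>,
  decision: string,
  nowMs: number = Date.now(),
): RoleMemoryEntry {
  return {
    id: `${roleId}:${nowMs}`, // assumed id scheme, for the sketch only
    roleId,
    inputs,
    decision,
    createdAtMs: nowMs,
    reflection: null,
  };
}

const entry = newRoleMemoryEntry(
  "analysis.valuation.target_price",
  { asset: "ETH" },
  "Target price 4200 within 7 days",
  1_700_000_000_000,
);
console.log(entry.reflection); // null
```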

Outcome probe. The thing that answers "was the decision right or wrong?" Three kinds:

  • price_delta: measure price change of a named asset over a window (the most common probe for trading roles).
  • custom: user-defined probe id, hooked up through the probe registry.
  • manual: the outcome is never computed automatically; a human reviews and marks it.
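As a rough illustration of the price_delta probe, the sketch below computes the fractional price change of the asset named by assetField over windowHours. The PriceAt lookup and the evaluatePriceDelta function are hypothetical names introduced here; how the real probe fetches prices is not documented on this page.

```typescript
// Mirrors the price_delta fields used in the policy example on this page.
interface PriceDeltaProbe {
  kind: "price_delta";
  assetField: string;
  windowHours: number;
}

// Hypothetical price lookup: price of `symbol` at a given epoch-ms timestamp.
type PriceAt = (symbol: string, atMs: number) => number;

const MS_PER_HOUR = 3_600_000;

// Returns the fractional change over the probe window (0.1 means +10%).
function evaluatePriceDelta(
  probe: PriceDeltaProbe,
  inputs: Record<string, unknown>,
  decidedAtMs: number,
  priceAt: PriceAt,
): number {
  const symbol = inputs[probe.assetField];
  if (typeof symbol !== "string") {
    throw new Error(`inputs.${probe.assetField} must name an asset`);
  }
  const start = priceAt(symbol, decidedAtMs);
  const end = priceAt(symbol, decidedAtMs + probe.windowHours * MS_PER_HOUR);
  if (start <= 0) {
    throw new Error("start price must be positive");
  }
  return (end - start) / start;
}
```

Injecting the price lookup keeps the probe easy to test with canned prices.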

Post-hoc probe. Evidence collected after the fact to feed the reflection LLM call. Two kinds:

  • refetch_source: re-runs a tool the original skill used so the reflector can check "was the answer already in the data we had at t0?"
  • news / onchain snapshots: external signals that may have moved the market between t0 and reflection time.

DomainSkill.memoryRoles[]

Skills declare roles in their DomainSkill export:

export const valuationSkill: DomainSkill = {
  id: "analysis.valuation",
  // ...
  memoryRoles: [
    {
      id: "analysis.valuation.target_price",
      label: "Target price call",
      failureModes: [
        "anchored to recent price",
        "ignored macro headwinds",
        "comp set biased toward winners",
      ],
      reflectionPrompt: `
        You previously issued a target price for {asset}.
        Window: {window}.
        Actual outcome: {pnl}.
        Evidence collected post-hoc: {evidence}.
        The Stage 1 classifier tagged this as {type}.
        Produce one actionable lesson ≤2 sentences.
      `,
      reflectionPolicy: {
        outcomeProbe: { kind: "price_delta", assetField: "asset", windowHours: 168 },
        minAgeHours: 168,   // wait at least a week before judging
        maxAgeHours: 720,   // stop after 30 days
        triggers: ["cron", "on_price_fetch"],
        cronExpr: "0 6 * * *",
        postHocProbes: [
          { kind: "refetch_source", tool: "get_price", paramMapper: "asset→symbol" },
        ],
      },
    },
  ],
};

Every field is load-bearing:

  • id is globally unique. Convention: <skill_id> for single-role skills, <skill_id>.<variant> for multi-role.
  • failureModes anchors the reflection prompt. The LLM reads them so its lessons stay grounded in realistic error categories.
  • reflectionPrompt is a template with {decision}, {window}, {pnl}, {evidence}, {type} placeholders. The reflector substitutes at reflection time.
  • reflectionPolicy.minAgeHours prevents judging a call before its window has fully elapsed.
  • reflectionPolicy.maxAgeHours prevents zombie rows from hanging around forever. Older entries are marked skipped and never reflected.
  • triggers controls when reflection fires. cron runs on the declared cronExpr. on_new_write fires as soon as a new role memory entry is written (useful for roles whose outcome is immediately measurable). on_price_fetch fires when an unrelated tool call already fetched the relevant price; it piggybacks on an existing round trip to avoid a second fetch.
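The minAgeHours / maxAgeHours gate described above can be sketched as a small pure function. The status names and the boundary handling (inclusive at both ends) are assumptions; the source only says that entries are not judged early and that entries older than maxAgeHours are marked skipped.

```typescript
type AgeStatus = "too_young" | "ready" | "skipped";

interface AgePolicy {
  minAgeHours: number;
  maxAgeHours: number;
}

const HOUR_MS = 3_600_000;

// Decides whether an entry is old enough to judge, or too old to bother with.
function ageStatus(createdAtMs: number, nowMs: number, policy: AgePolicy): AgeStatus {
  const ageHours = (nowMs - createdAtMs) / HOUR_MS;
  if (ageHours < policy.minAgeHours) return "too_young"; // window not elapsed yet
  if (ageHours > policy.maxAgeHours) return "skipped";   // zombie row, never reflected
  return "ready";
}

const swingPolicy: AgePolicy = { minAgeHours: 168, maxAgeHours: 720 };
console.log(ageStatus(0, 24 * HOUR_MS, swingPolicy));  // "too_young"
console.log(ageStatus(0, 200 * HOUR_MS, swingPolicy)); // "ready"
console.log(ageStatus(0, 800 * HOUR_MS, swingPolicy)); // "skipped"
```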

The role registry

RoleRegistry.fromSkillRegistry() runs after every skill is registered. It walks every skill's memoryRoles[] and indexes them by id and by owning skill. Duplicate ids across skills cause a hard boot-time error, so you can't accidentally shadow another skill's role with a rename.
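The indexing and duplicate check could look roughly like the sketch below. RoleIndex, SkillLike, and MemoryRoleLike are stand-in names invented for this example; the real RoleRegistry API beyond fromSkillRegistry() is not shown on this page.

```typescript
// Minimal stand-ins for the real skill and role types (assumptions).
interface MemoryRoleLike { id: string; label: string }
interface SkillLike { id: string; memoryRoles?: MemoryRoleLike[] }

class RoleIndex {
  private readonly byId = new Map<string, { role: MemoryRoleLike; skillId: string }>();
  private readonly bySkill = new Map<string, MemoryRoleLike[]>();

  // Walks every skill's memoryRoles[] and fails fast on a duplicate role id.
  static fromSkills(skills: SkillLike[]): RoleIndex {
    const index = new RoleIndex();
    for (const skill of skills) {
      for (const role of skill.memoryRoles ?? []) {
        const existing = index.byId.get(role.id);
        if (existing) {
          throw new Error(
            `Duplicate role id "${role.id}" declared by ${existing.skillId} and ${skill.id}`,
          );
        }
        index.byId.set(role.id, { role, skillId: skill.id });
        const owned = index.bySkill.get(skill.id) ?? [];
        owned.push(role);
        index.bySkill.set(skill.id, owned);
      }
    }
    return index;
  }

  ownerOf(roleId: string): string | undefined {
    return this.byId.get(roleId)?.skillId;
  }

  rolesOf(skillId: string): MemoryRoleLike[] {
    return this.bySkill.get(skillId) ?? [];
  }

  roleIds(): string[] {
    return Array.from(this.byId.keys());
  }
}
```

Throwing during construction is what turns a naming collision into a boot-time failure rather than a silent overwrite.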

Roles are not stored in SQLite. They live only in skill source files. Deleting a skill that declared a role stops new writes to that role; existing rows stay for audit but are no longer surfaced by the registry or the reflector.

The registry is the single source of truth for:

  • Which role ids exist (drives the REPL /reflect and /recall pickers).
  • The ReflectionPolicy + prompt template for each role.
  • The skill that owns a role (for prompt-builder auto-injection of recent lessons back into that skill's prompt fragment).

The role reflector

RoleMemoryReflector is the engine. It processes each pending role_memory entry that is past its minAgeHours in two stages.

The diagram at the top of this page shows the full flow. The steps in detail:

Stage 1: classification

Shared across every role. The prompt is a single module-level constant because classification is domain-general. Four categories:

  • logic_error: the decision was wrong and the error is inside the agent's reasoning. Trainable.
  • missing_data: the decision was wrong because the agent didn't have evidence it should have fetched. Trainable.
  • exogenous: the decision was wrong because of something the agent couldn't reasonably have known (news break, news leak, policy change). Not trainable. Store the narrative but don't produce a lesson.
  • variance: the decision was within the noise band; can't distinguish skill from luck. Not trainable.

Only logic_error and missing_data go to Stage 2.
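The routing rule above is small enough to sketch directly. The category names come from this page; needsStage2, buildReflection, and the synchronous writeLesson callback (a stand-in for the Stage 2 LLM call, which in practice would be asynchronous) are assumptions for illustration.

```typescript
type ReflectionType = "logic_error" | "missing_data" | "exogenous" | "variance";

// Only these two categories are considered trainable.
const TRAINABLE = new Set<ReflectionType>(["logic_error", "missing_data"]);

function needsStage2(type: ReflectionType): boolean {
  return TRAINABLE.has(type);
}

interface ReflectionRecord {
  type: ReflectionType;
  narrative: string;
  lesson: string | null; // null for exogenous / variance
}

// `writeLesson` stands in for the Stage 2 call; it is only invoked
// when the Stage 1 category is trainable.
function buildReflection(
  type: ReflectionType,
  narrative: string,
  writeLesson: () => string,
): ReflectionRecord {
  return {
    type,
    narrative,
    lesson: needsStage2(type) ? writeLesson() : null,
  };
}
```

The narrative is stored either way, so non-trainable outcomes remain visible for audit without producing guidance.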

Stage 2: actionable lesson

Role-specific. The skill owns the prompt template. The reflector substitutes {decision} / {window} / {pnl} / {evidence} / {type} and calls the LLM. Output is a short lesson that gets stored on the role memory entry.

Why Stage 2 is optional: exogenous / variance decisions look similar to bad ones from the outside but don't admit actionable fixes. Forcing the LLM to produce a lesson anyway generates noise at best, hallucinated guidance at worst. Better to log the narrative and move on.
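The placeholder substitution that Stage 2 performs can be approximated with a single regular-expression pass. This is a sketch, not the reflector's actual code: renderReflectionPrompt is a hypothetical name, and leaving unknown placeholders untouched is an assumed behavior.

```typescript
// Replaces {name} placeholders with values; unknown names are left as-is
// (an assumption of this sketch, so missing values stay visible in the prompt).
function renderReflectionPrompt(
  template: string,
  values: Record<string, string>,
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

const rendered = renderReflectionPrompt(
  "Decision: {decision}. Window: {window}. Outcome: {pnl}. Type: {type}. Asset: {asset}.",
  { decision: "target 4200", window: "168h", pnl: "-4.2%", type: "logic_error" },
);
console.log(rendered);
// Decision: target 4200. Window: 168h. Outcome: -4.2%. Type: logic_error. Asset: {asset}.
```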

Serial locks and triggers

When on_new_write or on_price_fetch fires, multiple code paths may race to reflect on the same role memory entry. The reflector holds a per-role serial lock to prevent duplicate LLM calls. If the lock is held, the second caller drops the request and lets the in-flight one finish.

The lock is per-role, not per-entry. An analysis.valuation.target_price reflection can run in parallel with a market.spot.momentum reflection because they use different locks. But two calls into analysis.valuation.target_price never run concurrently: while the first is in flight, the second is dropped.
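A drop-if-held lock keyed by role id might be sketched as follows. RoleLocks and its methods are hypothetical names; the sketch assumes a single-process reflector, where an in-memory set is enough to track which roles are busy.

```typescript
class RoleLocks {
  private readonly held = new Set<string>();

  // Returns true if the caller now holds the lock, false if it was already held.
  tryAcquire(roleId: string): boolean {
    if (this.held.has(roleId)) return false;
    this.held.add(roleId);
    return true;
  }

  release(roleId: string): void {
    this.held.delete(roleId);
  }

  // Runs `task` only if no other reflection for this role is in flight.
  // A second caller is dropped (not queued) and gets { ran: false }.
  async runExclusive<T>(
    roleId: string,
    task: () => Promise<T>,
  ): Promise<{ ran: true; value: T } | { ran: false }> {
    if (!this.tryAcquire(roleId)) return { ran: false };
    try {
      return { ran: true, value: await task() };
    } finally {
      this.release(roleId); // always release, even if the task throws
    }
  }
}
```

Dropping instead of queueing fits the stated goal of avoiding duplicate LLM calls: the in-flight pass already covers the role's pending entries, so a queued second pass would mostly repeat work.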

How role memory interacts with the general learning loop

Both systems run after the agent loop completes a turn. The two answer different questions:

  • review-engine captures "how did I successfully accomplish this task" as a replayable tool_sequence. It fires at the end of any non-trivial successful turn.
  • Role memory captures "was this specific decision right or wrong, and why". It fires on a schedule or when the outcome becomes measurable, often hours or days after the decision.

Both write to SQLite. Both feed prompt content back into the agent. Neither can bypass the L3 risk gate or the permission tier hook.

Adding a role to a skill

  1. Extend your skill's DomainSkill export with a memoryRoles[] array.
  2. Pick an OutcomeProbe that matches what "right" means for your role. price_delta is the default for trading roles; use custom if you need a domain-specific signal (whale followthrough, on-chain settlement, etc.) and register the probe in apps/agent/src/memory/posthoc-probes.ts.
  3. Write a reflectionPrompt that names your role's failure modes explicitly. Generic prompts produce generic lessons.
  4. Set minAgeHours / maxAgeHours based on how long the outcome takes to mature. Intraday roles might use 4 / 48; swing roles might use 168 / 720.
  5. Pick triggers:
    • cron if the reflection schedule is predictable.
    • on_new_write if a new write invalidates old pending entries (rare; useful for position reversals).
    • on_price_fetch if an unrelated tool call already hits the right price endpoint; piggyback on it.
  6. Write a unit test that creates a role memory entry, fakes the outcome, and asserts the reflector runs Stage 1 + Stage 2.

Reference implementation: grep memoryRoles across apps/agent/src/skills/builtin/ for existing roles to crib from.

The /reflect and /recall REPL commands

From inside the REPL:

  • /reflect <role> manually fires reflection for a specific role's pending entries. Useful for debugging a role's prompt template without waiting for the cron trigger.
  • /recall <role> prints the most recent lessons for a role so you can eyeball whether the reflector is producing useful output.

Safety properties

  • Role memory cannot trigger fund-moving tool calls. The reflector is a pure classifier + LLM pass. It writes to SQLite and nothing else. A bad lesson cannot become a bad trade without going through the full permission tier hook.
  • exogenous / variance classifications produce no lesson. The agent never learns "whatever happened, make a new rule about it". Only decisions with clear error signatures produce guidance.
  • Duplicate reflection is impossible. Per-role serial locks prevent two reflector calls from updating the same entry at once.
  • Budget gated. Every reflector LLM call goes through the learning category of the budget tracker (see Learning System). A runaway reflector can't drain your daily budget without tripping BudgetExceededError first.
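The budget gate in the last bullet can be pictured as a per-category spend check that throws before a call is made. Only the BudgetExceededError name and the learning category come from this page; BudgetTrackerSketch, its charge method, and the USD units are assumptions for illustration.

```typescript
class BudgetExceededError extends Error {
  constructor(category: string, limitUsd: number) {
    super(`Budget for category "${category}" exceeded (limit ${limitUsd} USD)`);
    this.name = "BudgetExceededError";
    // Keeps the prototype chain intact when compiled to older JS targets.
    Object.setPrototypeOf(this, BudgetExceededError.prototype);
  }
}

// Hypothetical tracker: one spending limit per category.
class BudgetTrackerSketch {
  private readonly limits = new Map<string, number>();
  private readonly spent = new Map<string, number>();

  constructor(limitsUsd: Record<string, number>) {
    for (const category of Object.keys(limitsUsd)) {
      this.limits.set(category, limitsUsd[category]);
    }
  }

  // Records the cost, or throws if it would push the category over its limit.
  charge(category: string, costUsd: number): void {
    const limit = this.limits.get(category) ?? 0;
    const used = this.spent.get(category) ?? 0;
    if (used + costUsd > limit) {
      throw new BudgetExceededError(category, limit);
    }
    this.spent.set(category, used + costUsd);
  }
}
```

A reflector built this way would call charge("learning", estimatedCost) before each LLM call, so a runaway loop stops at the first call that would exceed the limit.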

What NOT to use role memory for

  • Short-term conversation state. That's the sessions table.
  • User preferences. That's either ordinary memory_write or Personalization.
  • Procedural lessons. "When the user asks X, do Y" belongs in review-engine / skill-manager rather than role memory.
  • Persona configuration. Edit SOUL.md / AGENTS.md instead. See Workspace Files.
