Security

How Minara prevents the agent from doing something you'd regret — layered defense, threat model, and what each layer protects against.

You tell the agent: "run this Python script and show me the output." It runs. The script might be harmless data work — or its last line might quietly call minara swap and send 100 USDC to an attacker contract.

Letting an agent that can move money run code, type commands, and write files on your machine is genuinely dangerous. This chapter is about exactly what Minara does about it, why each defense exists, and where the gaps still are.

Three scenarios that motivate everything below

Concrete attacks make the design easier to follow:

Paste attack. A user copies a "useful" script from Discord or a forum, asks the agent to run it, and somewhere in the middle a minara swap is buried.
Multi-step injection. The agent writes a harmless helper file, then appends one more line to it, then runs python helper.py. Each step looks clean on its own; together they drain the wallet.
On-chain trap. A script calls approve(unknown_address, MaxUint256) or signs a Permit2 message. These aren't "obviously malicious" commands — they're the same calls a legitimate DEX integration uses, except the spender field is the attacker.

No single check stops any of these. The script runs, the files exist, the contract calls are valid. What stops them is the combination of independent layers below.

Layered defense — four independent gates

Minara runs four independent safety layers in front of every LLM-reachable tool. They are independent on purpose: an attacker who beats one still has to beat the others, and the layers fail in different ways.

Figure: four-layer security stack, showing the command-guard tripwire, script-risk gate, fund-moving confirm, and OS jail boundary.

The OS jail (the highlighted layer in the stack) is the only physical boundary: even if every layer above it fails, syscall-level isolation still applies. The other three are tripwires. They add visibility and a chance to interrupt, but a determined attacker can read the source and look for gaps. Defense in depth means the combination is what matters, not any single layer in isolation.

Layer 1 — command-guard tripwire

A regex denylist runs on every shell command. It hard-blocks the no-questions-asked patterns: rm -rf /, sudo, mkfs, curl … | sh, fork bombs. The point isn't that the regex catches everything (it can't: base64-then-eval slips past). The point is that obvious accidents and the most common prompt-injection one-liners die loudly here, leaving the rest to the layers behind it.

This is a tripwire, not a boundary.
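To make the tripwire idea concrete, here is a minimal Python sketch of a regex denylist. The patterns and the command_guard name are hypothetical illustrations, not Minara's actual list. Note the last command in the usage loop: a base64-encoded rm -rf / piped into sh passes every pattern, which is exactly why this layer is not the boundary.

```python
import re

# Hypothetical denylist; Minara's real patterns are not published in this chapter.
DENYLIST = [
    re.compile(r"\brm\s+(-\w+\s+)+/(\s|\*|$)"),               # rm -rf /, rm -r -f /*
    re.compile(r"\bsudo\b"),                                   # privilege escalation
    re.compile(r"\bmkfs(\.\w+)?\b"),                           # formatting a filesystem
    re.compile(r"\bcurl\b[^|]*\|\s*(ba|z)?sh\b"),              # curl ... | sh
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),   # classic bash fork bomb
]


def command_guard(cmd: str) -> bool:
    """Return True if the shell command matches a hard-block pattern."""
    return any(pattern.search(cmd) for pattern in DENYLIST)


if __name__ == "__main__":
    for cmd in [
        "rm -rf /",                                # BLOCK
        "ls -la",                                  # pass
        "echo cm0gLXJmIC8= | base64 -d | sh",      # pass: decodes to rm -rf /, but no pattern sees it
    ]:
        print(f"{cmd!r}: {'BLOCK' if command_guard(cmd) else 'pass'}")
```

Because the check is pure string matching, it is cheap enough to run on every command, and its blind spots (encoding, runtime-built strings) are the reason the later layers exist.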

Layer 2 — OS jail (the actual boundary)

Every subprocess spawned by execute_code or terminal is wrapped in bwrap (bubblewrap) on Linux or sandbox-exec on macOS. Inside that wrapper the process cannot read ~/.ssh, cannot connect to 169.254.169.254 (the cloud metadata endpoint), and cannot write outside the workspace directory.

Even if every higher layer is bypassed — wrong regex, prompt injection, sneaky obfuscation — the attack code still has to live inside the syscall set the jail allows. This is what we mean by "physical boundary": it's enforced by the kernel, not by string matching.
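On Linux, wrapping a subprocess amounts to building a bwrap command line around the real one. The sketch below assumes a policy of read-only host filesystem, an empty tmpfs over ~/.ssh, and a single writable bind for the workspace; the bwrap_argv name and this exact flag set are hypothetical, not Minara's actual profile, and the macOS sandbox-exec side is not shown. bubblewrap does not filter individual IP addresses on its own, so blocking 169.254.169.254 while still allowing public HTTPS is assumed to come from a separate network policy that this sketch leaves out.

```python
import os
from pathlib import Path


def bwrap_argv(workspace: str, cmd: list[str]) -> list[str]:
    """Build a bubblewrap command line that jails cmd (hypothetical policy)."""
    ssh_dir = os.path.join(str(Path.home()), ".ssh")
    return [
        "bwrap",
        "--ro-bind", "/", "/",            # host filesystem visible but read-only
        "--dev", "/dev",                  # minimal fresh /dev
        "--proc", "/proc",                # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",                # private scratch space
        "--tmpfs", ssh_dir,               # hide ~/.ssh (assumes the directory exists)
        "--bind", workspace, workspace,   # the only writable host path
        "--chdir", workspace,
        "--unshare-all", "--share-net",   # new namespaces, but keep networking
        "--die-with-parent",              # tear the jail down if the agent exits
        "--",
        *cmd,
    ]


if __name__ == "__main__":
    print(" ".join(bwrap_argv("/tmp/ws", ["python", "helper.py"])))
```

Passing the returned list to subprocess.run on a host with bwrap installed would run the command inside the jail. Later binds override earlier ones, which is why the writable workspace bind comes after the read-only bind of /.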

Layer 3 — fund-moving confirm (two-step)

Every tool that moves money — swap, buy, sell, transfer, perps, autopilot enable, workflow activate — must be called twice:

First call (no confirm parameter): the handler simulates the trade and returns the preview (amounts, route, slippage, gas estimate). It does not broadcast.
Second call (confirm: true): only now does the handler actually sign and broadcast.

This is enforced inside every handler, not by a polite prompt to the LLM. The LLM cannot "forget" the rule — calling swap_tokens without confirm: true simply returns a preview, never a tx hash.

Full details in Fund-moving confirm.
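The two-call rule lives in the handler's control flow, not in the prompt. Here is a minimal Python sketch, assuming hypothetical simulate_swap and sign_and_broadcast stubs in place of real routing and signing code; the Preview and Broadcast types and their fields are illustrative, not Minara's API. The check is confirm is not True, so a string "true" or the integer 1 from a sloppy tool call still gets a preview rather than a broadcast.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Preview:
    amount_in: float
    est_amount_out: float
    route: list
    slippage_bps: int
    gas_estimate: int


@dataclass
class Broadcast:
    tx_hash: str


def simulate_swap(token_in: str, token_out: str, amount: float) -> Preview:
    # Hypothetical stub: a real handler would quote a route and estimate gas.
    return Preview(amount, amount * 0.99, [token_in, token_out], 50, 150_000)


def sign_and_broadcast(preview: Preview) -> str:
    # Hypothetical stub: a real handler would sign and submit a transaction.
    return "0x" + "ab" * 32


def swap_tokens(token_in: str, token_out: str, amount: float,
                confirm: Optional[bool] = None) -> Union[Preview, Broadcast]:
    """First call returns a preview; only confirm=True signs and broadcasts."""
    preview = simulate_swap(token_in, token_out, amount)
    if confirm is not True:       # identity check: "true", 1, None all mean "preview"
        return preview
    return Broadcast(tx_hash=sign_and_broadcast(preview))
```

This sketch re-simulates on the confirming call; it does not check that the confirmed parameters match the earlier preview.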

Layer 4 — script-risk gate (new)

Layer 3 catches direct fund-moving tool calls, but what about a Python script that runs subprocess.run(["minara", "swap", …])? The shell-out bypasses the in-process tool registry — Layer 3 never sees it.

Layer 4 fills that gap. Before execute_code, terminal, write_file, or patch runs, the script body is statically analyzed:

RED — auto-reject (mass deletion, IMDS / SSRF, credential exfil, container escape, indirect obfuscation + sink, inline private-key signing).
YELLOW — pause and ask the user (fund-moving CLI shell-out, approve / Permit2 / Safe owner change, env-var poisoning, specific-path delete, risky package install).
GREEN — proceed silently (everything else).

Full details in How the agent decides risk.
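Static analysis of a Python script body can be done with the standard library's ast module, without executing anything. The sketch below is a deliberately tiny classifier covering two of the rules listed above: it returns RED for string literals naming ~/.ssh or the IMDS address, YELLOW for a literal minara fund-moving CLI call (list or string form), and GREEN otherwise. The subcommand set beyond swap, the function names, and the matching rules are assumptions; the real gate covers many more categories (approve / Permit2, env-var poisoning, obfuscation + sink, shell as well as Python).

```python
import ast

# Hypothetical rule data, not Minara's actual rule set.
FUND_SUBCOMMANDS = {"swap", "buy", "sell", "transfer"}
SECRET_MARKERS = ("~/.ssh", "/.ssh/")
IMDS_HOST = "169.254.169.254"


def _is_fund_cli(words: list) -> bool:
    return len(words) >= 2 and words[0] == "minara" and words[1] in FUND_SUBCOMMANDS


def classify(source: str) -> str:
    """Return "RED", "YELLOW" or "GREEN" for a Python script body."""
    verdict = "GREEN"
    for node in ast.walk(ast.parse(source)):
        # String literals anywhere in the script: paths, URLs, shell strings.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            text = node.value
            if IMDS_HOST in text or any(m in text for m in SECRET_MARKERS):
                return "RED"      # RED short-circuits: reject without asking
            if _is_fund_cli(text.split()):
                verdict = "YELLOW"
        # List literals such as ["minara", "swap", ...] passed to subprocess.
        elif isinstance(node, ast.List):
            words = [e.value for e in node.elts
                     if isinstance(e, ast.Constant) and isinstance(e.value, str)]
            if _is_fund_cli(words):
                verdict = "YELLOW"
    return verdict
```

Keeping RED as an early return and YELLOW as a sticky flag mirrors the ordering described above: any RED finding wins, and a YELLOW finding is only upgraded, never cleared.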

What Minara does not protect you from

The honest list. None of these are bugs — they're scope boundaries. Knowing them is how you stay alert in the right places.

Runtime-built payloads. A script that runs os.system(a + b), where a and b are computed at runtime, defeats static analysis: the gate can't see what the joined string will be. The Layer 2 jail still caps what the subprocess can do, and the gate flags the call as dynamic_command (YELLOW) and asks you. Read the script before saying yes.
Malicious smart contracts and DApps. The code layer cannot know that contract 0xabc… is a drainer or that cool-dapp.example.com is a phishing clone. That's the parallel job of token / DApp scanning.
Transactions you signed. Layer 3 shows you the preview. If you confirm it, the bytes are signed and broadcast. Read the preview before clicking yes.
Your own typos and bugs. Minara guards against malicious intent, not against you accidentally deleting the wrong file or sending to the wrong address. The two-step confirm exists so you catch your own mistakes too — use it that way.
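The runtime-payload limit is easy to see in code. A static check can tell that os.system received something other than a literal, but not what that something will be. The hypothetical helper below does only that: it reports the line numbers where a few well-known shell sinks get a non-literal first argument, which is the situation the gate escalates as dynamic_command instead of deciding on its own. The sink list and function name are assumptions for illustration.

```python
import ast

# Hypothetical sink list: (module name, function name) pairs treated as shell execution.
SINKS = {("os", "system"), ("subprocess", "run"),
         ("subprocess", "Popen"), ("subprocess", "call")}


def dynamic_command_findings(source: str) -> list:
    """Line numbers where a shell sink receives a non-literal first argument."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only matches module.attr calls such as os.system(...);
        # aliased imports like "from os import system" are not handled.
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            continue
        if (node.func.value.id, node.func.attr) not in SINKS or not node.args:
            continue
        arg = node.args[0]
        literal = isinstance(arg, ast.Constant) or (
            isinstance(arg, ast.List)
            and all(isinstance(e, ast.Constant) for e in arg.elts))
        if not literal:
            hits.append(node.lineno)
    return sorted(hits)
```

The helper can say where a command is assembled at runtime, never what it will contain. That is why this case ends in a question to you rather than an automatic verdict.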

What this means in practice

A practical safety calibration for daily use:

🟢 Routine and unsurprising. Pandas data work, public HTTPS GET requests, generating charts from CSVs, npm ci --ignore-scripts — Minara runs without interrupting you.
🟡 You'll see a confirmation widget (YELLOW). Scripts that call minara swap, cast send, or forge --broadcast; ERC-20 approve and Permit2 signatures; specific-path deletes; process substitution like bash <(curl …); setting NODE_OPTIONS or LD_PRELOAD. Read the evidence the gate surfaces, then accept or cancel.
🔴 Hard-rejected (RED). Mass deletion (rm -rf *, rm -rf $UNSET/x), reads of ~/.ssh/id_rsa, IMDS endpoint access, container escape attempts, inline private keys in code. These patterns have no legitimate business reason — the gate refuses even with confirm: true. If you hit RED, don't try to bypass it. Look at what the script wanted to do — it's almost always prompt injection or a copy-pasted attack.
⚠️ Your job. Keep private keys out of the repo. chmod 600 on .env. Never set MINARA_SKIP_FUND_CONFIRM or DISABLE_SCRIPT_RISK_GATE in an interactive session. Read Layer 3 previews before confirming.
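Two of those habits are easy to check before a session starts. Below is a minimal preflight sketch, assuming a .env file in the current directory and treating any non-empty value of the two override variables as "set" (this chapter does not say which values they accept). The preflight name is hypothetical, and it checks POSIX permission bits only, so it is not meaningful on Windows.

```python
import os
import stat

# Override switches named in this chapter; any non-empty value counts as set here.
OVERRIDE_VARS = ("MINARA_SKIP_FUND_CONFIRM", "DISABLE_SCRIPT_RISK_GATE")


def preflight(env_path: str = ".env") -> list:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    if os.path.exists(env_path):
        mode = stat.S_IMODE(os.stat(env_path).st_mode)
        if mode & 0o077:  # any group or other permission bit is set
            problems.append(f"{env_path} has mode {oct(mode)}; run chmod 600 {env_path}")
    for var in OVERRIDE_VARS:
        if os.environ.get(var):
            problems.append(f"{var} is set; unset it before an interactive session")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```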

Going deeper

This chapter is operator-focused — "how to use Minara safely." For the formal trust model, threat taxonomy, and vulnerability disclosure policy, see the repo's SECURITY.md. That document is written for security researchers; what you're reading is written for people running Minara to trade.

Continue to:

Fund-moving confirm — the two-step flow for every money-moving tool.
Script risk gate — how the agent decides RED vs YELLOW vs GREEN.
Audit and overrides — reviewing decisions, granting workflow exemptions, the kill switch.
