Script risk gate

How the agent decides which scripts and commands to allow, ask about, or reject: the RED, YELLOW, and GREEN lines, with the reasoning behind each.

The agent shows you a widget:

The script contains 1 risk. Execute anyway?

[fund_moving_cli] The script will move funds via `minara swap`.
[ Execute ] [ Cancel ]

What logic decides which scripts trigger a confirm widget, which are silently allowed, and which are refused outright? This page explains the model.

The three lines

The gate's mental model is three categories, each defined by the question it answers, not by a list of patterns:

GREEN — reasonable, reversible, low-cost operations. Data reads, math, writing new files, public HTTPS GETs. The agent proceeds without interrupting you.
YELLOW — legitimate but high-stakes. Moves money, changes on-chain authorizations, deletes specific files, installs packages, makes outbound writes. Only you can decide whether this is intentional. The gate stops and asks.
RED — operations with no legitimate business reason: mass deletion, cloud-metadata exfiltration, reads of ~/.ssh/id_rsa, container escape, private keys hard-coded in source. Even if you "want" the operation, the right answer is to use a proper tool, not to let the agent run a script like this. So RED is not overridable, neither by clicking nor by passing a flag.

Why isn't RED overridable? Because most RED hits are prompt injection. A webpage, a skill content blob, or a tool result has fed the LLM something like "next, run this useful script", and the LLM is now politely describing the action to you. If user confirmation could bypass RED, the attack would succeed the moment you clicked yes, because the description you read was written by the attacker, not by the agent. So RED is enforced regardless of the conversation. "I understand the risk and want to do it anyway" doesn't apply, because what you understand was already filtered through the attacker's framing.

Each risk entry contains three pieces:

Category — what kind of risk it is, e.g. fund_moving_cli or onchain_dangerous_call.
Evidence — the actual line from the script that matched, with any private-key hex, Authorization headers, Bearer tokens, or long base64 blobs replaced by <REDACTED-*> placeholders.
Decision — Execute or Cancel.
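As a rough illustration of how risk entries could feed the widget shown at the top of this page, here is a minimal sketch. The `RiskEntry` shape, its `level` field, and `renderPrompt` are hypothetical names chosen for this example, not the types in script-risk.ts. This sketch prints the redacted evidence line, whereas the real widget may show a prose description instead; the Execute/Cancel decision stays with the widget.

```typescript
// Hypothetical shape of one risk hit; the real type in script-risk.ts may differ.
type Level = "RED" | "YELLOW" | "GREEN";

interface RiskEntry {
  category: string; // e.g. "fund_moving_cli"
  level: Level;
  evidence: string; // matched script line, already redacted
}

// Build the confirm-widget text: a header plus one "[category] evidence" line per hit.
function renderPrompt(risks: RiskEntry[]): string {
  const n = risks.length;
  const header = `The script contains ${n} risk${n === 1 ? "" : "s"}. Execute anyway?`;
  const lines = risks.map((r) => `[${r.category}] ${r.evidence}`);
  return [header, "", ...lines].join("\n");
}

// Arguments after "minara swap" are elided on purpose.
const prompt = renderPrompt([
  { category: "fund_moving_cli", level: "YELLOW", evidence: "minara swap ..." },
]);
console.log(prompt);
```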

Why redact the evidence? Because the widget exists to let you judge intent, and "is this intentional?" doesn't depend on seeing the secret itself. If a script contains new ethers.Wallet("0xabc…64-hex…"), showing you the raw hex turns the confirm widget into a second leak path: the bytes get copied into the audit log, the chat transcript, and potentially a screen recording. The redacted line still tells you that a private key is being passed to a wallet constructor, which is the question you actually need to answer.
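A minimal sketch of evidence redaction, assuming simple regex rules applied in order before the line reaches the widget. The placeholder suffixes (`<REDACTED-HEX-KEY>` and so on), the 40-character base64 threshold, and `redactEvidence` are illustrative assumptions; only the `<REDACTED-*>` placeholder form comes from this page.

```typescript
// Hypothetical redaction rules: placeholder names and the 40-char base64
// threshold are illustrative choices, not the values used in script-risk.ts.
const REDACTIONS: Array<[RegExp, string]> = [
  // 32-byte hex secrets (EVM private keys), with or without a 0x prefix.
  [/\b(?:0x)?[0-9a-fA-F]{64}\b/g, "<REDACTED-HEX-KEY>"],
  // Bearer tokens, wherever they appear.
  [/Bearer\s+[A-Za-z0-9._~+\/=-]+/g, "Bearer <REDACTED-TOKEN>"],
  // Whatever follows an Authorization header name, up to a quote or end of line.
  [/(Authorization\s*[:=]\s*["']?)[^"'\r\n]+/gi, "$1<REDACTED-AUTH>"],
  // Long base64-looking runs.
  [/[A-Za-z0-9+\/]{40,}={0,2}/g, "<REDACTED-BASE64>"],
];

// Apply every rule in order; later rules see the output of earlier ones.
function redactEvidence(line: string): string {
  return REDACTIONS.reduce((acc, [re, placeholder]) => acc.replace(re, placeholder), line);
}
```

A 32-byte transaction hash is indistinguishable from a private key by shape alone, so a rule like the first one over-redacts. That errs in the safe direction: you lose a hash you didn't need to judge intent, not a key.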

Rather than list every pattern (the source is the canonical list; see apps/agent/src/tools/_shared/script-risk.ts), this page focuses on the interesting part: why something landed in RED versus YELLOW. A few representative examples:

`fund_moving_cli` — YELLOW

Scripts that call minara swap, cast send, forge --broadcast, hardhat run … --network mainnet, solana transfer, etc.

Why not RED? DCA scripts, rebalance scripts, and liquidation bots are legitimate automation patterns. Forcing RED would break real workflows.

Why not GREEN? Scripts that call these CLIs bypass Layer 3 (the in-process two-step confirm). That's a blind spot where the agent is moving money on your behalf without you explicitly clicking yes. The gate must surface it.
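The detection side can be sketched as a per-line pattern match. The regexes below are an illustrative subset written for this example, and `fundMovingLines` is a hypothetical helper; script-risk.ts remains the canonical list.

```typescript
// Illustrative subset of fund-moving CLI patterns; the canonical list is in
// script-risk.ts. Matching is per line, so a command split across lines with
// backslash continuations would slip past this sketch.
const FUND_MOVING_CLI: RegExp[] = [
  /\bminara\s+swap\b/,
  /\bcast\s+send\b/,
  /\bforge\b.*--broadcast\b/,
  /\bhardhat\s+run\b.*--network\s+mainnet\b/,
  /\bsolana\s+transfer\b/,
];

// Return the script lines that would trigger a fund_moving_cli YELLOW.
function fundMovingLines(script: string): string[] {
  return script
    .split("\n")
    .filter((line) => FUND_MOVING_CLI.some((re) => re.test(line)));
}

const hits = fundMovingLines(
  [
    "echo starting",
    "minara swap ...",
    "cast call $TOKEN 'balanceOf(address)' $ME", // read-only, not flagged
    "forge script Deploy.s.sol --broadcast",
  ].join("\n"),
);
console.log(hits);
```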

`obfuscation_with_sink` — RED

eval(base64.b64decode(...)), getattr(__import__("o" + "s"), "sys" + "tem")(...), reversed-string compile, `globalThis["Func" + "tion"]`.

Why RED? There is no benign second explanation. Nobody writes code this way except to evade static analysis. The pattern itself is the evidence of malicious intent, so there's nothing to ask the user about.
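A heuristic sketch of how two of these shapes might be caught: a decoder feeding directly into a sink, and a sink name reassembled from string fragments. The sink list, the regexes, and `obfuscationReasons` are assumptions for illustration; reversed-string tricks and deeper nesting are not covered here.

```typescript
// Heuristic sketch; sink names and patterns are assumptions, not script-risk.ts.
// Names that are suspicious when assembled from string fragments.
const SINK_NAMES = new Set(["os", "system", "popen", "subprocess", "eval", "exec", "Function"]);

// A decoder whose output is passed straight into a code-execution sink.
const DECODE_INTO_SINK =
  /\b(?:eval|exec|compile|Function)\s*\(\s*(?:base64\.b64decode|atob|Buffer\.from)\s*\(/;

// Two or more string literals joined with "+", e.g. "sys" + "tem".
const CONCAT_RUN = /(?:"[^"\\]*"|'[^'\\]*')(?:\s*\+\s*(?:"[^"\\]*"|'[^'\\]*'))+/g;
const LITERAL = /"[^"\\]*"|'[^'\\]*'/g;

function obfuscationReasons(script: string): string[] {
  const reasons: string[] = [];
  if (DECODE_INTO_SINK.test(script)) reasons.push("decoded payload passed to a sink");
  const runs: string[] = script.match(CONCAT_RUN) ?? [];
  for (const run of runs) {
    // Fold the fragments back together: "sys" + "tem" -> system
    const parts: string[] = run.match(LITERAL) ?? [];
    const folded = parts.map((s) => s.slice(1, -1)).join("");
    if (SINK_NAMES.has(folded)) reasons.push(`sink name built by concatenation: ${folded}`);
  }
  return reasons;
}
```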

`mass_delete` — RED for mass, YELLOW for specific

rm *, rm -rf $UNSET/x, find . -delete, shutil.rmtree(".") are RED. rm sandbox/files/foo.csv is YELLOW.

Why this split? Mass deletion (no specific path, or a variable that might expand to empty) is overwhelmingly the result of either an LLM misreading "clean up the workspace" or a prompt injection that arranges for $UNSET to expand to /. Refusing those outright catches the common attacks and the common accidents.

Deleting a specific file (rm sandbox/files/foo.csv) is an operation people genuinely want, but it's still worth one heartbeat of "yes, that one" before the file is gone.
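The mass-versus-specific split for plain `rm` might look like the sketch below. `classifyRm` and `isMassTarget` are hypothetical; the sketch ignores shell quoting and does not cover find -delete or shutil.rmtree, which need their own patterns.

```typescript
type DeleteVerdict = "RED" | "YELLOW";

// A target that does not name one specific file: a glob, a variable that may
// expand to empty, or a root such as /, ., or ~.
function isMassTarget(target: string): boolean {
  if (/[*?]/.test(target)) return true;
  if (target.includes("$")) return true;
  return ["/", ".", "..", "./", "~", "~/"].includes(target);
}

// Hypothetical classifier for plain `rm` commands only.
function classifyRm(command: string): DeleteVerdict | null {
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] !== "rm") return null; // not an rm command
  const targets = tokens.slice(1).filter((t) => !t.startsWith("-"));
  if (targets.some(isMassTarget)) return "RED";
  return targets.length > 0 ? "YELLOW" : null; // nothing named, nothing deleted
}
```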

`credential_exfil` — RED

Reads of ~/.ssh/id_rsa, ~/.aws/credentials, the macOS keychain, Chrome / Brave / Edge login databases, MetaMask / Phantom extension storage, 1Password / Bitwarden / KeePass vaults.

Why RED? None of these has a legitimate "let the agent just open this for me" reading. If you actually need to read an SSH key in a workflow, use a dedicated CLI, not an LLM-driven script. Locking this down to RED also blunts the most financially damaging class of prompt injection attacks.
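A sketch of path-based matching for a few of the locations above, assuming plain regexes over the script text. The patterns and `touchesCredentials` are illustrative and incomplete (password-manager vaults other than KeePass are omitted), and treating Chromium's `Local Extension Settings` directory as where wallet extensions keep their data is an assumption about typical browser layouts.

```typescript
// Illustrative subset of credential locations; real paths vary by OS, browser
// profile, and app version, so these patterns are assumptions.
const CREDENTIAL_PATTERNS: RegExp[] = [
  /\.ssh\/id_[A-Za-z0-9_]+(?!\.pub|[A-Za-z0-9_])/, // SSH private keys, not the .pub halves
  /\.aws\/credentials\b/, // AWS shared credentials file
  /\bsecurity\s+(?:find-generic-password|find-internet-password|dump-keychain)\b/, // macOS keychain CLI
  /Library\/Keychains\//, // macOS keychain files
  /Login Data/, // Chromium-family saved-password database (Chrome, Brave, Edge)
  /Local Extension Settings\//, // Chromium extension storage (assumed wallet vault location)
  /\.kdbx\b/, // KeePass database files
];

function touchesCredentials(script: string): boolean {
  return CREDENTIAL_PATTERNS.some((re) => re.test(script));
}
```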

`imds_ssrf` — RED (all encodings)

The cloud-metadata endpoints (169.254.169.254, metadata.google.internal, the Alibaba and Oracle equivalents) plus every encoding form attackers use to evade naive blocklists — decimal (2852039166), hex (0xa9fea9fe), IPv6-mapped ([::ffff:169.254.169.254]).

Why RED? No legitimate user-facing script needs to call the instance metadata service. Hits here are almost always SSRF attempts to lift IAM credentials off a cloud host running the agent.
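Matching the literal string 169.254.169.254 is not enough, because the encodings above all name the same 32-bit address. A sketch of the normalize-then-compare approach, assuming the host has already been extracted from the URL. Component parsing follows inet_aton-style rules (hex, leading-zero octal, fewer than four parts); `isMetadataHost` and its helpers are hypothetical, and only 169.254.169.254 and metadata.google.internal are included.

```typescript
// Hypothetical IMDS host check. Only 169.254.169.254 and the GCP hostname are
// listed; other providers' endpoints are left out of this sketch.
const IMDS_V4 = 2852039166; // 169.254.169.254 as one 32-bit integer
const IMDS_NAMES = new Set(["metadata.google.internal"]);

// One dotted component, inet_aton style: 0x-prefixed hex, leading-zero octal, or decimal.
function parsePart(p: string): number | null {
  if (/^0x[0-9a-f]+$/i.test(p)) return parseInt(p.slice(2), 16);
  if (/^0[0-7]+$/.test(p)) return parseInt(p.slice(1), 8);
  if (/^(?:0|[1-9][0-9]*)$/.test(p)) return parseInt(p, 10);
  return null;
}

// One to four components; the last fills all remaining bytes, so "2852039166"
// and "169.254.43518" both denote 169.254.169.254.
function ipv4ToInt(host: string): number | null {
  const parts = host.split(".");
  if (parts.length > 4) return null;
  const nums = parts.map(parsePart);
  if (nums.some((n) => n === null)) return null;
  const values = nums as number[];
  const lead = values.slice(0, -1);
  const last = values[values.length - 1];
  const tailBytes = 4 - lead.length;
  if (lead.some((n) => n > 255) || last > 2 ** (8 * tailBytes) - 1) return null;
  let result = 0;
  for (const n of lead) result = result * 256 + n;
  return result * 2 ** (8 * tailBytes) + last;
}

// IPv4-mapped IPv6: ::ffff:169.254.169.254 or ::ffff:a9fe:a9fe
function mappedToInt(host: string): number | null {
  const m = /^::ffff:(.+)$/i.exec(host);
  if (!m) return null;
  const hex = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/i.exec(m[1]);
  if (hex) return parseInt(hex[1], 16) * 65536 + parseInt(hex[2], 16);
  return ipv4ToInt(m[1]);
}

function isMetadataHost(rawHost: string): boolean {
  let host = rawHost.trim().toLowerCase().replace(/\.$/, ""); // drop a trailing FQDN dot
  if (IMDS_NAMES.has(host)) return true;
  if (host.startsWith("[") && host.endsWith("]")) host = host.slice(1, -1);
  return (ipv4ToInt(host) ?? mappedToInt(host)) === IMDS_V4;
}
```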

`direct_signing_with_constructor` — RED

new ethers.Wallet("0x…64-hex…"), Account.from_key("0x…"), Keypair.fromSecretKey(byteArray).

Why RED? Inline private keys in script source mean either (a) someone is exfiltrating your key, or (b) you're about to commit your key to disk. Either way the right next move is to stop, not to ask. Use the agent's native trading tools; they don't require you to paste the key into source.
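A sketch of what "inline key in a constructor" could mean as patterns: a 64-hex-digit literal passed to ethers' `Wallet` constructor or eth-account's `Account.from_key`, or a Solana `Keypair.fromSecretKey` built from an inline byte array. The regexes and `hasInlineSigningKey` are assumptions for illustration.

```typescript
// Hypothetical patterns for key material written directly into source.
const SIGNING_CONSTRUCTORS: RegExp[] = [
  // ethers.js: new ethers.Wallet("0x<64 hex>")
  /new\s+(?:ethers\.)?Wallet\s*\(\s*["'](?:0x)?[0-9a-fA-F]{64}["']/,
  // eth-account (Python): Account.from_key("0x<64 hex>")
  /Account\.from_key\s*\(\s*["'](?:0x)?[0-9a-fA-F]{64}["']/,
  // @solana/web3.js: Keypair.fromSecretKey([...]) with an inline byte array
  /Keypair\.fromSecretKey\s*\(\s*(?:new\s+Uint8Array\s*\(\s*|Uint8Array\.from\s*\(\s*)?\[/,
];

function hasInlineSigningKey(script: string): boolean {
  return SIGNING_CONSTRUCTORS.some((re) => re.test(script));
}
```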

`env_var_poisoning` — YELLOW

Setting NODE_OPTIONS=--require ./steal.js, LD_PRELOAD=…, BASH_ENV=…, PYTHONPATH=..

Why YELLOW (not RED)? Some of these have legitimate uses (NODE_OPTIONS for tracing, PYTHONPATH for local development). But all of them are also the standard way to inject code into a child process. The gate surfaces them so you can verify that the file the variable points to is one you wrote, not one a prompt injection wrote.
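To give you something to verify, the gate needs the value as well as the variable name. A minimal sketch for shell-style assignments, assuming the four variables named above; `envPoisoning` and its regex are hypothetical, and assignments made from inside Python or Node (such as `os.environ[...] = ...`) are not covered.

```typescript
// Illustrative subset of variables that make a child process load extra code.
const POISONABLE = ["NODE_OPTIONS", "LD_PRELOAD", "BASH_ENV", "PYTHONPATH"];

// Shell-style assignments: at line start or after whitespace/;/&/|, optional `export`.
const ASSIGNMENT = new RegExp(
  String.raw`(?:^|[\s;&|])(?:export\s+)?(${POISONABLE.join("|")})=("[^"]*"|'[^']*'|\S+)`,
  "gm",
);

interface EnvHit {
  name: string;
  value: string; // what the user should inspect: the file or module it points at
}

function envPoisoning(script: string): EnvHit[] {
  const hits: EnvHit[] = [];
  for (const m of script.matchAll(ASSIGNMENT)) hits.push({ name: m[1], value: m[2] });
  return hits;
}
```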

The full catalog is 12 RED categories and 13 YELLOW, and patterns are added to the source as new evasions surface. The ones above are the most operationally meaningful.

What the gate cannot see

Static analysis has hard limits. Knowing them tells you where you, the operator, have to stay alert:

Runtime-built strings. Given exec(a + b), where a and b are computed at runtime, the analyzer sees exec(<variable>) and reports a dynamic_command YELLOW, but it can't tell you what the executed string will be. You have to read the script.
Multi-tool stitched attacks. A script writes a file, sets an environment variable, then a separate terminal call runs a binary that reads both. Each call alone is clean. The gate inspects one call at a time. Layer 2 (the OS jail) is the backstop here, not the gate.
Brand-new obfuscation patterns. Attackers can read this source and look for the regex gaps — hex encoded 99 times, Unicode lookalikes, deeply nested compile() calls. We add patterns as they appear, but the real boundary is Layer 2. Even if a script bypasses every rule here, the spawned subprocess still lives inside bwrap / sandbox-exec.
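The runtime-built-string limit is easy to see in a sketch of a dynamic_command check: the best a static pass can return is the expression text, not the string it will evaluate to. The sink list and `dynamicCommands` are assumptions for illustration.

```typescript
// Hypothetical dynamic_command check: flag sink calls whose argument is not a
// single plain string literal. The sink list is an assumption.
const SINK_CALL = /\b(?:eval|exec|system|execSync|Function)\s*\(([^)]*)\)/g;
// A lone quoted literal with no escapes and no template interpolation.
const PURE_LITERAL = /^\s*(?:"[^"\\]*"|'[^'\\]*'|`[^`$\\]*`)\s*$/;

function dynamicCommands(script: string): string[] {
  const hits: string[] = [];
  for (const m of script.matchAll(SINK_CALL)) {
    // All a static check can report is the expression text, never its runtime value.
    if (!PURE_LITERAL.test(m[1])) hits.push(m[0]);
  }
  return hits;
}
```

`[^)]*` stops at the first closing parenthesis, so a literal that itself contains parentheses gets flagged too. That false positive errs in the safe direction: it becomes a YELLOW, not a silent pass.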

The honest framing: Layer 4 (this gate) inspects intent before spawn, and Layer 2 caps the blast radius after spawn. If Layer 4 misses something, Layer 2 still applies.

Decision flow

What happens between the LLM calling execute_code and a subprocess actually launching:

LLM → execute_code({ code, language })
        │
        ▼
   analyzeScript(body)
        │
   ┌────┼────────┐
   ▼    ▼        ▼
  RED  YELLOW   GREEN
   │    │        │
   │    │        └── handler proceeds → spawn
   │    ▼
   │  ctx.interactionQueue.ask()
   │    │
   │    ├── user "Execute"  → handler proceeds → spawn
   │    └── user "Cancel"   → err("user_declined: …")
   │
   └── err("script_risk_rejected: …")

The gate cannot be bypassed by prompt-level instruction — it lives in the handler before any subprocess work happens, not in the LLM's prompt.
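The flow in the diagram can be sketched as a handler-side function. Apart from the names taken from the diagram (`ctx.interactionQueue.ask()`, the `script_risk_rejected` and `user_declined` prefixes), everything here is hypothetical, including the `ask()` signature, the `Risk` shape, and the text after each error prefix.

```typescript
// Hypothetical handler-side gate. analyzeScript's output is modeled as a plain
// Risk[]; the ask() signature and message suffixes are invented for illustration.
type Level = "RED" | "YELLOW" | "GREEN";
interface Risk {
  category: string;
  level: Level;
}
interface InteractionQueue {
  ask(prompt: string, options: string[]): Promise<string>;
}
type GateResult = { ok: true } | { ok: false; error: string };

// Worst level wins; an empty list means GREEN.
function overallLevel(risks: Risk[]): Level {
  if (risks.some((r) => r.level === "RED")) return "RED";
  if (risks.some((r) => r.level === "YELLOW")) return "YELLOW";
  return "GREEN";
}

// Runs inside the tool handler, before any spawn. There is no override
// parameter: RED returns before ask() is ever reached.
async function gate(risks: Risk[], ctx: { interactionQueue: InteractionQueue }): Promise<GateResult> {
  const level = overallLevel(risks);
  if (level === "RED") {
    const red = risks.filter((r) => r.level === "RED").map((r) => r.category);
    return { ok: false, error: `script_risk_rejected: ${red.join(", ")}` };
  }
  if (level === "YELLOW") {
    const yellow = risks.filter((r) => r.level === "YELLOW").map((r) => r.category);
    const answer = await ctx.interactionQueue.ask(
      `The script contains ${yellow.length} risk(s): ${yellow.join(", ")}. Execute anyway?`,
      ["Execute", "Cancel"],
    );
    if (answer !== "Execute") return { ok: false, error: `user_declined: ${yellow.join(", ")}` };
  }
  return { ok: true }; // only now may the caller spawn the subprocess
}
```

The property that matters is structural: `gate()` has no parameter that could override RED, and it returns before `ask()` is reached, so there is nothing for the user to click or for the LLM to argue with.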

What the user can control

✅ You can choose Cancel on any YELLOW.
✅ You can pre-approve a specific script body inside a reviewed workflow definition via script_risk_policy — see Audit and overrides.
❌ You cannot override RED. It's designed not to be click-throughable. If a script you trust is hitting RED, the fix is to rewrite the script (e.g. use Minara's native trading tool instead of an inline private key), not to argue with the gate.
❌ The LLM cannot skip the widget. The gate runs in the tool handler, before any spawn; the LLM cannot prompt-engineer around it.