Script risk gate
How the agent decides which scripts and commands to allow, ask about, or reject — RED, YELLOW, and GREEN with the reasoning behind each line.
The agent shows you a widget:
The script contains 1 risk. Execute anyway?
[fund_moving_cli] The script will move funds via `minara swap`.
[ Execute ] [ Cancel ]
What logic decides which scripts trigger a confirm widget, which are silently allowed, and which are refused outright? This page explains the model.
The three lines
The gate's mental model is three categories, each defined by the question it answers, not by a list of patterns:
- GREEN — reasonable, reversible, low-cost operations. Data reads, math, writing new files, public HTTPS GETs. The agent proceeds without interrupting you.
- YELLOW — legitimate but high-stakes. Moves money, changes on-chain authorizations, deletes specific files, installs packages, makes outbound writes. Only you can decide whether this is intentional. The gate stops and asks.
- RED — operations with no legitimate business reason.
Mass deletion, cloud-metadata exfiltration, reads of ~/.ssh/id_rsa, container escape, inline private keys hard-coded in source. Even if you "want" the operation, the right answer is to use a proper tool, not to let the agent run a script that looks like this one. So RED is not overridable: not by clicking, not by passing a flag.
Why isn't RED overridable? Because most RED hits are prompt injection. A webpage, a skill content blob, or a tool result has fed the LLM something like "next, run this useful script" — and the LLM is now politely describing the action to you. If RED were bypassable by "user confirmation", the attack succeeds the moment you click yes, because the description you read was written by the attacker, not by the agent. So RED is enforced regardless of the conversation. "I understand the risk and want to do it anyway" doesn't apply, because what you understand was already filtered through the attacker's framing.
What the YELLOW confirmation widget shows you
Each risk entry contains three pieces:
- Category: what kind of risk it is, e.g. fund_moving_cli or onchain_dangerous_call.
- Evidence: the actual line from the script that matched, with any private-key hex, Authorization headers, Bearer tokens, or long base64 blobs replaced by <REDACTED-*> placeholders.
- Decision: Execute or Cancel.
Why redact the evidence? Because the widget exists to let you judge intent, and "is this intentional?" doesn't depend on seeing the secret itself. If a script contains
new ethers.Wallet("0xabc…64-hex…"), showing you the raw hex turns the confirm widget into a second leak path: the bytes get copied into the audit log, the chat transcript, and potentially the screen recording. Redacting it shows "a private key is being passed to a wallet constructor" — which is the question you actually need to answer.
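
The redaction step described above can be sketched roughly as follows. This is a minimal illustration, not the gate's actual code: the rule labels, the regexes, and the `redactEvidence` name are assumptions made for this example, and the real patterns live in the source file.

```typescript
// Hypothetical redaction rules; labels and regexes are assumptions for illustration.
interface RedactionRule {
  label: string;
  pattern: RegExp;
}

const REDACTION_RULES: RedactionRule[] = [
  // 64 hex characters with an optional 0x prefix: the shape of a raw private key.
  { label: "PRIVATE-KEY", pattern: /\b(?:0x)?[0-9a-fA-F]{64}\b/g },
  // A Bearer token inside an Authorization header.
  { label: "BEARER", pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  // A long unbroken base64 run (80+ characters), with optional padding.
  { label: "BASE64", pattern: /[A-Za-z0-9+\/]{80,}={0,2}/g },
];

// Replace every secret-shaped substring with a placeholder before display.
function redactEvidence(line: string): string {
  let out = line;
  for (const rule of REDACTION_RULES) {
    out = out.replace(rule.pattern, `<REDACTED-${rule.label}>`);
  }
  return out;
}
```

The point of the design is that the redacted line still shows the call site (a wallet constructor receiving a key), which is what the confirm decision depends on, while the secret bytes never reach the widget, the log, or the transcript.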
How the categories were drawn
This page does not list every pattern; the source, apps/agent/src/tools/_shared/script-risk.ts, is the canonical list. The interesting part is why something landed in RED versus YELLOW. A few representative examples:
fund_moving_cli — YELLOW
Scripts that call minara swap, cast send, forge --broadcast, hardhat run … --network mainnet, solana transfer, etc.
Why not RED? DCA scripts, rebalance scripts, and liquidation bots are legitimate automation patterns. Forcing RED would break real workflows.
Why not GREEN? Scripts that call these CLIs bypass Layer 3 (the in-process two-step confirm). That's a blind spot where the agent is moving money on your behalf without you explicitly clicking yes. The gate must surface it.
obfuscation_with_sink — RED
eval(base64.b64decode(...)), getattr(__import__("o" + "s"), "sys" + "tem")(...), reversed-string compile, `globalThis["Func" + "tion"]`.
Why RED? There is no second explanation. Nobody writes code this way except to evade static analysis. The pattern itself is the evidence of malicious intent — there's nothing to ask the user about.
mass_delete — RED for mass, YELLOW for specific
rm *, rm -rf $UNSET/x, find . -delete, shutil.rmtree(".")
are RED. rm sandbox/files/foo.csv is YELLOW.
Why this split? Mass deletion (no specific path, or a
variable that might expand to empty) is overwhelmingly the
result of either an LLM misreading "clean up the workspace" or
a prompt injection that arranges for $UNSET to expand to /.
Refusing those outright catches the common attacks and the
common accidents.
Specific-file delete (rm sandbox/files/foo.csv) is a real
operation people genuinely want — but it's still worth one
heartbeat of "yes, that one" before erasing the file.
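
The mass-versus-specific split can be illustrated with a small classifier. This is a sketch under stated assumptions: it only looks at a single shell rm command (not find or shutil.rmtree), and the `classifyRm` name and its rules are hypothetical, not the patterns in the source.

```typescript
type Verdict = "RED" | "YELLOW" | "GREEN";

// Hypothetical classifier for one `rm` invocation; the rules are assumptions.
function classifyRm(command: string): Verdict {
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] !== "rm") return "GREEN";
  // Everything that is not a flag is treated as a deletion target.
  const targets = tokens.slice(1).filter((t) => !t.startsWith("-"));
  for (const target of targets) {
    const hasGlob = /[*?]/.test(target);
    // A variable may expand to empty, turning "$UNSET/x" into "/x".
    const hasVariable = target.includes("$");
    // Bare "/", ".", "..", or "~" means "everything here".
    const isRootLike = /^(\/|\.|\.\.|~)\/?$/.test(target);
    if (hasGlob || hasVariable || isRootLike) return "RED";
  }
  // A concrete, fully spelled-out path is reviewable: ask the user.
  return targets.length > 0 ? "YELLOW" : "GREEN";
}
```

With these rules, rm * and rm -rf $UNSET/x come back RED, while rm sandbox/files/foo.csv comes back YELLOW, matching the examples above.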
credential_exfil — RED
Reads of ~/.ssh/id_rsa, ~/.aws/credentials, the macOS
keychain, Chrome / Brave / Edge login databases, MetaMask /
Phantom extension storage, 1Password / Bitwarden / KeePass
vaults.
Why RED? None of these have a legitimate "let the agent just open this for me" reading. If you actually need to read an SSH key in a workflow, use a dedicated CLI, not an LLM-driven script. Locking this down to RED also blunts the most financially damaging class of prompt injection attacks.
imds_ssrf — RED (all encodings)
The cloud-metadata endpoints (169.254.169.254,
metadata.google.internal, the Alibaba and Oracle equivalents)
plus every encoding form attackers use to evade naive blocklists
— decimal (2852039166), hex (0xa9fea9fe), IPv6-mapped
([::ffff:169.254.169.254]).
Why RED? No legitimate user-facing script needs to call the instance metadata service. Hits here are almost always SSRF attempts to lift IAM credentials off a cloud host running the agent.
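
To see why the encodings above are all the same address, note that 169·2^24 + 254·2^16 + 169·2^8 + 254 = 2852039166, which is 0xa9fea9fe in hex. A minimal sketch of normalizing a host before comparing it follows; the function names are hypothetical, only the forms named above are handled, and other spellings (such as the hex form of an IPv6-mapped address) are out of scope here.

```typescript
const IMDS_IPV4 = "169.254.169.254";

// Convert an unsigned 32-bit integer to dotted-quad notation.
function intToDotted(n: number): string {
  return [(n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff].join(".");
}

// Hypothetical normalizer: folds decimal, hex, and IPv6-mapped forms
// into a plain dotted-quad (or returns the lowercased hostname unchanged).
function normalizeHost(host: string): string {
  let h = host.trim().toLowerCase();
  // Strip the brackets used for IPv6 literals in URLs.
  if (h.startsWith("[") && h.endsWith("]")) h = h.slice(1, -1);
  // IPv6-mapped IPv4: keep only the embedded IPv4 part.
  if (h.startsWith("::ffff:")) h = h.slice("::ffff:".length);
  if (/^0x[0-9a-f]{1,8}$/.test(h)) return intToDotted(parseInt(h, 16));
  if (/^\d{1,10}$/.test(h)) {
    const n = Number(h);
    if (n <= 0xffffffff) return intToDotted(n);
  }
  return h;
}

function isImdsHost(host: string): boolean {
  const h = normalizeHost(host);
  return h === IMDS_IPV4 || h === "metadata.google.internal";
}
```

Normalizing first and comparing second keeps the blocklist short: one canonical address instead of one pattern per encoding.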
direct_signing_with_constructor — RED
new ethers.Wallet("0x…64-hex…"), Account.from_key("0x…"),
Keypair.fromSecretKey(byteArray).
Why RED? An inline private key in script source means either (a) someone is exfiltrating your key, or (b) you're about to commit your key to disk. Either way the right next move is to stop, not to ask. Use the agent's native trading tools; they don't require you to paste the key into source.
env_var_poisoning — YELLOW
Setting NODE_OPTIONS=--require ./steal.js, LD_PRELOAD=…,
BASH_ENV=…, PYTHONPATH=….
Why YELLOW (not RED)? Some of these are legitimate (setting
NODE_OPTIONS for tracing, PYTHONPATH for local dev). But all
of them are also the standard way to inject code into a child
process. The gate surfaces them so you can verify the file the
env var is pointing to is one you wrote, not one a prompt
injection wrote.
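
A rough sketch of how such assignments could be surfaced, so the widget can show which file the variable points at. The variable list comes from the four names above; the `findEnvAssignments` name, the regex, and the quoting rules are assumptions for illustration only.

```typescript
// The four variables named above; the real catalog may differ.
const CODE_LOADING_VARS: string[] = ["NODE_OPTIONS", "LD_PRELOAD", "BASH_ENV", "PYTHONPATH"];

interface EnvFinding {
  variable: string;
  value: string;
}

// Hypothetical scanner: finds VAR=value assignments for code-loading variables.
function findEnvAssignments(script: string): EnvFinding[] {
  const findings: EnvFinding[] = [];
  // Value is a double-quoted string, a single-quoted string, or a bare word.
  const pattern = new RegExp(`\\b(${CODE_LOADING_VARS.join("|")})=("[^"]*"|'[^']*'|\\S+)`, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(script)) !== null) {
    const rawValue = match[2] ?? "";
    findings.push({
      variable: match[1] ?? "",
      // Drop surrounding quotes so the user sees the path itself.
      value: rawValue.replace(/^["']|["']$/g, ""),
    });
  }
  return findings;
}
```

Returning the value alongside the variable name is the useful part: the user's question is not "is NODE_OPTIONS set?" but "did I write ./steal.js?".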
The full catalog is 12 RED categories and 13 YELLOW. The source is the canonical list — patterns get added as new evasions surface. The ones above are the most operationally meaningful.
What the gate cannot see
Static analysis has hard limits. Knowing them tells you where you, the operator, have to stay alert:
- Runtime-built strings. For a script with exec(a + b), where a and b are computed at runtime, the analyzer sees exec(<variable>) and reports a dynamic_command YELLOW, but can't tell you what the executed string will be. You have to read the script.
- Multi-tool stitched attacks. A script writes a file, sets an environment variable, then a separate terminal call runs a binary that reads both. Each call alone is clean. The gate inspects one call at a time. Layer 2 (the OS jail) is the backstop here, not the gate.
- Brand-new obfuscation patterns. Attackers can read this source and look for the regex gaps: hex encoded 99 times, Unicode lookalikes, deeply nested compile() calls. We add patterns as they appear, but the real boundary is Layer 2. Even if a script bypasses every rule here, the spawned subprocess still lives inside bwrap / sandbox-exec.
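
The first limit in the list above can be made concrete with a small sketch. It assumes a hypothetical `isDynamicCommand` check that flags exec or eval calls whose argument is anything other than a single quoted literal; the real analyzer's rule is not shown on this page and may work differently.

```typescript
// Hypothetical check: is the argument to exec()/eval() unknowable statically?
function isDynamicCommand(line: string): boolean {
  const call = /\b(?:exec|eval)\(\s*(.*)\)\s*$/.exec(line.trim());
  if (call === null) return false;
  const arg = call[1] ?? "";
  // A lone quoted literal (no concatenation, no variables) is statically known.
  const isPlainLiteral = /^(["'])(?:(?!\1).)*\1$/.test(arg);
  return !isPlainLiteral;
}
```

Note what the check cannot do: for exec(a + b) it can say "this is dynamic", but it has no way to report the string that will actually run. That is why the result is a YELLOW prompt rather than a verdict.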
The honest framing: Layer 4 inspects intent before spawn. Layer 2 caps the blast radius after spawn. If Layer 4 misses something, Layer 2 still applies.
Decision flow
What happens between the LLM calling execute_code and a
subprocess actually launching:
    LLM → execute_code({ code, language })
              │
              ▼
      analyzeScript(body)
              │
         ┌────┼────┐
         ▼    ▼    ▼
        RED  YEL  GREEN
         │    │    │
         │    ▼    ▼
         │   ctx.interactionQueue.ask()
         │    │    │
         │    │    └── handler proceeds → spawn
         │    │
         │    ├── user "Execute" → handler proceeds → spawn
         │    └── user "Cancel" → err("user_declined: …")
         │
         └── err("script_risk_rejected: …")
The gate cannot be bypassed by prompt-level instruction: it lives in the handler, before any subprocess work happens, not in the LLM's prompt.
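
The same flow, written as a handler sketch. Everything here is an assumption made for illustration: the `Deps` interface, the `handleExecuteCode` name, and the result shape are hypothetical, and the prompt is synchronous only to keep the example short (a real widget prompt would presumably be asynchronous). Only the three outcomes and the two error prefixes come from the flow described above.

```typescript
type Level = "RED" | "YELLOW" | "GREEN";

interface Analysis {
  level: Level;
  risks: string[];
}

type Result = { ok: true; output: string } | { ok: false; error: string };

// Hypothetical dependencies, injected so the flow can be exercised in isolation.
interface Deps {
  analyze: (code: string) => Analysis;
  ask: (risks: string[]) => "Execute" | "Cancel";
  spawn: (code: string) => string;
}

function handleExecuteCode(code: string, deps: Deps): Result {
  const analysis = deps.analyze(code);
  if (analysis.level === "RED") {
    // RED never reaches the prompt, so there is nothing to click through.
    return { ok: false, error: "script_risk_rejected: " + analysis.risks.join(", ") };
  }
  if (analysis.level === "YELLOW") {
    const choice = deps.ask(analysis.risks);
    if (choice === "Cancel") {
      return { ok: false, error: "user_declined: " + analysis.risks.join(", ") };
    }
  }
  // GREEN, or YELLOW confirmed by the user: only now does a subprocess start.
  return { ok: true, output: deps.spawn(code) };
}
```

Because the RED branch returns before `ask` is ever called, no text the LLM produces can turn a rejection into a confirmation.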
What the user can control
- ✅ You can choose Cancel on any YELLOW.
- ✅ You can pre-approve a specific script body inside a reviewed workflow definition via script_risk_policy; see Audit and overrides.
- ❌ You cannot override RED. It's designed not to be click-throughable. If a script you trust is hitting RED, the fix is to rewrite the script (e.g. use Minara's native trading tool instead of an inline private key), not to argue with the gate.
- ❌ The LLM cannot skip the widget. The gate runs in the tool handler, before any spawn; the LLM cannot prompt-engineer around it.