AI Agent Action Whitelist: How to Build One That Actually Works

An action whitelist is the single highest-leverage control for autonomous AI agents. Here's what it is, how to design one that doesn't break every time your agent learns a new tool, and why 'just limit the tools' isn't enough.

Key Takeaways
  • A whitelist operates on typed actions, not tool names. 'write_file' isn't one action, it's hundreds: one per path pattern.
  • Default-deny is the only stance that scales; anything not on the list should fail closed.
  • Risk tiers let you whitelist liberally for low-risk actions and tighten for dangerous ones.
  • The whitelist must be enforced outside the LLM, never by the LLM itself.
  • Every denied action should produce a receipt, not just a silent drop, so you can tune the whitelist over time.

The single highest-leverage control for an autonomous AI agent is an action whitelist: an explicit enumeration of what the agent is allowed to do, with everything else denied by default.

It sounds obvious, and most agent frameworks claim to support it (MCP tool registration, LangChain tool binding, OpenAI function calling). What those give you is a tool whitelist. A tool whitelist is not the same as an action whitelist, and the difference is where most incidents come from.

Why tool whitelisting isn't enough

When you register a tool with an LLM agent, you give it a name and a schema: write_file(path: str, content: str). The agent can now call that tool. Done, right?

Not exactly. The name write_file doesn't denote one action. It denotes an unbounded family of actions parameterized by path and content. The tool is registered; the actions are not.

In production, this produces real incidents:

  • The tool is write_file, but the action the agent took was write_file("/etc/passwd", "<malicious content>").
  • The tool is send_email, but the action was send_email(to="all-staff@", subject="FYI", body="<scraped content with injected phishing link>").
  • The tool is execute_shell, but the action was execute_shell("rm -rf /").

The tool was allowed. The action should not have been.

An action whitelist operates on the fully-parameterized action, not the tool type.
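To make the difference concrete, here is a minimal Python sketch that contrasts the two checks. The names ALLOWED_TOOLS, ALLOWED_ACTIONS, tool_allowed, and action_allowed are hypothetical, chosen for illustration, and the glob matching via the standard-library fnmatch module is an assumption about how target patterns might be evaluated.

```python
import posixpath
from fnmatch import fnmatchcase

# Tool-level whitelist: tool names only (hypothetical example data).
ALLOWED_TOOLS = {"write_file"}

# Action-level whitelist: tool name plus a glob the target path must match.
ALLOWED_ACTIONS = {"write_file": "/data/agent-outputs/*"}


def tool_allowed(tool: str) -> bool:
    """Tool-level check: looks only at the tool name."""
    return tool in ALLOWED_TOOLS


def action_allowed(tool: str, path: str) -> bool:
    """Action-level check: looks at the tool name and the concrete target."""
    pattern = ALLOWED_ACTIONS.get(tool)
    if pattern is None:
        return False  # no entry for this tool: denied by default
    # Collapse "../" segments first so traversal cannot slip past the glob.
    # This does not resolve symlinks; a real check would need to.
    normalized = posixpath.normpath(path)
    return fnmatchcase(normalized, pattern)
```

With these definitions, tool_allowed("write_file") passes no matter where the agent writes, while action_allowed("write_file", "/etc/passwd") is rejected and action_allowed("write_file", "/data/agent-outputs/report.md") is accepted.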

The structure of a real action whitelist

A production whitelist looks more like a policy engine than a list. Each entry specifies:

  1. Action type — what family of operation (e.g. write_file, send_email).
  2. Target pattern — what targets are allowed (e.g. path: /data/agent-outputs/*).
  3. Parameter constraints — allowed values, ranges, patterns.
  4. Risk tier — what authorization level this action requires.
  5. Rate limit — how often this action may be performed.
  6. Secondary checks — additional policies that fire (e.g. egress filtering on content).

A concrete example, in a rule-engine-friendly form:

- action: write_file
  target: "path matches /data/agent-outputs/*.md"
  params:
    content:
      max_length: 50000
      must_not_match_pattern: secret_regex
  risk_tier: low
  rate_limit: 60/minute

- action: send_email
  target: "to matches *@yourcompany.com"
  params:
    body:
      max_length: 5000
      must_not_match_pattern: secret_regex
  risk_tier: medium
  rate_limit: 10/hour
  requires: slack_approval  # per-rule extra check, stricter than the medium-tier default

- action: execute_shell
  risk_tier: forbidden  # never, for this agent class

Anything the agent wants to do that isn't covered by a rule: denied. No rule for delete_file? Denied. No rule for post_to_public_channel? Denied. New capability appeared because a dependency update added a tool? Denied until explicitly whitelisted.
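A rough sketch of how an evaluator for rules like these might look in Python follows. It is an illustration under stated assumptions, not how any particular engine works: the Rule dataclass, the evaluate function, and the reason strings are hypothetical, SECRET_REGEX is a stand-in for the secret_regex named in the rules, and rate limits and approvals are left out to keep it short.

```python
import posixpath
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Tuple

# Hypothetical stand-in for `secret_regex`; a real pattern set would be broader.
SECRET_REGEX = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")


@dataclass(frozen=True)
class Rule:
    action: str          # action type, e.g. "write_file"
    target_field: str    # parameter that holds the target
    target_pattern: str  # glob the target must match
    content_field: str   # parameter that holds free-form content
    max_length: int      # upper bound on content length
    risk_tier: str


RULES = [
    Rule("write_file", "path", "/data/agent-outputs/*.md", "content", 50_000, "low"),
    Rule("send_email", "to", "*@yourcompany.com", "body", 5_000, "medium"),
]


def evaluate(action: str, params: Dict[str, str]) -> Tuple[bool, str]:
    """Return (allowed, reason). Anything that matches no rule is denied."""
    for rule in RULES:
        if rule.action != action:
            continue
        target = str(params.get(rule.target_field, ""))
        if rule.target_field == "path":
            target = posixpath.normpath(target)  # collapse "../" segments
        if not fnmatchcase(target, rule.target_pattern):
            continue  # another rule for the same action may still match
        content = str(params.get(rule.content_field, ""))
        if len(content) > rule.max_length:
            return False, "content-too-long"
        if SECRET_REGEX.search(content):
            return False, "content-matches-secret-pattern"
        return True, "allowed:" + rule.risk_tier
    return False, "not-in-whitelist"  # default-deny
```

Note that execute_shell and delete_file need no code path of their own here: with no matching rule, both fall through to the final return and are denied. The glob matching is deliberately naive (fnmatch lets * span directory separators and does not validate email addresses), so a production engine would want stricter target parsing.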

Risk tiers: why a single allow/deny doesn't scale

Most agents perform a mix of low-risk and high-risk actions. Treating them all with the same whitelist rigor produces one of two failure modes:

  • Too loose: the whitelist allows enough high-risk actions that a confused agent can do real damage.
  • Too strict: the whitelist is so tight that the agent can't do useful work and teams add exceptions until it isn't a whitelist anymore.

Risk tiers solve this. Each action is tagged with a tier; each tier has different authorization requirements:

  • Low risk (reading files, querying public APIs, generating drafts in scratch dirs) — allowed automatically, no approval.
  • Medium risk (writing to shared locations, posting internal messages, API calls to known systems) — allowed automatically but logged with receipts for audit.
  • High risk (sending external email, making payments, writing to production config, deploying code) — requires a second factor: human-in-the-loop, second agent's sign-off, or a cryptographic attestation from a higher-tier policy.
  • Forbidden (destructive commands, untrusted egress, privilege escalation) — always denied regardless of context.

This lets the whitelist be generous where the downside is small and strict where it's not.
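As a sketch of how the four tiers could map to outcomes, consider the following Python. The Tier enum, the decide function, and the outcome strings are hypothetical names chosen for illustration; the second_factor_ok flag stands in for whatever approval mechanism (human, second agent, or attestation) a deployment uses.

```python
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    FORBIDDEN = "forbidden"


def decide(tier_name: str, second_factor_ok: bool = False) -> str:
    """Map a risk tier to an outcome; unknown tiers fail closed."""
    try:
        tier = Tier(tier_name)
    except ValueError:
        return "deny"  # unrecognized tier name: fail closed
    if tier is Tier.LOW:
        return "allow"
    if tier is Tier.MEDIUM:
        return "allow-and-audit"
    if tier is Tier.HIGH:
        # High-risk actions wait until a second factor has signed off.
        return "allow-and-audit" if second_factor_ok else "hold-for-second-factor"
    return "deny"  # Tier.FORBIDDEN: denied regardless of context
```

The design choice worth noting is that a second factor never unlocks a forbidden action: decide("forbidden", second_factor_ok=True) still returns "deny", and a misspelled tier name is treated the same way.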

Where the whitelist must be enforced

This is the part most implementations get wrong: the whitelist must be enforced outside the LLM.

If the LLM is the one checking the whitelist — e.g. a system prompt saying "you may only perform actions X, Y, Z" — the whitelist has the same failure modes as any other prompt instruction. Prompt injection bypasses it. Jailbreaks bypass it. Edge cases in the model's reasoning bypass it.

The whitelist has to be enforced by a component that:

  • Never asks the LLM whether the action is allowed.
  • Evaluates the action deterministically against rules.
  • Fails closed if the action doesn't match a rule.
  • Produces a signed receipt for both allows and denies.

This is a governance kernel. Sift is one. OPA (Open Policy Agent) wired to your agent's action bus is another. A hand-written rule engine is a third. What matters is not which — it's that the whitelist lives outside the thing it's protecting against.
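The four properties above can be sketched in a few lines of Python. This is a toy gate under stated assumptions, not Sift's or OPA's actual interface: enforce, sign, verify, and SIGNING_KEY are hypothetical names, the policy is any deterministic function you pass in, and HMAC-SHA256 from the standard library stands in for whatever signing scheme a real kernel uses.

```python
import hashlib
import hmac
import json
from typing import Any, Callable, Dict, Optional, Tuple

# Assumption: a demo key. A real deployment would load this from a secret store.
SIGNING_KEY = b"demo-key-do-not-use"


def sign(body: Dict[str, Any]) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the receipt body."""
    payload = json.dumps(body, sort_keys=True, default=str).encode("utf-8")
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def verify(receipt: Dict[str, Any]) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    return hmac.compare_digest(sign(body), str(receipt.get("signature", "")))


def enforce(
    action: str,
    params: Dict[str, Any],
    policy: Callable[[str, Dict[str, Any]], bool],
    tools: Dict[str, Callable[..., Any]],
) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Run the tool only if the policy says yes; always return a signed receipt."""
    try:
        allowed = policy(action, params) is True and action in tools
    except Exception:
        allowed = False  # a crashing policy check must not open the gate
    body = {"action": action, "params": params,
            "decision": "allow" if allowed else "deny"}
    receipt = dict(body, signature=sign(body))
    result = tools[action](**params) if allowed else None
    return result, receipt
```

The LLM never appears in this code path: it can only propose an action and parameters, and the gate decides. Because the receipt is signed after the decision, editing a stored "deny" into an "allow" makes verify return False.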

What to do when the agent wants to do something new

New-action behavior is where whitelists reveal their quality. Bad whitelists handle "new action" by either auto-allowing (defeats the point) or hard-denying forever (agent can never grow).

Good whitelists handle it through receipts. When an agent attempts an action not on the whitelist:

  1. Deny it. The action doesn't happen.
  2. Write a receipt: "agent attempted action X against target Y, parameters Z, reason: not-in-whitelist."
  3. Aggregate these receipts into a queue for human review.
  4. A human (or a higher-tier policy) decides whether this new action should be whitelisted. If yes, the whitelist is updated. If no, the receipt stays as evidence.

Over weeks, this produces a whitelist that converges on the minimum set of actions the agent actually needs. The agent can't do anything new without it being reviewed first. Nothing gets silently approved because "it was probably fine."
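The deny, receipt, aggregate, review loop can be sketched as a small Python class. ReviewQueue and its method names are hypothetical, and the whitelist here matches exact (action, target) pairs for brevity, where a real one would match patterns and parameters.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (action, target)


class ReviewQueue:
    """Denies unknown actions, records a receipt, and queues them for review."""

    def __init__(self) -> None:
        self.whitelist: Set[Key] = set()
        self.receipts: List[Dict[str, str]] = []

    def attempt(self, action: str, target: str) -> bool:
        """Return True if whitelisted; otherwise deny and write a receipt."""
        if (action, target) in self.whitelist:
            return True
        self.receipts.append(
            {"action": action, "target": target, "reason": "not-in-whitelist"}
        )
        return False

    def pending(self) -> List[Tuple[Key, int]]:
        """Aggregate denied attempts still awaiting review, most frequent first."""
        counts = Counter(
            (r["action"], r["target"])
            for r in self.receipts
            if (r["action"], r["target"]) not in self.whitelist
        )
        return counts.most_common()

    def approve(self, action: str, target: str) -> None:
        """A reviewer whitelists the action; old receipts stay as evidence."""
        self.whitelist.add((action, target))
```

A reviewer would work through pending() from the top, since the most frequently denied actions are the likeliest gaps in the whitelist. Approving an entry changes future decisions only; the receipts for earlier denials are never deleted.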

The result

A properly-designed action whitelist, enforced by a deterministic kernel outside the LLM, turns an autonomous agent from a "hope it doesn't misbehave" system into a "can't misbehave in ways that matter" system. It's not the whole of agent governance — you still need authentication, egress filtering, budget caps, and receipts — but it's the single control with the highest leverage per hour of engineering.

Sift ships with this pattern built in: typed actions, risk tiers, fail-closed enforcement, signed receipts, and a review queue for denied actions. It's the layer we put in front of every agent we run.
