The Agent Sandbox Taxonomy
An open taxonomy for scoring, comparing, and stacking agent sandboxes.
TLDR: I published the Agent Sandbox Taxonomy, an open framework for scoring and comparing agent sandboxes. It breaks sandboxing into 7 defense layers, 7 threat categories, and 3 evaluation dimensions (7-7-3). Three things fell out of this:
- No single product covers all seven layers. You have to stack them.
- The most common setup (strong VM + open network + raw credentials) gives a false sense of security.
- Sandboxing and agent alignment are completely separate problems. You need both.
26 products scored so far; community contributions are very welcome.
Ask five people what “sandboxed” means and you’ll get six answers. One team means a Docker container. Another means a microVM. A third means “we added --read-only to the mount.” The word has been stretched so thin it tells you nothing about what’s actually protected, or more importantly, what isn’t.
How to use this
The taxonomy gives you a shared language for talking about agent sandboxes. Instead of “is this sandbox good?” you can ask specific questions: what layers does it cover, where are the gaps, and what do I stack on top?
- Picking a sandbox? Check its fingerprint to see where it’s strong and where it’s wide open.
- Comparing options? All products are scored on the same dimensions. Apples to apples.
- Already using one? Find its blind spots and figure out what to layer on top. No single product covers everything.
- Not sure what you need? Start with the decision checklist. It walks you through your use case before you pick anything.
7-7-3
The taxonomy breaks sandboxing into 7 defense layers, 7 threat categories, and 3 evaluation dimensions. Here’s the short version. The full README has all the details.
7 Defense Layers
- L1 Compute Isolation - What separates the agent from the host?
- L2 Resource Limits - Can it exhaust CPU, memory, disk?
- L3 Filesystem Boundary - What can it read, write, delete?
- L4 Network Boundary - What can it talk to?
- L5 Credential Management - Can it see or exfiltrate secrets?
- L6 Action Governance - Can it perform destructive operations?
- L7 Observability & Audit - Can you see what it did?
Layers go from bottom (compute) to top (observability). A strong L1 doesn’t give you L4 for free.
7 Threats
- T1 Data Exfiltration - Reads SSH keys, sends via outbound request
- T2 Supply Chain Compromise - Malicious install script exfiltrates env vars
- T3 Destructive Operations - rm -rf /, cloud resource deletion via API
- T4 Lateral Movement - Scans local network, hits cloud metadata endpoint
- T5 Persistence - Writes cron job, modifies shell init files
- T6 Privilege Escalation - Exploits kernel CVE, container escape
- T7 Denial of Service - Fork bomb, memory bomb, disk filling
Every threat needs multiple layers to actually stop it. The threat-layer mapping shows which layers defend against what.
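To make that concrete, here's a minimal sketch of how you'd use such a mapping to find blind spots. The two entries are my own illustration, derived from the example attacks above; the authoritative version is the threat-layer mapping in the repo.

```python
# Illustrative threat -> defense-layer mapping (two entries only; see the
# repo's threat-layer mapping for the real thing).
THREAT_LAYERS = {
    "T1 Data Exfiltration": {"L3", "L4", "L5"},  # secret reads, egress, raw creds
    "T7 Denial of Service": {"L1", "L2"},        # isolation plus resource caps
}

def exposed_threats(covered_layers: set[str]) -> list[str]:
    """Threats for which at least one defending layer is missing."""
    return [threat for threat, needed in THREAT_LAYERS.items()
            if not needed <= covered_layers]

# A sandbox with strong compute/filesystem/network but no L2 or L5:
print(exposed_threats({"L1", "L3", "L4"}))
# ['T1 Data Exfiltration', 'T7 Denial of Service']
```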
3 Evaluation Dimensions
Each layer gets three scores:
- Strength (S: 0-4): how hard is it to bypass? 0 = nothing, 1 = cooperative (the process can just ignore it), 2 = software-enforced, 3 = kernel-enforced (the process can’t undo it), 4 = structural (the attack surface straight up doesn’t exist).
- Granularity (G: 0-3): how fine is the control? 0 = nothing, 1 = on/off, 2 = allowlist/blocklist, 3 = per-resource policies.
- Portability: what does it run on? (linux, mac, docker, cloud, kvm, k8s).
Full scoring details in the README.
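If you want to treat scores as data rather than prose, the two scales map directly onto small enums. A sketch, using the definitions above; none of this is an official library:

```python
from enum import IntEnum

class Strength(IntEnum):     # S: 0-4, how hard is it to bypass?
    NOTHING = 0
    COOPERATIVE = 1          # the process can just ignore it
    SOFTWARE_ENFORCED = 2
    KERNEL_ENFORCED = 3      # the process can't undo it
    STRUCTURAL = 4           # the attack surface doesn't exist

class Granularity(IntEnum):  # G: 0-3, how fine is the control?
    NOTHING = 0
    ON_OFF = 1
    ALLOW_BLOCK_LIST = 2
    PER_RESOURCE = 3
```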
Fingerprints
Every product gets a fingerprint: S.G (strength.granularity) at each layer. It’s the quickest way to spot what a sandbox actually covers and where it falls apart:
```
             L1   L2   L3   L4   L5   L6   L7
E2B          4.1  4.2  4.1  2.2  1.1  -    1.1
Claude Code  3.1  -    3.3  3.2  1.1  2.3  2.2
nono         3.1  -    3.3  3.2  3.2  3.2  3.3
Leash        2.1  0.0  2.1  2.3  1.1  2.3  2.3
```
E2B’s L1 is 4.1: Firecracker microVM, about as strong as it gets. But look at L5: 1.1, meaning secrets are just env vars the agent can read freely. Compare that to nono’s L5 at 3.2: kernel-enforced credential proxying with allowlist control. The gaps jump out fast. E2B’s - at L6? Zero action governance. Leash’s 0.0 at L2? No resource limits at all.
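One way to work with fingerprints programmatically: a dict of (S, G) pairs per layer, with None standing in for "-". The values below are transcribed from the table above; the representation itself is just my sketch, not something the taxonomy prescribes.

```python
# A fingerprint as data: (S, G) per layer, None where the product scores "-".
Fingerprint = dict[str, tuple[int, int] | None]

E2B: Fingerprint = {
    "L1": (4, 1), "L2": (4, 2), "L3": (4, 1), "L4": (2, 2),
    "L5": (1, 1), "L6": None,   "L7": (1, 1),
}

def weak_layers(fp: Fingerprint, min_strength: int = 2) -> list[str]:
    """Layers that are absent or weaker than min_strength (default: software-enforced)."""
    return [layer for layer, score in fp.items()
            if score is None or score[0] < min_strength]

print(weak_layers(E2B))  # ['L5', 'L6', 'L7']
```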
See all product score cards in the repo.
Stacking
No single sandbox covers all seven layers well. The fix is stacking: take the max score at each layer:
```
             L1   L2   L3   L4   L5   L6   L7
E2B          4.1  4.2  4.1  2.2  1.1  -    1.1
+ Warden     2.1  0.0  2.2  2.3  2.3  2.2  2.2
──────────────────────────────────────────────
= Composed   4.1  4.2  4.1  2.3  2.3  2.2  2.2
```
E2B brings the isolation box (L1-L3). Warden fills in credentials (L5), governance (L6), and observability (L7). No zeros, no dashes.
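Composition is mechanical. Reusing the Fingerprint sketch from above: per-layer max, where I read "max" as comparing whole S.G scores (strength first, then granularity) rather than mixing S from one product with G from another. That reading is an assumption on my part, but it reproduces both tables in this section.

```python
def compose(*fingerprints: Fingerprint) -> Fingerprint:
    """Per-layer max over whole (S, G) scores; a '-' layer stays uncovered
    only if no product in the stack covers it."""
    composed: Fingerprint = {}
    for layer in ["L1", "L2", "L3", "L4", "L5", "L6", "L7"]:
        present = [fp[layer] for fp in fingerprints if fp.get(layer) is not None]
        composed[layer] = max(present) if present else None  # tuples: S first, then G
    return composed

WARDEN: Fingerprint = {
    "L1": (2, 1), "L2": (0, 0), "L3": (2, 2), "L4": (2, 3),
    "L5": (2, 3), "L6": (2, 2), "L7": (2, 2),
}

print(compose(E2B, WARDEN))
# {'L1': (4, 1), 'L2': (4, 2), 'L3': (4, 1), 'L4': (2, 3),
#  'L5': (2, 3), 'L6': (2, 2), 'L7': (2, 2)}
```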
Local-first alternative:
```
             L1   L2   L3   L4   L5   L6   L7
Claude Code  3.1  -    3.3  3.2  1.1  2.3  2.2
+ nono       3.1  -    3.3  3.2  3.2  3.2  3.3
──────────────────────────────────────────────
= Composed   3.1  -    3.3  3.2  3.2  3.2  3.3
```
Only L2 is uncovered. Everything else is kernel-enforced or better, with zero cloud dependency. More composition patterns in the repo.
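The same compose sketch reproduces this row, and it surfaces a subtlety at L6: max picks nono's whole 3.2 rather than combining Claude Code's granularity (2.3) with nono's strength into a 3.3 that neither product actually delivers.

```python
CLAUDE_CODE: Fingerprint = {
    "L1": (3, 1), "L2": None, "L3": (3, 3), "L4": (3, 2),
    "L5": (1, 1), "L6": (2, 3), "L7": (2, 2),
}
NONO: Fingerprint = {
    "L1": (3, 1), "L2": None, "L3": (3, 3), "L4": (3, 2),
    "L5": (3, 2), "L6": (3, 2), "L7": (3, 3),
}

print(compose(CLAUDE_CODE, NONO))
# {'L1': (3, 1), 'L2': None, 'L3': (3, 3), 'L4': (3, 2),
#  'L5': (3, 2), 'L6': (3, 2), 'L7': (3, 3)}
```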
Sandboxing is not agent alignment
Sandboxes don’t make agents smarter. Prompt injection, hallucination, bad context, compromised dependencies… these are all reasons an agent might decide to do something harmful. That’s an alignment problem. Sandboxing kicks in after the bad decision. It limits what the agent can do, not what it chooses to do. You need both, and neither replaces the other.
26 products scored, help wanted
The initial batch covers cloud platforms, local sandboxes, policy tools, and coding agents, including:
E2B, Daytona, Modal, Fly.io Sprites, Blaxel, Unikraft Cloud, Docker Sandbox, Google Agent Sandbox, StrongDM Leash, Stakpak Warden, nono, packnplay, Claude Code (local), Claude Code (web), Codex CLI, Cursor, Copilot coding agent, and Devin.
Every product has per-layer scores, mechanism notes, threat breakdowns, and known gaps, all recorded in products.yaml. The list will grow as new products ship and people contribute scores.
This is v1.0, and it’s marked as pending community review for a reason. If you work on an agent sandbox, if you’ve evaluated one, or if you just disagree with a score, I’d love to hear it. Open an issue, submit a PR, or just update a score.