Docker shares the host kernel. That was always the trade. It was fine when a human read the script before it ran. It stopped being fine the second an LLM started writing code at runtime off a prompt nobody pre-screened. So here's the practitioner version: what to actually run when your agent executes code you've never seen.
The thing that broke
The review step is gone. A model writes a script, the script lives for milliseconds, then it executes. Could be a clean chart. Could be a curl-pipe-shell because a prompt injection rewired intent four hops upstream. You don't get to read it first.
And the container under it shares one kernel with every other workload on the box. CVE-2024-1086, a netfilter use-after-free, owns every container on the host once it pops. CISA confirmed active ransomware exploitation in late 2025, years after the patch. November 2025 dropped three more under runC (CVE-2025-31133, CVE-2025-52565, CVE-2025-52881), all bypassing maskedPaths through symlink races to write procfs gadgets. Own core_pattern and the kernel runs your binary on the next coredump, as root.
In March 2026, Oxford and the UK AISI shipped SandboxEscapeBench. Frontier models reliably escaped privileged containers, writable host mounts, and exposed Docker daemons on their own. Cost per attempt: roughly a dollar. The model does the recon, picks the CVE, hands back the shell. So the fix isn't a better Docker config. It's a different boundary.








