For the last few months, I’ve been tinkering with building a personal AI assistant (think OpenClaw). One thing quickly became clear: agents get much more useful when they can run real commands to install packages, manipulate data, or browse the interweb.

The problem is that letting an agent run arbitrary commands directly on its host is less than ideal, so I wanted a sandbox environment where it could run pretty much anything while reducing the attack surface.

That became mcp-sandboxd: an MCP server that exposes a few tools, one of which runs commands inside an isolated environment such as a Docker container or a Kubernetes pod.

The core idea is simple: each identifier (in my case, a conversation id) maps to a long-running sandbox. So instead of creating a new environment for every command, the agent keeps working inside the same one. That means it can install dependencies once, modify files, and build up state across multiple calls.
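
The identifier-to-sandbox mapping can be pictured with a minimal Python sketch. This is not the actual mcp-sandboxd code: the Sandbox and SandboxRegistry names are hypothetical, and the sandbox here only records commands instead of executing them in a container or pod.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical stand-in for a long-running container or pod."""
    identifier: str
    history: list[list[str]] = field(default_factory=list)

    def run(self, argv: list[str]) -> None:
        # A real backend would exec argv inside the sandbox; this sketch only records it.
        self.history.append(argv)


class SandboxRegistry:
    """Maps each identifier to exactly one reusable sandbox."""

    def __init__(self) -> None:
        self._sandboxes: dict[str, Sandbox] = {}

    def get_or_create(self, identifier: str) -> Sandbox:
        # Reuse the existing sandbox if this identifier has been seen before.
        sandbox = self._sandboxes.get(identifier)
        if sandbox is None:
            sandbox = Sandbox(identifier)
            self._sandboxes[identifier] = sandbox
        return sandbox


registry = SandboxRegistry()
registry.get_or_create("conversation-123").run(["apt-get", "update"])
registry.get_or_create("conversation-123").run(["ls", "/artifacts"])
```

Both calls land in the same Sandbox object, which is what lets installed packages and modified files carry over from one tool call to the next.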

For example, a simple tool call to run_sandbox might look like this (the identifier selects the sandbox, so reusing it across calls reuses the same environment):

{
  "identifier": "conversation-123",
  "commands": [
    { "argv": ["apt-get", "update"] },
    { "argv": ["apt-get", "install", "-y", "playwright"] },
    { "argv": ["playwright", "screenshot", "https://johan.eliasson.xyz", "/artifacts/screenshot.png"] }
  ],
  "options": {
    "as_user": "root"
  }
}

It also supports artifacts: files written to /artifacts inside a sandbox can be fetched over HTTP through the MCP server, which acts as a proxy.
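
One thing a proxy like this has to get right is keeping requests inside /artifacts. The following Python sketch shows one way that check could look. It is an assumption about how such a proxy might behave, not a description of mcp-sandboxd itself, and the resolve_artifact helper is hypothetical.

```python
import posixpath

ARTIFACT_ROOT = "/artifacts"


def resolve_artifact(requested: str) -> str:
    """Map a requested name to a path under /artifacts, rejecting anything that escapes it."""
    # Treat the request as relative to the artifact root, then collapse "." and ".." segments.
    candidate = posixpath.normpath(
        posixpath.join(ARTIFACT_ROOT, requested.lstrip("/"))
    )
    # After normalisation the path must still sit strictly below the root.
    if not candidate.startswith(ARTIFACT_ROOT + "/"):
        raise ValueError(f"refusing path outside {ARTIFACT_ROOT}: {requested!r}")
    return candidate


print(resolve_artifact("screenshot.png"))  # /artifacts/screenshot.png
```

A request such as "../etc/passwd" normalises to /etc/passwd and is rejected with a ValueError, so only files the agent deliberately placed in /artifacts can be served.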

This little server has covered the use case for my assistant so far. I don’t expect it to grow many more features; I’d rather keep it small and focused on doing this one thing well.

If you find it useful, maybe give it a star! ⭐