Sandboxes Explained - The Engine of Agentic Work
Why ephemeral, secure environments are the secret weapon for AI agents and how they enable a new way of working.
In the traditional software development lifecycle, "environment setup" is often the most painful step. "It works on my machine" is a meme for a reason. But when we move to an AI-native workflow, where agents are writing and executing code, we can't afford flaky environments.
This is where Sandboxes come in. They are the invisible engine that powers OutcomeDev, and understanding them unlocks a new way of thinking about software creation.
What is a Sandbox?
A sandbox in OutcomeDev is a micro-virtual machine (MicroVM) that spins up in milliseconds. It is a fully isolated Linux environment with Node.js, Python, and other tools pre-installed.
But unlike your laptop, a sandbox is ephemeral. It is born to perform a task, and it dies when the task is done (or times out).
The "Fresh Clone" Philosophy
The most critical concept to understand is that state does not live in the sandbox. State lives in GitHub.
Every time you start a task or restart a sandbox, the following happens:
- A fresh MicroVM is created (pristine state).
- Your repository is cloned from GitHub.
- The specific branch for your task is checked out.
- Dependencies are installed.
This guarantees that every run starts from a clean environment. There are no lingering configuration files from a previous experiment. If it works in the sandbox, it will work anywhere that starts from the same commit and the same dependencies.
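The steps above can be sketched in a few lines of Python. This is only an illustration, not OutcomeDev's actual implementation: the function name `bootstrap`, the `run` parameter, and the choice of `npm ci` as the install command (which assumes a Node.js project with a lockfile) are all assumptions. The empty working directory stands in for the pristine MicroVM.

```python
import subprocess


def bootstrap(repo_url, branch, workdir, run=subprocess.run):
    """Recreate a task environment from scratch (hypothetical helper).

    `workdir` is assumed to be a path that does not exist yet, standing in
    for the pristine filesystem of a fresh MicroVM.
    """
    # 1. Clone the repository from GitHub into the empty directory.
    run(["git", "clone", repo_url, workdir], check=True)
    # 2. Check out the branch that belongs to this task.
    run(["git", "checkout", branch], cwd=workdir, check=True)
    # 3. Install dependencies. `npm ci` is an assumption; a Python project
    #    might run `pip install -r requirements.txt` here instead.
    run(["npm", "ci"], cwd=workdir, check=True)
```

Because nothing is read from a previous run, calling `bootstrap` twice with the same branch should produce the same starting point both times, as long as the branch and the lockfile have not changed in between.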
Why Agents Love Sandboxes
AI Agents thrive on predictability.
- Safety: An agent can run `rm -rf /` in a sandbox, and it doesn't matter. The VM is destroyed anyway. This allows agents to be bold and experimental.
- Isolation: One task cannot affect another. You can have an agent upgrading a database in one task and another agent fixing a frontend bug in another task, running in parallel, with zero conflict.
- Reproducibility: Since every run starts from a fresh clone, we eliminate the "works on my machine" problem entirely.
The Lifecycle: Intent -> Execution -> Persistence
With sandboxes, every task follows the same lifecycle:
- Intent: You define what you want ("Fix the login bug").
- Execution (Sandbox): The agent spins up a sandbox, reproduces the issue, fixes it, and verifies it.
- Persistence (GitHub): The agent commits and pushes the code to GitHub.
- Termination: The sandbox dies.
If the sandbox crashes or times out before the Persistence step (the push to GitHub), the work is lost. This sounds scary, but it enforces a discipline of frequent commits. It treats committed, pushed code as the only source of truth.
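One way to practice that discipline is to checkpoint after every verified step, so a crash or timeout loses at most the work since the last push. The helper below is a minimal sketch under assumptions: the name `checkpoint`, the `run` parameter, and the remote name `origin` are illustrative and not taken from OutcomeDev.

```python
import subprocess


def checkpoint(message, workdir, run=subprocess.run):
    """Commit all changes in `workdir` and push them (hypothetical helper).

    Note: `git commit` exits with an error when there is nothing to commit,
    so with check=True this raises CalledProcessError on a clean tree.
    """
    # Stage every change, including new and deleted files.
    run(["git", "add", "--all"], cwd=workdir, check=True)
    # Record the change locally. It is still lost if the sandbox dies now.
    run(["git", "commit", "--message", message], cwd=workdir, check=True)
    # Push the current branch. Only after this does the work outlive the VM.
    run(["git", "push", "origin", "HEAD"], cwd=workdir, check=True)
```

An agent might call `checkpoint("Fix the login bug", workdir)` right after its tests pass, rather than waiting until the whole task is finished.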
Reusing Tasks
A common misconception is that once a sandbox dies, the task is "over." False.
Because the state lives in GitHub, you can restart a task anytime. Clicking "Start Sandbox" simply spins up a new VM and clones the latest state from GitHub. You can pick up from your last pushed commit, days or weeks later.
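A toy model makes the rule concrete. The classes below are purely illustrative (the names `Remote` and `Sandbox` and their methods are invented for this sketch): the remote stands in for GitHub, and each sandbox starts from a copy of whatever the remote holds.

```python
class Remote:
    """Stand-in for GitHub: the only state that outlives a sandbox."""

    def __init__(self):
        self.commits = []


class Sandbox:
    """Stand-in for an ephemeral VM that starts from a fresh clone."""

    def __init__(self, remote):
        self.remote = remote
        self.history = list(remote.commits)  # fresh clone of pushed state
        self.pending = []                    # local work, not yet pushed

    def work(self, change):
        self.pending.append(change)

    def commit_and_push(self):
        self.remote.commits.extend(self.pending)
        self.history.extend(self.pending)
        self.pending.clear()


remote = Remote()

first = Sandbox(remote)
first.work("fix login bug")
first.commit_and_push()
first.work("refactor session handling")  # never pushed
del first                                # the sandbox times out

second = Sandbox(remote)                 # "Start Sandbox", weeks later
print(second.history)                    # ['fix login bug']
print(second.pending)                    # []
```

The pushed fix is waiting in the new sandbox, while the unpushed refactor is gone along with the VM that held it.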
Conclusion
Sandboxes are not just a technical detail; they are a paradigm shift. They allow us to treat compute as a disposable commodity, while treating code (in git) as the permanent asset. This is what enables agents to work autonomously, safely, and effectively.