Why We Built ConTree
For the past two years we have been actively researching Software Engineering agents, focusing on test-time compute scaling, reinforcement learning fine-tuning, and dataset scaling. While these techniques improve LLM quality, applying them to SWE agents remains fundamentally challenging due to the complex, stateful nature of execution environments.
SWE agents operate over mutable codebases in long-lived environments. Techniques such as Monte Carlo Tree Search require forking an environment at arbitrary points in a trajectory and executing many independent branches in parallel. Value-function estimation for reinforcement learning requires running multiple rollouts from the same intermediate state.
In practice, scaling SWE agents means managing thousands of concurrent, stateful environments. Achieving this with Kubernetes demands deep expertise, careful orchestration, and significant operational overhead. Kubernetes does not natively support container forking, which forces researchers to rely on costly environment duplication through replay or ad-hoc workarounds.
How It Works
At its core, ConTree is built around a simple idea: every operation in a container produces a new filesystem snapshot. Each snapshot is immediately addressable via an image ID and can be resumed, forked, or reused — even months later. This makes execution forkable and fully reproducible by default.
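The snapshot-per-operation model can be sketched with an in-memory toy (hypothetical names; ConTree's actual SDK and ID scheme may differ): each executed command derives a new immutable snapshot ID from its parent, and forking is simply running a new command from an older snapshot, with no copying involved.

```python
import hashlib


class SnapshotTree:
    """Toy model of snapshot-per-operation execution (hypothetical names)."""

    def __init__(self):
        self.parents = {}  # snapshot_id -> parent snapshot_id

    def run(self, parent_id: str, command: str) -> str:
        """Execute `command` on top of `parent_id`; return a new snapshot ID.

        IDs are derived content-addressably from (parent, command), so the
        same operation on the same state always yields the same snapshot.
        """
        new_id = hashlib.sha256(f"{parent_id}:{command}".encode()).hexdigest()[:12]
        self.parents[new_id] = parent_id
        return new_id

    def fork(self, snapshot_id: str, command: str) -> str:
        """Forking is just running from any earlier snapshot: no duplication."""
        return self.run(snapshot_id, command)


tree = SnapshotTree()
base = "base-image"
s1 = tree.run(base, "pip install -e .")
s2 = tree.run(s1, "pytest -x")            # main trajectory continues
alt = tree.fork(s1, "pytest -k parser")   # independent branch from the same state
assert tree.parents[s2] == tree.parents[alt] == s1
```

Because every snapshot ID remains addressable, resuming "even months later" is the same operation as continuing a live trajectory: both start from a stored ID.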
MicroVM isolation allows untrusted, agent-generated code to execute safely without risking host integrity or cross-tenant leakage, while still supporting high parallelism.
This design unlocks several powerful research workflows:
- Rollback as a tool. Agents can explicitly roll back to any previous state and explore alternative actions.
- Tree search over environments. Fork environments at arbitrary points and evaluate many branches in parallel.
- Recovery from failures. If an agent crashes mid-trajectory, it can restart from the last snapshot.
- Long-running agents. Agents may run for hours or days while compute is allocated only during actual code execution.
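The tree-search workflow above can be illustrated with a small sketch: fork the same snapshot once per candidate action, evaluate the branches in parallel, and keep the best child. The `run_op` and `score` functions here are deterministic stand-ins for environment execution and a value function, not ConTree APIs.

```python
import concurrent.futures
import hashlib


def run_op(snapshot_id: str, action: str) -> str:
    """Stand-in for executing an action in a forked environment;
    returns the resulting snapshot ID (toy, deterministic)."""
    return hashlib.sha256(f"{snapshot_id}:{action}".encode()).hexdigest()[:12]


def score(snapshot_id: str) -> float:
    """Stand-in for a value function scoring the resulting state."""
    return int(snapshot_id, 16) / 16**12  # toy score in [0, 1)


def expand_best(snapshot_id: str, actions: list) -> tuple:
    """Fork the same snapshot per candidate action, evaluate all branches
    in parallel, and return the (action, child_snapshot) with the top score."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        children = list(pool.map(lambda a: (a, run_op(snapshot_id, a)), actions))
    return max(children, key=lambda pair: score(pair[1]))


action, child = expand_best("s1", ["edit foo.py", "run tests", "git revert"])
```

Because the parent snapshot is immutable, the branches cannot interfere with one another, and a rollback is just discarding a child and expanding again from the same parent ID.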
Trade-offs
Each operation incurs sub-second overhead to spawn a microVM and snapshot its filesystem, so ConTree is suboptimal for scenarios where environment forking is not required. We are actively working on making snapshotting optional on a per-operation basis, letting users trade forkability for execution speed.
What You Get
ConTree is available as a fully managed service via a Python SDK — all you need is an API token.
- Thousands of operations running in parallel.
- Each user is isolated in their own namespace; artifacts are never shared across namespaces.
- Import your own Docker images.
- Over 7,000 SWE environments (SWE-bench Verified, SWE-rebench) available from a public namespace out of the box.
Early Access
ConTree has already been adopted internally by researchers at Nebius AI R&D for experiments involving MCTS, value-function estimation, and rollbacks for SWE agents. We are now releasing ConTree in early access for the broader research community.
ConTree is ideal for researchers working on long-horizon AI agents that execute code and explore various solution paths — SWE agents, Deep Research agents, and beyond.
Explore contree.dev to learn more and join the free early access program.