Case Study

How Vmax Accelerates Open-Ended AI Learning With Daytona

1,000+

parallel sandboxes ready to start in one minute

30

days of engineering time saved

Vmax is an AI research and development company building open-ended learning systems that let agents improve continuously, without human intervention. Their goal: AI agents that surpass human-defined benchmarks and expert-curated data.

Headquarters

San Francisco, CA

Industry

AI Research, Reinforcement Learning

Department

Research and Development

Key Features

Sandbox Creation Speed, Infrastructure Scale, Isolation and Security

vmax.ai

Learn how this AI research and development company partnered with Daytona to provision thousands of concurrent sandboxes where agents define and optimize their own tasks.

We want to provide infrastructure that enables the open-ended learning of AI agents. That requires agents to generate and validate their own tasks. Without Daytona, we wouldn't be able to validate these tasks at the scale and speed our pipeline demands.

Augustine Mavor-Parker

Co-founder and CTO at Vmax

01 -- CHALLENGE

Advancing Open-Ended Learning Research Demanded Scalable Sandbox Infrastructure

When Augustine Mavor-Parker co-founded Vmax, he had one clear goal: enabling AI agents to learn autonomously and compound their capabilities through programmatic performance evaluation. This vision required training agents to generate, execute, and optimize their own tasks, a process that demands robust sandbox infrastructure to run reliably at scale.

At Vmax’s development velocity, concurrent provisioning of isolated, controlled sandboxes was non-negotiable. Their models generated tasks far faster than their system could evaluate them, so provisioning one sandbox at a time would leave agents waiting for feedback and stall Vmax’s research. Sandboxes also had to start quickly: at that scale, even small per-sandbox delays multiply into significant bottlenecks and cut throughput.

To ensure accurate evaluation, each sandbox also had to include the dependencies and tools required by the task. Many of Vmax's tasks focus on software engineering, such as learning to add new features to repositories. Custom development environments would ensure that outcomes reflect task difficulty rather than a missing tool or dependency, advancing open-ended learning.

Augustine knew that building sandbox infrastructure in-house to meet these needs would require countless hours of spinning up fleets of cloud instances, managing uptime, and tuning startup times. Rather than pulling his lean research team away from core experiments, he set out to find a dedicated runtime platform that would provide the infrastructure and support to scale with Vmax's task pipeline.

Fortunately, Augustine didn’t have to search for long. After the AI research and development community pointed him to Daytona, he immediately knew that he’d found the solution to his dilemma.

We're working to build systems that can generate, validate, and optimize on their own objectives. That's only possible with the infrastructure to run a massive number of concurrent sandboxes.

Augustine Mavor-Parker

Co-founder and CTO at Vmax

02 -- SOLUTION

A Runtime Platform That Provisions Hundreds of Parallel Sandboxes on Demand

Onboarding with Daytona was fast and high-touch. After creating a shared Slack channel, Augustine worked directly with the Daytona team to get familiar with the platform, review documentation, and raise pressing questions before and during the rollout. “We fed their LLMs.txt file into Claude, and it gave us instant self-serve guidance that minimized the learning curve,” says Augustine. “And if our team had additional questions or needed higher sandbox limits, Daytona was always quick to respond, even during nights and weekends.”

Now, when an agent picks up a task, Daytona instantly provisions a dedicated sandbox. This flow happens simultaneously for every run in the queue, creating a pool of concurrent task evaluations. To further increase throughput, Augustine also pre-builds Snapshots for common tasks, so sandboxes start from a warm, pre‑configured image.

This process runs with airtight isolation by default. Each sandbox has a dedicated kernel, filesystem, and network stack, eliminating the risk of network bleed or process interference. When a task is completed, Daytona tears the sandbox down automatically so operational overhead never accumulates across runs.
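To make this flow concrete, here is a minimal Python sketch of the pattern described above: one dedicated sandbox per task, all provisioned concurrently from a pre-built snapshot, and each torn down as soon as its task finishes. It does not use Daytona's SDK. `FakeSandbox`, `sandbox_from_snapshot`, `evaluate_all`, and the `"python-repo"` snapshot name are hypothetical stand-ins invented for illustration, and the short sleeps only simulate startup and execution time.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from itertools import count

_ids = count(1)
live_sandboxes: set[int] = set()  # IDs of simulated sandboxes that still exist


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a sandbox handle (not a Daytona class)."""
    sandbox_id: int
    snapshot: str

    async def run(self, task: str) -> str:
        await asyncio.sleep(0.01)  # simulate executing and scoring the task
        return f"{task}: evaluated in sandbox {self.sandbox_id} ({self.snapshot})"


@asynccontextmanager
async def sandbox_from_snapshot(snapshot: str):
    """Provision a simulated sandbox, yield it, and always tear it down."""
    await asyncio.sleep(0.01)  # simulate a warm start from a pre-built snapshot
    sandbox = FakeSandbox(next(_ids), snapshot)
    live_sandboxes.add(sandbox.sandbox_id)
    try:
        yield sandbox
    finally:
        # Teardown runs even if the task raises, so nothing accumulates.
        live_sandboxes.discard(sandbox.sandbox_id)


async def evaluate(task: str, snapshot: str) -> str:
    async with sandbox_from_snapshot(snapshot) as sandbox:
        return await sandbox.run(task)


async def evaluate_all(tasks: list[str], snapshot: str) -> list[str]:
    # One dedicated sandbox per task, all provisioned concurrently.
    return await asyncio.gather(*(evaluate(t, snapshot) for t in tasks))


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(5)]
    for line in asyncio.run(evaluate_all(tasks, "python-repo")):
        print(line)
```

In a real pipeline, the simulated provisioning and teardown would be replaced by calls to the sandbox platform. The part worth keeping is the shape: the context manager ties each sandbox's lifetime to a single task, and `asyncio.gather` keeps every evaluation in flight at once.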

With Daytona’s fast, elastic sandboxes, Vmax keeps its pipeline moving continuously, producing validated task evaluations that compound agent capabilities. Meanwhile, Vmax’s team stays focused on core research without the overhead of building and maintaining sandbox infrastructure.

We have Daytona’s sandboxes on demand all the time. We can scale experiments as soon as we have an idea instead of waiting on infrastructure to catch up.

Augustine Mavor-Parker

Co-founder and CTO at Vmax

03 -- RESULT

Vmax Unlocks Capacity for Generating 1,000+ Sandboxes Per Minute With Daytona

With Daytona, Vmax can now produce and validate self-generated agent tasks at scale. As their sandbox infrastructure spins up, isolates, and tears down environments on demand, Augustine and his team focus on advancing frontier techniques for open-ended learning.

1,000+ parallel sandboxes ready to start in one minute
30 days of engineering time saved

Looking ahead, Augustine is excited to see how Daytona’s sandbox forking capabilities can enable his team to run parallel agent explorations from the same state and develop new ways of combining test-time search with LLM-based reinforcement learning. This approach will help Vmax run more experiments without the overhead of reprovisioning, compounding their research gains.