Case Study

How Stanford’s CS Department Accelerates Open-Source Agent Development With Daytona

3x

faster sandbox provisioning with Daytona

1M+

sandboxes running simultaneously every month with Daytona

4

months saved on sandbox infrastructure engineering

At Stanford’s Computer Science Department, Ludwig Schmidt—an ICML and NeurIPS award winner—leads a research group focused on democratizing machine learning. Their latest project, DataComp, is an open-source benchmark for improving language models through data curation.

Headquarters

Palo Alto, CA

Industry

Higher Education, Academic Research

Department

AI Research, Computer Science, Data Science

Key Features

Sandbox Creation Speed, Sandbox Statefulness, Long-Running Sandboxes

cs.stanford.edu

Learn how this top university’s Computer Science department partnered with Daytona to provision stateful, highly customized sandboxes at unprecedented scale, eliminate infrastructure overhead, and accelerate open-source agentic model development.

“Our team is building open-source AI agents because they can benefit the whole world. We use Daytona daily across every stage of our ML pipeline. Their managed runtime has been integral to our project.”

Etash Guha

PhD Researcher at Stanford University

01 -- CHALLENGE

Sandbox Infrastructure Constraints Threatened Critical ML Research

One of the many ways Etash Guha, PhD Researcher at Stanford, contributes to Ludwig Schmidt’s DataComp is by training language models to autonomously solve real-world problems. To that end, he and the research team run thousands of controlled experiments throughout the ML development cycle to evaluate performance across diverse tasks, from repetitive workflow automation to debugging production code and quantitative analysis. However, securing sandbox infrastructure that could scale to support these efforts came with significant time and resource constraints.

To train agents against realistic environments and problem-solving scenarios, each sandbox requires tailored system configurations, dependencies, and software tools. These environments also need to spin up fast to keep the ML pipeline on track and run for up to 3 hours, depending on the test.

Etash estimated that building an infrastructure solution to meet these demands would take at least four months and incur substantial engineering costs. Even if the team could absorb the time drain, he knew that an in-house build would lack the refinement and technical depth to match the performance of specialized infrastructure. The resulting latency would leave GPUs idle during sandbox creation, wasting valuable cycles.

Beyond sandbox requirements, the team had to account for ongoing maintenance. Monitoring uptime, automating cleanup, and updating the infrastructure would divert focus from core model development to operational tasks.

To address these hurdles, Etash and his research team explored several sandbox provisioning platforms, but most imposed customization constraints and limited capacity.

That’s when he discovered Daytona. Its agent-first infrastructure unlocked the scale and flexibility that his research demanded.

“Simulating a model interacting in different environments is extremely difficult. To continue our research, we need an infrastructure partner who could provide flexible sandboxes at scale. No one met our needs except Daytona.”

Etash Guha

PhD Researcher at Stanford University

02 -- SOLUTION

A Scalable Runtime Platform That Delivers Fast, Flexible Sandbox Infrastructure

Etash and his research team seamlessly integrated Daytona across their ML development cycle. The runtime automates the entire sandbox lifecycle—from spin-up to teardown—so they now provision thousands of concurrent, long-running sandboxes with minimal engineering overhead.

Beyond this impressive scalability, Daytona also delivers the flexibility the research team required to advance their DataComp agent training. With Daytona, they now easily customize every sandbox, specifying key elements like resources and environment variables. This capability ensures accurate simulation of real-world conditions, directly improving agents’ problem-solving abilities.

To support quicker iterations, Daytona's Declarative Image Builder provides a code-first approach to defining custom environments. Etash and his team can programmatically specify dependencies, base images, and configurations through the SDK, building tailored Docker images on the fly without touching a container registry. Images are then automatically cached for 24 hours, making subsequent sandbox creations nearly instantaneous.
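The idea of a declarative image definition with a 24-hour build cache can be illustrated with a minimal Python sketch. This is not Daytona's SDK: `ImageSpec`, `ImageCache`, and `get_or_build` are hypothetical names invented for illustration, and the in-memory cache only models the behavior described above (identical definitions reuse a built image until the cache window expires).

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour cache window described above


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical declarative description of a sandbox image."""

    base_image: str
    pip_packages: tuple[str, ...] = ()
    env: tuple[tuple[str, str], ...] = ()

    def cache_key(self) -> str:
        # Identical specs serialize identically, so they share one cache key.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()


class ImageCache:
    """In-memory stand-in for an image build cache with a fixed TTL."""

    def __init__(self, ttl_seconds: int = CACHE_TTL_SECONDS) -> None:
        self.ttl_seconds = ttl_seconds
        self._built_at: dict[str, float] = {}
        self.builds = 0  # counts how many times an image was actually "built"

    def get_or_build(self, spec: ImageSpec, now: float) -> str:
        key = spec.cache_key()
        built_at = self._built_at.get(key)
        if built_at is None or now - built_at >= self.ttl_seconds:
            # Cache miss or expired entry: rebuild and record the build time.
            self.builds += 1
            self._built_at[key] = now
        return key


spec = ImageSpec(
    base_image="python:3.12-slim",
    pip_packages=("numpy", "pytest"),
    env=(("TASK", "debug"),),
)
cache = ImageCache()
cache.get_or_build(spec, now=0.0)       # first request: image is built
cache.get_or_build(spec, now=3_600.0)   # one hour later: cached image reused
cache.get_or_build(spec, now=90_000.0)  # 25 hours later: cache expired, rebuilt
print(cache.builds)  # prints 2
```

Keying the cache on a hash of the full definition means any change to the base image, dependencies, or environment variables produces a fresh build, while repeated requests for an unchanged definition skip the build step entirely.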

These simple SDK calls allow Etash and his team to swiftly test new configurations, enabling broader experimentation that strengthens agent performance in diverse scenarios. Each sandbox also supports Computer Use and is stateful by design. These capabilities mean agents interact with software tools, manipulate files, and write code just as human developers would, all without losing context.

Because the platform spins up sandboxes in less than 100 ms, latency remains low, protecting GPU resources and ensuring compute time is spent on training, not infrastructure operations. Plus, experiments happen faster, driving quicker iterations and accelerating the overall pipeline.

While these workflows run smoothly without any human intervention, Daytona’s Dashboard gives Etash a real-time overview of sandbox performance, flagging any instances of downtime for quick investigation. These alerts give him the peace of mind to focus fully on the project’s goals.

Beyond the technical capabilities, Daytona's responsive engineering support has proven equally valuable. When the team encounters edge cases or needs platform adjustments, they receive same-day responses through a dedicated Slack channel, ensuring research momentum never stalls.

“Daytona’s SDK lets us build highly custom environments with low latency, which is extremely important for training our agents to solve problems in diverse real-world scenarios.”

Etash Guha

PhD Researcher at Stanford University

03 -- RESULT

Stanford’s CS Department Provisions Thousands of Sandboxes 3x Faster with Daytona

With Daytona, Etash and his research team now have a flexible and scalable sandbox infrastructure that accelerates agentic training experiments without the operational overhead. With this foundation, they can continue their groundbreaking work of building models that make open-source agentic AI more accessible.

3x faster sandbox provisioning with Daytona
4 months saved on sandbox infrastructure engineering

Looking ahead, Etash plans to expand his use of Daytona as the DataComp agent project scales over the next year. The runtime will remain the critical infrastructure foundation supporting increasingly large and complex experiments.