Case Study

How Trajectory Generates Agent Training Sandboxes in Seconds With Daytona

99.6% faster sandbox provisioning
700k sandboxes provisioned with Daytona
15+ hours of engineering time saved monthly

Trajectory is a continual learning platform that turns production signals into AI models for ongoing product improvement. Built by ex-DeepMind, OpenAI, and MSL researchers, it helps companies like Clay, Harvey, and Rogo build models that outperform frontier baselines on domain-specific tasks.

Headquarters: San Francisco, CA

Industry: Machine Learning

Department: Product Engineering

Key Features: Sandbox Creation Speed, Sandbox Statefulness, Long-Running Sandboxes

trajectory.ai

Learn how this AI research and product company partnered with Daytona to provision thousands of stateful sandboxes in seconds and power reliable agent training.

Daytona abstracts away a key part of our RL infrastructure by handling our sandbox lifecycles end-to-end. That makes them critical for helping us equip every product in the world with its own intelligence layer that improves over time.

Michael Elabd

Co-Founder and CTO of Trajectory

01 -- CHALLENGE

Self-Managed Sandboxes Created Latency and Limited Scale

Trajectory’s proposition is simple: customers connect the platform to their end-user interaction data, and receive a tailored AI model that continuously improves their product. Behind the scenes, Trajectory builds post-training reinforcement learning (RL) pipelines that train AI agents to handle the tasks that customers’ users retry, correct, and abandon. To run these training episodes reliably at scale, each pipeline requires isolated, reproducible sandboxes.

When Trajectory started operationalizing post-training, Co-Founder and CTO Michael Elabd and his team provisioned sandboxes on self-managed Kubernetes clusters. For each agent task, they pulled relevant repos, packages, and customer-specific context, then generated a stateful sandbox for the agent to train inside without interfering with other workloads or surrounding systems. This process took up to 30 minutes per sandbox, creating significant overhead as training volumes increased.

Provisioning speed was only one factor shaping training efficiency. RL requires running tasks in parallel to aggregate meaningful signals that drive a model update within one training cycle. Across all Trajectory customers, ongoing training demanded thousands of sandboxes provisioned and torn down per cycle. At that volume, startup times alone made it difficult to sustain efficient training runs.

Scale also raised the stakes around reliability. If a VM failed mid-task, the agent would lose its progress in that environment. The risk was particularly costly for tasks that required hours of exploration to produce useful learning signals: the longer a sandbox ran, the more RAM, CPU, and storage it consumed to remain stateful, so each interruption wasted more compute and further compromised training efficiency.

Faced with these compounding constraints, the Trajectory team began evaluating infrastructure providers capable of provisioning persistent sandboxes rapidly and at scale. Yet, reliability proved elusive, with some platforms experiencing failure rates as high as 30% under Trajectory’s workloads.

That’s when Michael discovered Daytona. Its sandbox infrastructure was purpose-built for agentic workloads and engineered to handle the heavy demands of RL training at scale, making it a natural fit for Trajectory.

We needed thousands of simultaneous environments spinning up instantly in parallel. Most of the infrastructure that exists today wasn't built for these demands, except for Daytona.

Michael Elabd

Co-Founder and CTO of Trajectory

02 -- SOLUTION

Fast, Concurrent Sandboxes That Power Long-Running RL Training

Trajectory integrated Daytona into its RL post-training stack through the platform’s Python SDK. The process was seamless, and Daytona’s leadership set up a shared Slack channel on day one to answer integration questions in real time.

With that foundation in place, Trajectory now runs its RL training loops on reliable, agent-native sandbox infrastructure. For every customer, Michael and his team use Daytona’s Snapshots feature to pre-configure sandboxes that replicate relevant tools, repositories, and workflows. In one deployment, they replicated a customer’s Salesforce instance, product documentation, and FAQs so agents could train on the same systems used in production.

These reusable templates eliminate repetitive setup work, unlocking faster sandbox provisioning. Combined with Daytona’s warm pool of snapshot-based sandboxes, startup times have dropped from as long as 30 minutes to seconds. That speed enables Trajectory to run RL training loops at the pace its pipeline demands, accelerating model iteration and increasing training throughput.
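For readers who want a concrete picture of why a warm pool shortens startup, here is a minimal Python sketch of the general idea. It is a generic illustration, not Daytona's implementation or API: the `Sandbox` and `WarmPool` classes, the `acquire` method, and the `customer-template` snapshot name are all hypothetical.

```python
import itertools
import queue
from dataclasses import dataclass


@dataclass
class Sandbox:
    """Hypothetical stand-in for a provisioned sandbox."""
    sandbox_id: int
    snapshot: str


class WarmPool:
    """Keeps sandboxes pre-built from one snapshot so callers skip cold setup."""

    def __init__(self, snapshot: str, size: int) -> None:
        self.snapshot = snapshot
        self._ids = itertools.count(1)
        self._ready: "queue.Queue[Sandbox]" = queue.Queue()
        self.cold_starts = 0
        for _ in range(size):
            self._ready.put(self._build())

    def _build(self) -> Sandbox:
        # In a real system this is the slow step (pull repos, install packages).
        return Sandbox(next(self._ids), self.snapshot)

    def acquire(self) -> Sandbox:
        try:
            # Fast path: hand out a sandbox that was built ahead of time.
            return self._ready.get_nowait()
        except queue.Empty:
            # Slow path: the pool is exhausted, so build one on demand.
            self.cold_starts += 1
            return self._build()


pool = WarmPool(snapshot="customer-template", size=2)
first, second, third = pool.acquire(), pool.acquire(), pool.acquire()
print(first.sandbox_id, second.sandbox_id, third.sandbox_id, pool.cold_starts)
```

The first two requests are served from pre-built sandboxes, and only the third pays the setup cost. A production pool would also refill itself in the background, which this sketch leaves out.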

Once provisioned, each sandbox runs for as long as a task requires. Throughout that window, it remains fully stateful, so modified files, actions, and context accumulated by the agent are preserved in real time. That continuity means each training cycle produces a more capable, accurate, and cost-efficient model than the last.

Critically, that performance scales. Daytona runs directly on bare metal with a proprietary scheduler, bypassing the overhead of traditional VM orchestration. This architecture enables Trajectory’s engineers to provision thousands of sandboxes in parallel, each one fully isolated to ensure reproducible results.
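The fan-out pattern described here can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Trajectory's pipeline or Daytona's SDK: `create_sandbox` is a hypothetical placeholder for a provider call, and the batch size and worker count are arbitrary.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def create_sandbox(task_id: int) -> dict:
    """Hypothetical placeholder for a provider call that returns one sandbox."""
    # Each sandbox gets its own ID and its own private state: nothing is shared.
    return {"task_id": task_id, "sandbox_id": uuid.uuid4().hex, "files": {}}


def provision_batch(task_ids: list[int], max_workers: int = 64) -> list[dict]:
    """Request one sandbox per task concurrently; results keep task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(create_sandbox, task_ids))


sandboxes = provision_batch(list(range(1000)))
# Writing into one sandbox's state leaves every other sandbox untouched.
sandboxes[0]["files"]["notes.txt"] = "episode 0"
print(len(sandboxes), len({s["sandbox_id"] for s in sandboxes}))
```

Threads suit this kind of work because provisioning calls typically spend their time waiting on the network, and per-sandbox state is what keeps parallel episodes reproducible.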

With Daytona’s integrated lifecycle controls, sandboxes are torn down automatically, freeing RAM, CPU, and storage. At the volume Trajectory operates, that precision keeps compute consumption tightly coupled to active training work. In turn, Michael can redeploy these resources toward new training runs.
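As a rough illustration of guaranteed teardown, the Python sketch below releases a sandbox whether the training episode succeeds or fails. It shows a caller-side analogue of the idea, not Daytona's lifecycle controls: `FakeSandbox`, `sandbox_session`, and the `delete` method are hypothetical names used only for this example.

```python
from contextlib import contextmanager


class FakeSandbox:
    """Hypothetical stand-in; a real sandbox would hold RAM, CPU, and storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = True

    def delete(self) -> None:
        self.active = False


@contextmanager
def sandbox_session(name: str):
    sandbox = FakeSandbox(name)
    try:
        yield sandbox
    finally:
        # Runs on success and on failure, so resources are always released.
        sandbox.delete()


seen = []
try:
    with sandbox_session("episode-1") as sb:
        seen.append(sb)
        raise RuntimeError("agent task failed mid-episode")
except RuntimeError:
    pass

print(seen[0].active)
```

Tying release to the end of the episode is what keeps compute consumption aligned with active training work.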

Beyond the infrastructure itself, Daytona has become a strategic partner in Trajectory’s RL operations. Even after integration, the team remains closely involved, answering questions within minutes and conducting proactive check-ins. And because Daytona’s stack provides deep visibility into sandbox behavior, Trajectory’s engineers gain the insights they need to optimize performance and iterate across their RL pipelines.

Daytona lets us spin up environments that used to take up to 30 minutes in under five seconds. And we can do that for thousands of sandboxes at once, which is what makes efficient RL post-training possible.

Michael Elabd

Co-Founder and CTO of Trajectory

03 -- RESULT

Trajectory Provisions 700k Sandboxes With Daytona

With Daytona, Trajectory turns real-world production interactions into frontier-grade models that continuously improve its customers’ products. By provisioning thousands of concurrent sandboxes on demand, Michael and his team run RL training at the pace continual learning requires, accelerating model optimization for every customer.

99.6% faster sandbox provisioning
700k sandboxes provisioned with Daytona
15+ hours of engineering time saved monthly
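
To put the provisioning figure in context, the short Python sketch below shows how a percentage like this is computed. The 30-minute baseline and the five-second figure come from the quotes above. The 7.2-second value is an assumption: it is simply the startup time at which a 30-minute baseline works out to exactly 99.6%, not a measurement reported by Trajectory or Daytona.

```python
def percent_faster(before_seconds: float, after_seconds: float) -> float:
    """Percentage reduction in provisioning time."""
    return 100.0 * (before_seconds - after_seconds) / before_seconds


baseline = 30 * 60  # 30 minutes, the self-managed figure quoted in the text

# Five seconds is the upper bound Michael quotes for Daytona startup.
print(round(percent_faster(baseline, 5.0), 1))
# 7.2 seconds is an assumed value chosen to illustrate the 99.6% headline.
print(round(percent_faster(baseline, 7.2), 1))
```

Any startup time of a few seconds against a 30-minute baseline yields a reduction above 99%, so the headline figure is consistent with the quoted numbers.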

Looking ahead, Michael expects Trajectory’s reliance on Daytona to deepen as the team scales continual learning to more customers and more complex environments. As Trajectory brings an intelligence layer to more products, Daytona remains a key partner that frees Michael and his team to focus on model enhancements rather than infrastructure management.