
If you're building AI tools or autonomous agents with large language models (LLMs), generating code is only half the job. At some point, that code needs to run - often automatically and at scale. But running LLM-generated code in production environments comes with serious risks around security, reliability, and control. That’s exactly the problem Daytona is built to solve.

In this post, I’ll walk through a minimal proof-of-concept that shows how to safely generate and run Python code using LangChain, OpenAI, and the Daytona SDK. The whole workflow happens inside a secure, isolated sandbox environment, which is a major step forward in making AI-assisted development safer and more reproducible.

## Why Daytona?

LLMs are powerful but unpredictable. If you're using them to write code, you can't always guarantee what they'll return. That unpredictability becomes a risk when the code runs in production or on shared infrastructure.

Daytona solves this by providing isolated, programmatically controlled sandbox environments that can be easily created and destroyed. These sandboxes are stateful and can be long-running, making them ideal for agents with tasks that require maintaining state over time. For example, you can spin up a sandbox, write code into it, run the code, read results, and when you're done, tear everything down—without leaving traces or risking the host system.

## What This Demo Covers

This proof-of-concept covers:

  1. Creating a Daytona sandbox.

  2. Using LangChain to generate Python code from a prompt.

  3. Executing that code securely in the sandbox.

  4. Performing basic file operations inside the sandbox (write, read, delete).

  5. Cleaning up the sandbox afterward.

## Prerequisites

To run the demo, you'll need:

  • Python 3.9+

  • A Daytona API key (created in the Daytona dashboard)

  • An OpenAI API key (from the OpenAI platform)

For complete setup instructions, see the Daytona Configuration Guide.

Set your keys with a .env file:

```
DAYTONA_API_KEY=your_daytona_key_here
OPENAI_API_KEY=your_openai_key_here
```

Create a requirements.txt file:

```
daytona>=0.0.1
langchain>=0.1.9
openai>=1.0.0
python-dotenv>=1.0.0
```

Then set up your environment:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
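
Once the packages are installed, load the keys from the .env file at the top of your script so both clients can pick them up (python-dotenv is already in requirements.txt):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads DAYTONA_API_KEY and OPENAI_API_KEY from .env

# Fail fast if either key is missing.
assert os.getenv("DAYTONA_API_KEY"), "DAYTONA_API_KEY is not set"
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```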

## Core Workflow

Here’s the core idea:

1. Write a prompt describing the feature to implement

```python
prompt = """
Write a Python function called `solve(n: int)` that returns the factorial of `n`.
The function should be safe, handle edge cases, and raise exceptions for invalid inputs.
"""
```

2. Generate the code with LangChain + OpenAI

```python
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI()
response = llm.invoke([HumanMessage(content=prompt)])
generated_code = response.content
```
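
Chat models often wrap code answers in Markdown fences, so `response.content` may not be directly runnable. A small, defensive cleanup step (my own addition, not part of the LangChain or Daytona APIs) keeps only the code:

```python
def strip_code_fences(text: str) -> str:
    """Remove a leading/trailing ``` fence if the model added one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

generated_code = strip_code_fences(generated_code)
```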

3. Create the sandbox

```python
sandbox = DaytonaSandbox.create()
```

Learn more: Sandbox Management | Resource Configuration
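
`DaytonaSandbox` here is a simplified name used throughout this post. If you call the official `daytona` Python package directly, creation looks roughly like the sketch below; the exact import and method names are my assumption, so verify them against the Daytona SDK reference:

```python
from daytona import Daytona  # assumed import; older releases shipped as daytona_sdk

daytona = Daytona()         # picks up DAYTONA_API_KEY from the environment
sandbox = daytona.create()  # returns a handle to a fresh, isolated sandbox
```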

4. Upload the generated code

```python
sandbox.filesystem.write("factorial.py", generated_code)
```

5. Execute the code

```python
output = sandbox.process.code_run("factorial.py", input="5")
```

For comprehensive patterns: Process and Code Execution

6. Use the sandbox filesystem

```python
sandbox.filesystem.write("example.txt", "Hello, Daytona!")
content = sandbox.filesystem.read("example.txt")
files = sandbox.filesystem.list()
sandbox.filesystem.delete("example.txt")
```

7. Delete the sandbox

```python
sandbox.delete()
```

Sample console output:

```
$ Created sandbox: sbx_123...
=== Generated code ===
def solve(n): ...
=== Execution result ===
120
=== Filesystem demo ===
example.txt: Hello, Daytona!
Deleted sandbox. Bye!
```
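
Putting the steps together, a minimal end-to-end sketch looks like this. It reuses the names from the snippets above (`DaytonaSandbox`, `filesystem.write`, `process.code_run` are the post's simplified API; check the Daytona SDK reference for the exact calls) and wraps the sandbox work in try/finally so cleanup happens even if generation or execution fails:

```python
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

load_dotenv()  # DAYTONA_API_KEY and OPENAI_API_KEY from .env

prompt = """
Write a Python function called `solve(n: int)` that returns the factorial of `n`.
The function should be safe, handle edge cases, and raise exceptions for invalid inputs.
"""

# 1. Ask the model for an implementation.
llm = ChatOpenAI()
generated_code = llm.invoke([HumanMessage(content=prompt)]).content

# 2. Run it inside an isolated sandbox, never on the host.
#    DaytonaSandbox is the simplified wrapper name used in this post.
sandbox = DaytonaSandbox.create()
try:
    print("=== Generated code ===")
    print(generated_code)

    sandbox.filesystem.write("factorial.py", generated_code)
    output = sandbox.process.code_run("factorial.py", input="5")

    print("=== Execution result ===")
    print(output)
finally:
    # 3. Always tear the sandbox down, even if something above failed.
    sandbox.delete()
    print("Deleted sandbox. Bye!")
```

The try/finally is the important part: whatever the model returns, the sandbox is guaranteed to be destroyed afterward.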

## Extending the Pattern: TDD with AI

In the second example, we add a layer of quality control: tests.

1. Prompt for a matching PyTest test suite

```python
test_prompt = """
Write a PyTest test suite for a factorial function `solve(n: int)`. Cover:
- Positive integers
- Zero
- Negative input (should raise ValueError)
- Non-integer input (should raise TypeError)
"""
```

2. Generate the test with LangChain + OpenAI

```python
test_response = llm.invoke([HumanMessage(content=test_prompt)])
generated_tests = test_response.content
```

3. Upload both to the sandbox

```python
sandbox.filesystem.write("factorial.py", generated_code)
sandbox.filesystem.write("test_factorial.py", generated_tests)
```

4. Run PyTest inside the sandbox

```python
result = sandbox.process.pytest("test_factorial.py")
```

If the tests fail, the script feeds the failure output back to the LLM, regenerates the implementation, and re-runs the tests, as sketched below.
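
A minimal sketch of that regenerate-and-retest loop, reusing the names from the snippets above (the `process.pytest` helper and the `result.passed` attribute are illustrative; adjust to whatever your test runner actually returns):

```python
MAX_ATTEMPTS = 3  # illustrative cap so a stubborn failure can't loop forever

for attempt in range(1, MAX_ATTEMPTS + 1):
    # Upload the current implementation and its test suite.
    sandbox.filesystem.write("factorial.py", generated_code)
    sandbox.filesystem.write("test_factorial.py", generated_tests)

    # Run the tests inside the sandbox, per the post's process.pytest helper.
    result = sandbox.process.pytest("test_factorial.py")
    if result.passed:  # hypothetical attribute; inspect your SDK's result object
        print(f"Tests passed on attempt {attempt}")
        break

    # Feed the failure output back to the model and ask for a corrected version.
    retry_prompt = (
        f"{prompt}\n\nThe previous implementation failed these tests:\n"
        f"{result}\n\nReturn a corrected implementation."
    )
    generated_code = llm.invoke([HumanMessage(content=retry_prompt)]).content
else:
    raise RuntimeError("Implementation still failing after retries")
```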

## Why This Pattern Works

  • Safe by design: The sandbox ensures no AI-generated code can affect the real environment.

  • Testable: Adding TDD lets you validate AI output automatically.

  • Flexible: You can prompt for new functions, generate edge case tests, and reuse the pattern in pipelines.

## Next Steps

  • Customize prompts for your domain (e.g. data parsing, calculations, config generation).

  • Explore Daytona sandbox options (CPU/memory limits, timeouts).

  • Extend to multi-file modules or async code.

## Final Thoughts

At Devōt, we often build tools that involve dynamic code execution, whether for internal platforms or client-facing products. Daytona gave us a way to experiment safely, with clear boundaries and control.

To be honest, for teams like ours working at the intersection of AI and engineering, that kind of isolation isn’t just helpful; it’s necessary.


Ready to build your own AI coding assistant? Check out the complete documentation.

Tags:
  • LLM
  • LangChain
  • OpenAI
  • Daytona
  • Sandbox
  • AI Agents
  • Python
  • Code Execution
  • Security
  • AI Safety
  • TDD