
Build a Generative-UI Coding Agent with CopilotKit and Daytona


This guide demonstrates how to build a CopilotKit Built-in Agent backed by a Daytona sandbox with full shell and filesystem access. The agent handles whatever a developer might do at a terminal: build apps, debug or analyze code, run scripts, work with data, install packages.

Every tool call streams into the chat as generative UI: shell commands render as terminal cards, file edits as syntax-highlighted code, listings and grep results as structured cards, and any hosted process (dev server, static site, API) as a live <iframe> directly in the message stream.


The agent has 11 tools and a sandbox. On the first prompt it calls createSandbox, then chains shell commands, file writes, and filesystem queries until the user’s request is done. When the user asks for something they should see in a browser, the agent calls startWebServer with the dev-server command and its port; that tool spawns the server in the background, polls for a ready signal, and returns the preview URL, which the frontend embeds as a live iframe inside the chat.

Follow-up turns trigger more writeFile calls. If the iframe is up and the dev server has HMR, edits reload the inner page in place, so a chat back-and-forth becomes a live editing loop on the running app.

Clone the Daytona repository and navigate to the example directory:

git clone https://github.com/daytonaio/daytona.git
cd daytona/guides/typescript/copilotkit/generative-ui-coding-agent

Get a Daytona API key and an OpenAI API key, then copy .env.example to .env:

cp .env.example .env

Fill in both keys in .env:

DAYTONA_API_KEY=your_daytona_key
OPENAI_API_KEY=your_openai_key

Install the dependencies:

npm install

Before walking through the implementation, here are the key concepts the code relies on: CopilotKit’s Built-in Agent, its generative-UI streaming pattern, and the single-endpoint routing CopilotKit uses inside a Next.js route handler.

BuiltInAgent is CopilotKit’s agent runtime. You give it a model spec, a system prompt, and a list of tools; it runs the model in a per-turn loop (tool call → tool result → tool call → … → final text response), iterating up to the configured maxSteps limit and stopping early when the model returns a final text response with no further tool calls.
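The loop described above can be sketched in a few lines. This is a simplified model for illustration, not BuiltInAgent’s source: LoopModel, LoopReply, and LoopTools are hypothetical stand-ins for a model client and a tool registry, and what happens once the step budget runs out is an assumption (this sketch simply stops).

```typescript
// Hypothetical, simplified stand-ins; these are not CopilotKit types.
type LoopToolCall = { name: string; args: Record<string, unknown> }
type LoopReply =
  | { kind: 'toolCall'; call: LoopToolCall }
  | { kind: 'text'; text: string }
type LoopMessage = { role: 'user' | 'assistant' | 'tool'; content: string }
type LoopModel = (history: LoopMessage[]) => Promise<LoopReply>
type LoopTools = Map<string, (args: Record<string, unknown>) => Promise<unknown>>

// Run one user turn: call the model, execute any requested tool, feed the
// result back, and repeat until the model replies with text or maxSteps is hit.
async function runTurn(
  model: LoopModel,
  tools: LoopTools,
  history: LoopMessage[],
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  for (let step = 1; step <= maxSteps; step++) {
    const reply = await model(history)
    if (reply.kind === 'text') {
      // A text reply with no tool call ends the turn early.
      history.push({ role: 'assistant', content: reply.text })
      return { text: reply.text, steps: step }
    }
    const tool = tools.get(reply.call.name)
    const result = tool
      ? await tool(reply.call.args)
      : { error: `Unknown tool: ${reply.call.name}` }
    // Record both the call and its result so the next model call sees them.
    history.push({ role: 'assistant', content: JSON.stringify(reply.call) })
    history.push({ role: 'tool', content: JSON.stringify(result ?? null) })
  }
  // Step budget exhausted; this sketch just stops without a final answer.
  return { text: '', steps: maxSteps }
}
```

Each iteration is one model call, so a turn that makes two tool calls and then answers uses three steps of the budget.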

The runtime emits a structured streaming protocol on top of that loop: each tool call surfaces as a sequence of events (TOOL_CALL_START, one or more TOOL_CALL_ARGS carrying argument deltas as the model streams them, TOOL_CALL_END, then TOOL_CALL_RESULT once the server-side execute returns) that the React side subscribes to via hooks like useRenderTool. That’s what lets us render each tool call as its own live card in the chat as the agent works.
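How those four events could drive the per-card status (inProgress, executing, complete) is easiest to see as a small reducer. This is a hypothetical, simplified model, not CopilotKit’s code: the event shapes are reduced to the fields used here, and the rule that TOOL_CALL_END moves a card to executing while TOOL_CALL_RESULT moves it to complete is an assumption based on the status descriptions in this guide.

```typescript
// Simplified event shapes modeled on the four events named above (assumption:
// real events carry more fields, such as message IDs).
type ToolEvent =
  | { type: 'TOOL_CALL_START'; toolCallId: string; toolCallName: string }
  | { type: 'TOOL_CALL_ARGS'; toolCallId: string; delta: string }
  | { type: 'TOOL_CALL_END'; toolCallId: string }
  | { type: 'TOOL_CALL_RESULT'; toolCallId: string; content: string }

type CardStatus = 'inProgress' | 'executing' | 'complete'

interface CardState {
  toolName: string
  status: CardStatus
  argsText: string // raw JSON text, accumulated from ARGS deltas
  result?: string // JSON string, set only once the result arrives
}

// Fold one event into the per-tool-call card state, keyed by toolCallId.
function applyEvent(cards: Map<string, CardState>, ev: ToolEvent): void {
  if (ev.type === 'TOOL_CALL_START') {
    cards.set(ev.toolCallId, { toolName: ev.toolCallName, status: 'inProgress', argsText: '' })
    return
  }
  const card = cards.get(ev.toolCallId)
  if (!card) return // ignore events for calls we never saw start
  switch (ev.type) {
    case 'TOOL_CALL_ARGS':
      card.argsText += ev.delta // the model is still composing arguments
      break
    case 'TOOL_CALL_END':
      card.status = 'executing' // arguments complete; server-side execute runs
      break
    case 'TOOL_CALL_RESULT':
      card.status = 'complete'
      card.result = ev.content
      break
  }
}
```

Keying state by toolCallId is what lets several cards stream side by side in one message list.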

Next.js App Router uses convention-based filenames inside app/: page.tsx is the page component, layout.tsx wraps it, and route.ts files become API Route Handlers. So app/api/copilotkit/route.ts exposes a single endpoint at /api/copilotkit.

Every CopilotKit operation (start a run, fetch suggestions, list threads, …) hits that single endpoint as a POST with a method field in the JSON body identifying the operation, in a JSON-RPC style. The runtime helper copilotRuntimeNextJSAppRouterEndpoint builds the handleRequest function for you; you just return handleRequest(req) from the route’s POST export.

You can verify the dispatch shape by hitting the endpoint with an empty body:

curl -X POST http://localhost:3000/api/copilotkit \
-H 'Content-Type: application/json' \
-d '{}'
# {"error":"invalid_request","message":"Missing method field"}
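To make the dispatch idea concrete, here is a hypothetical, framework-free sketch of routing on the method field. It is not CopilotKit’s implementation: the error body mirrors the curl response above, but the status codes, the unknown-method branch, and the method name used in the check are assumptions.

```typescript
// Hypothetical illustration of JSON-RPC-style dispatch on a `method` field.
type RpcBody = { method?: unknown; params?: unknown }
type RpcHandler = (params: unknown) => Promise<unknown>
type RpcResponse = { status: number; json: unknown }

async function dispatch(body: RpcBody, handlers: Map<string, RpcHandler>): Promise<RpcResponse> {
  // Reject requests that do not say which operation they want.
  if (typeof body.method !== 'string') {
    return { status: 400, json: { error: 'invalid_request', message: 'Missing method field' } }
  }
  // A Map avoids accidentally resolving names like "toString" on a plain object.
  const handler = handlers.get(body.method)
  if (!handler) {
    return { status: 404, json: { error: 'unknown_method', message: `Unknown method: ${body.method}` } }
  }
  // Route the call to the matching operation and return its result.
  return { status: 200, json: await handler(body.params) }
}
```

One handler table behind one URL is what lets a single POST export serve every operation the frontend needs.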

On the React side, useRenderTool({ name, parameters, render }) registers a renderer for a specific tool name. The render function receives { status, parameters, result }:

  • status transitions through inProgress (the model is still streaming the arguments) → executing (the server is running the tool) → complete (the result is available).
  • parameters holds the typed, validated tool arguments, inferred from the Zod schema.
  • result is the tool’s return value serialized as a JSON string, present only when status === 'complete'; it needs JSON.parse before you can read fields off it.

Each renderer maps a tool call to a React component, so the chat shows live, streaming, structured feedback as the agent works.

The agent exposes 11 tools, defined with defineTool from @copilotkit/runtime/v2. Each wraps one or more Daytona TypeScript SDK calls:

| Tool | Daytona SDK call |
| --- | --- |
| createSandbox (optional envVars, labels, autoStopInterval) | daytona.create({ public: true, ephemeral: true, envVars, labels, autoStopInterval }) |
| runCommand (optional background) | sandbox.process.executeCommand, or createSession + executeSessionCommand(..., { runAsync: true }) when background is set |
| writeFile | sandbox.fs.uploadFile |
| readFile | sandbox.fs.downloadFile |
| listFiles | sandbox.fs.listFiles |
| findFiles (grep) | sandbox.fs.findFiles |
| searchFiles (glob) | sandbox.fs.searchFiles |
| replaceInFiles (codemod) | sandbox.fs.replaceInFiles |
| getFileDetails | sandbox.fs.getFileDetails |
| startWebServer | sandbox.process.createSession + executeSessionCommand(..., { runAsync: true }) + log polling + sandbox.getPreviewLink |
| getPreviewUrl | sandbox.getPreviewLink |

The full backend lives in app/api/copilotkit/route.ts. The frontend is split across app/layout.tsx, app/page.tsx, and a handful of card components in components/. We walk through both top-down.

The backend pulls the runtime helpers from @copilotkit/runtime, the v2 agent + tool helpers from @copilotkit/runtime/v2, the Daytona client, the Next.js request type, and Zod for tool parameter schemas:

import {
  CopilotRuntime,
  copilotRuntimeNextJSAppRouterEndpoint,
} from '@copilotkit/runtime'
import { BuiltInAgent, defineTool } from '@copilotkit/runtime/v2'
import { Daytona } from '@daytona/sdk'
import type { NextRequest } from 'next/server'
import { z } from 'zod'

const daytona = new Daytona({ apiKey: process.env.DAYTONA_API_KEY })

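The system prompt, stored as SYSTEM_PROMPT, frames the agent’s role, sets working conventions (default directory, sandbox reuse, recovery when a sandbox disappears), and gives the model a cheat sheet for serving web apps. The model uses it to decide what to do on each turn: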

You are a coding agent with shell access to a fresh Daytona sandbox.
The user can ask you anything a developer might do at a terminal: build apps,
debug or analyze code, run scripts, work with data, install packages, write
tests, whatever fits the request.

Work under /home/daytona by default. Reuse the same sandboxId across every
tool call. The sandbox auto-deletes after a period of inactivity; if a tool
call fails because the sandbox no longer exists, call createSandbox again
and continue with the new sandboxId.

When the user wants to see a running web app:
1. Prefer a modern, maintained scaffolder. Vite is the safest default for
   React/TS/SPA work; use `npm create vite@latest <name> -- --template
   react-ts --yes` or similar. Avoid `create-react-app`; it is deprecated and
   has very slow first-compile times.
2. ALWAYS bind the dev server to 0.0.0.0 or the Daytona proxy will not reach
   it. Cheat sheet:
   - Vite: `vite --host 0.0.0.0 --port 5173` (CLI flag) AND write a
     `vite.config.ts` with `server: { host: '0.0.0.0', port: 5173, strictPort:
     true, hmr: { clientPort: 443, protocol: 'wss' } }` so HMR survives the
     HTTPS proxy.
   - Next.js: `next dev -H 0.0.0.0 -p 3000`.
   - Express / Node: `app.listen(PORT, '0.0.0.0')`.
   - Flask: `flask run --host 0.0.0.0 --port 5000`.
   - FastAPI / Uvicorn: `uvicorn main:app --host 0.0.0.0 --port 8000`.
3. Use startWebServer with the dev-server command and its port. It starts
   the server in the background, waits for the port to be reachable, and
   returns the preview URL in one shot.

Reply to the user with one short sentence per turn. The tool cards in the
chat carry the visual feedback.
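For reference, a vite.config.ts matching the Vite entry in that cheat sheet could look like the following. It lives in the generated project inside the sandbox, not in the CopilotKit app, and the react() plugin import is an assumption based on Vite’s react-ts template, which installs @vitejs/plugin-react.

```typescript
// vite.config.ts inside the sandboxed project (a sketch of what the prompt asks for)
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    host: '0.0.0.0', // listen on all interfaces so the Daytona proxy can reach it
    port: 5173,
    strictPort: true, // fail loudly instead of silently moving to another port
    hmr: {
      clientPort: 443, // the browser reaches the dev server through the HTTPS proxy
      protocol: 'wss', // secure WebSocket, matching the proxy's TLS
    },
  },
})
```

strictPort matters here because the preview URL is derived from the port; if Vite quietly moved to 5174, the iframe would point at nothing.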

Each tool is a defineTool({ name, description, parameters, execute }) block. The parameters schema is a Zod object whose inferred type becomes the execute callback’s argument shape, so the tool is type-safe end to end.

createSandbox exposes only the params most relevant to a chat-driven coding agent: envVars (inject secrets), labels (org tagging), and autoStopInterval (idle-stop in minutes, default 15). It also passes ephemeral: true, so Daytona deletes the sandbox once it stops:

const createSandbox = defineTool({
  name: 'createSandbox',
  description:
    'Create a fresh Daytona sandbox with public preview URLs enabled. Call ONCE at session start; reuse the returned sandboxId for every subsequent tool call. Optionally inject environment variables, labels, or change the auto-stop interval.',
  parameters: z.object({
    // Explicit key and value schemas: valid in both Zod 3 and Zod 4
    envVars: z
      .record(z.string(), z.string())
      .optional()
      .describe(
        'Environment variables to set inside the sandbox. Use this when the user provides API keys or other secrets the project needs.',
      ),
    labels: z.record(z.string(), z.string()).optional().describe('Optional labels for organization-level sandbox tracking.'),
    autoStopInterval: z
      .number()
      .optional()
      .describe('Minutes of inactivity before the sandbox auto-stops. 0 disables, default 15.'),
  }),
  execute: async ({ envVars, labels, autoStopInterval }) => {
    const sandbox = await daytona.create({
      public: true,
      ephemeral: true,
      envVars,
      labels,
      autoStopInterval,
    })
    return { sandboxId: sandbox.id }
  },
})

runCommand is the most heavily used tool. Its background flag switches between a synchronous executeCommand (blocks until the command exits and returns its output) and an asynchronous executeSessionCommand({ runAsync: true }) in a fresh session (returns immediately and leaves the process running). The background path is for long-lived processes the agent won’t need to interact with again and that aren’t meant to be previewed, such as test watchers, build watchers, or log tails. Dev servers the user should see in a browser go through the dedicated startWebServer tool instead, which spawns the server, polls its logs for a ready signal, and returns the preview URL in a single call:

const runCommand = defineTool({
  name: 'runCommand',
  description:
    'Execute a shell command in the sandbox. Set background:true for long-lived fire-and-forget processes (test watchers, build watchers, log followers) the agent will not need to interact with again. Use plain commands (rm, mv, mkdir, chmod, ...) for filesystem ops that do not need structured output. For dev servers the user should see in a browser, use startWebServer instead — it returns the preview URL atomically.',
  parameters: z.object({
    sandboxId: z.string(),
    command: z.string().describe('Shell command. Use && to chain. Absolute paths or `cd /home/daytona && ...`.'),
    background: z
      .boolean()
      .optional()
      .describe(
        'Run asynchronously and return immediately. Use for long-lived non-preview processes such as watchers or log tails; for user-visible dev servers, use startWebServer.',
      ),
  }),
  execute: async ({ sandboxId, command, background }) => {
    const sandbox = await daytona.get(sandboxId)
    if (background) {
      const sessionId = `bg-${Date.now()}`
      await sandbox.process.createSession(sessionId)
      const result = await sandbox.process.executeSessionCommand(sessionId, {
        command,
        runAsync: true,
      })
      return { background: true, sessionId, cmdId: result.cmdId, command }
    }
    const result = await sandbox.process.executeCommand(command)
    return { exitCode: result.exitCode, stdout: result.result, command }
  },
})
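The startWebServer source is not shown in this guide. Following the SDK calls listed for it in the table (createSession, executeSessionCommand with runAsync, log polling, getPreviewLink), a sketch could look like this. It is written against a small hand-written SandboxLike interface instead of the real SDK types so it runs standalone with a fake sandbox; the getSessionCommandLogs return shape (a plain string), the readiness regex, the session naming, and the timeout defaults are all assumptions to adapt to your SDK version.

```typescript
// Minimal structural subset of a Daytona sandbox (assumed shapes).
interface ProcessLike {
  createSession(sessionId: string): Promise<void>
  executeSessionCommand(
    sessionId: string,
    req: { command: string; runAsync: boolean },
  ): Promise<{ cmdId?: string }>
  getSessionCommandLogs(sessionId: string, cmdId: string): Promise<string>
}
interface SandboxLike {
  process: ProcessLike
  getPreviewLink(port: number): Promise<{ url: string }>
}

// Hypothetical heuristic: common dev-server banners (Vite, Next.js, Flask, Uvicorn)
// or a URL on the expected port. A real tool might also probe the port directly.
function looksReady(logs: string, port: number): boolean {
  return new RegExp(`(ready in|listening|local:|running on|localhost:${port}|0\\.0\\.0\\.0:${port})`, 'i').test(logs)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function startWebServer(
  sandbox: SandboxLike,
  command: string,
  port: number,
  opts: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<{ url: string; port: number; sessionId: string; logs: string }> {
  const { timeoutMs = 60_000, intervalMs = 1_000 } = opts
  // Spawn the server in its own session so it keeps running after this call returns.
  const sessionId = `web-${port}-${Date.now()}`
  await sandbox.process.createSession(sessionId)
  const { cmdId } = await sandbox.process.executeSessionCommand(sessionId, { command, runAsync: true })
  if (!cmdId) throw new Error('executeSessionCommand returned no cmdId')

  // Poll the command's logs until a ready banner appears or the deadline passes.
  const deadline = Date.now() + timeoutMs
  let logs = ''
  while (Date.now() < deadline) {
    logs = await sandbox.process.getSessionCommandLogs(sessionId, cmdId)
    if (looksReady(logs, port)) {
      const preview = await sandbox.getPreviewLink(port)
      return { url: preview.url, port, sessionId, logs }
    }
    await sleep(intervalMs)
  }
  // Surface the last logs so the model can see why the server never came up.
  throw new Error(`Dev server on port ${port} did not report ready within ${timeoutMs} ms:\n${logs}`)
}
```

Returning the last log text on timeout gives the agent something concrete to debug (a missing dependency, a wrong port) instead of a bare failure.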

writeFile always takes the FULL new content. There is no diff or patch format; the agent must send the whole file every time it edits one. This keeps the model’s job simple and avoids merge ambiguity:

const writeFile = defineTool({
  name: 'writeFile',
  description: 'Write a file with the FULL new content. Overwrites if it exists.',
  parameters: z.object({
    sandboxId: z.string(),
    path: z.string().describe('Absolute path, e.g. "/home/daytona/app/src/App.tsx".'),
    content: z.string().describe('Complete new file content.'),
  }),
  execute: async ({ sandboxId, path, content }) => {
    const sandbox = await daytona.get(sandboxId)
    await sandbox.fs.uploadFile(Buffer.from(content), path)
    return { path, bytesWritten: Buffer.byteLength(content) }
  },
})

getPreviewUrl returns the Daytona proxy URL for any port the sandbox has open. It is the standalone counterpart to startWebServer: use it when the agent has already brought up a hosted process by other means (a previous runCommand, an already-running service) and just needs the URL to surface as an iframe, without spawning a new dev server:

const getPreviewUrl = defineTool({
  name: 'getPreviewUrl',
  description:
    'Get the public preview URL for a port on the sandbox. The port is opened automatically if it was closed. Call after starting a hosted process the user should see in a browser.',
  parameters: z.object({
    sandboxId: z.string(),
    port: z.number().describe('Port the hosted process is listening on.'),
  }),
  execute: async ({ sandboxId, port }) => {
    const sandbox = await daytona.get(sandboxId)
    const preview = await sandbox.getPreviewLink(port)
    return { url: preview.url, port }
  },
})

The remaining six tools (readFile, listFiles, findFiles, searchFiles, replaceInFiles, getFileDetails) follow the same shape: a Zod schema for inputs, a one-call wrapper around the Daytona SDK in execute, and a structured object returned to the frontend. startWebServer is the exception, since it combines several SDK calls (see the table above).
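As an illustration of that shape, here are hypothetical execute bodies for readFile and findFiles, factored into plain functions over a hand-written FsLike interface so they run without the SDK. The interface is an assumed subset of sandbox.fs (a downloaded file as bytes, grep matches as file/line/content), and the limit cap on matches is an invented convenience, not something this guide specifies.

```typescript
// Assumed subset of sandbox.fs; check the SDK for the exact signatures.
interface FsMatch { file: string; line: number; content: string }
interface FsLike {
  downloadFile(remotePath: string): Promise<Uint8Array>
  findFiles(path: string, pattern: string): Promise<FsMatch[]>
}

// readFile: download the bytes and decode them as UTF-8 for the model and the card.
async function readFileExecute(fs: FsLike, path: string) {
  const bytes = await fs.downloadFile(path)
  return { path, content: new TextDecoder('utf-8').decode(bytes), size: bytes.byteLength }
}

// findFiles (grep): return matches, capped at a hypothetical `limit` so a broad
// pattern does not flood the model's context with thousands of lines.
async function findFilesExecute(fs: FsLike, path: string, pattern: string, limit = 100) {
  const matches = await fs.findFiles(path, pattern)
  return {
    path,
    pattern,
    matchCount: matches.length,
    matches: matches.slice(0, limit),
    truncated: matches.length > limit,
  }
}
```

In route.ts, each tool’s execute would fetch the sandbox with daytona.get(sandboxId) and pass sandbox.fs to the helper, keeping the defineTool block itself to a single line of logic.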

Step 3: Mount the agent on /api/copilotkit


The agent ties everything together. model is the model identifier in provider:model form, which CopilotKit resolves to the right provider client. prompt is the SYSTEM_PROMPT string shown earlier. tools is the list of defineTool results. maxSteps caps how many tool-call iterations a single turn can run before the runtime forces a final answer:

const agent = new BuiltInAgent({
  model: 'openai:gpt-5.4',
  prompt: SYSTEM_PROMPT,
  tools: [
    createSandbox,
    runCommand,
    writeFile,
    readFile,
    listFiles,
    findFiles,
    searchFiles,
    replaceInFiles,
    getFileDetails,
    startWebServer,
    getPreviewUrl,
  ],
  maxSteps: 30,
})

const runtime = new CopilotRuntime({
  agents: { default: agent },
})

export const POST = async (req: NextRequest) => {
  const { handleRequest } = copilotRuntimeNextJSAppRouterEndpoint({
    runtime,
    endpoint: '/api/copilotkit',
  })
  return handleRequest(req)
}

The POST export is what Next.js picks up as the route handler. CopilotKit’s handleRequest reads the method field from the JSON body and dispatches to the matching runtime operation, so this single handler serves every CopilotKit RPC the React frontend makes.

components/CopilotKitRoot.tsx is a thin client component that wraps children in the v2 provider and points it at the runtime endpoint. The provider sets up the AG-UI event stream and gives <CopilotChat> (and any other v2 components nested under it) access to the runtime:

'use client'

import { CopilotKit } from '@copilotkit/react-core/v2'
import type { ReactNode } from 'react'
import '@copilotkit/react-core/v2/styles.css'

export function CopilotKitRoot({ children }: { children: ReactNode }) {
  return <CopilotKit runtimeUrl="/api/copilotkit">{children}</CopilotKit>
}

app/layout.tsx uses it to wrap the whole tree.

Step 5: Register tool renderers with useRenderTool


app/page.tsx registers one useRenderTool hook per backend tool. Each render function receives { status, parameters, result }. Because result arrives as a JSON string, we run it through a small helper before reading fields off it:

function parseResult<T>(result: unknown): T | undefined {
  if (typeof result !== 'string' || result.length === 0) return undefined
  try {
    return JSON.parse(result) as T
  } catch {
    return undefined
  }
}

Each render hook references a named Zod schema and a result type declared at the top of app/page.tsx. The renderer for startWebServer (and the fallback getPreviewUrl) is the one users notice most: while either tool is inProgress/executing the card shows a shimmering skeleton, and once complete it flips to a live <iframe> pointing at the Daytona preview URL. Both tools share the same PreviewCard component because the shape of the data the card needs (url and port) is identical; the getPreviewUrl registration is shown below:

const getPreviewUrlParams = z.object({
  sandboxId: z.string(),
  port: z.number(),
})
type GetPreviewUrlResult = { url: string; port: number }

useRenderTool({
  name: 'getPreviewUrl',
  parameters: getPreviewUrlParams,
  render: ({ status, parameters, result }) => {
    const r = parseResult<GetPreviewUrlResult>(result)
    return <PreviewCard status={status} url={r?.url} port={parameters?.port} />
  },
})

The iframe stays mounted across subsequent turns, so when the agent calls writeFile to update a file the dev server is serving, the dev server’s HMR (over the WebSocket the iframe already holds open) updates the page inside the iframe in place; the chat does not need to re-render or remount the iframe.

The other renderers follow the same pattern with their own cards: a terminal-style TerminalCard for runCommand, a syntax-highlighted FileCard for writeFile / readFile, a structured FileListCard for listFiles / searchFiles, a GrepCard for findFiles matches, a ReplaceCard for replaceInFiles (showing pattern → newValue plus per-file success/fail), and a compact FileInfoCard for getFileDetails metadata.

useConfigureSuggestions seeds the chat with dynamically-generated starter prompts so the empty state is not actually empty:

useConfigureSuggestions({
  instructions:
    'Suggest 3 short, varied prompts a developer might ask a coding agent with shell access. Mix app-building requests with debugging, scripting, or data-analysis tasks (e.g. "Build a todo app", "Find the bug in this Python script", "Generate a CSV of prime numbers under 1000").',
  minSuggestions: 3,
  maxSuggestions: 3,
  available: 'always',
})

available: 'always' keeps the suggestion pills visible even after the first message, which makes follow-up exploration easier.

Start the dev server:
npm run dev

Open http://localhost:3000 and ask the agent for something. For example:

Build the classic Snake game in Vite + React using HTML canvas. Use arrow keys to control the snake, count score, end on collision with wall or self, with a restart button. Make the game area dark green and the snake bright green.

The chat fills with one tool-call card per agent step. Every card is expandable via the chevron on the right, so you can introspect details like sandbox params, exit codes, session and command IDs, written-file content, and the full preview URL.

A typical first turn looks like this:

✓ Sandbox ready a0ffcc3d-7753-4d52-89d0-6c595c60626a
✓ done
$ cd /home/daytona && npm create vite@latest snake-game -- --template react-ts --yes && cd snake-game && npm install
npm warn exec The following package was not found and will be installed: create-vite@9.0.7
> npx
> "create-vite" snake-game --template react-ts --yes
◇ Scaffolding project in /home/daytona/snake-game...
└ Done. Now run:
cd snake-game
npm install
npm run dev
added 152 packages, and audited 153 packages in 17s
42 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
✓ wrote /home/daytona/snake-game/src/App.tsx 4492 B
✓ wrote /home/daytona/snake-game/src/App.css 1527 B
✓ wrote /home/daytona/snake-game/src/index.css 307 B
✓ wrote /home/daytona/snake-game/vite.config.ts 273 B
✓ done
$ cd /home/daytona/snake-game && npm run build
> snake-game@0.0.0 build
> tsc -b && vite build
vite v8.0.16 building client environment for production...
transforming...✓ 17 modules transformed.
rendering chunks...
computing gzip size...
dist/index.html 0.46 kB │ gzip: 0.29 kB
dist/assets/index-BWaglkr_.css 1.49 kB │ gzip: 0.77 kB
dist/assets/index-BAnbeFU0.js 192.33 kB │ gzip: 60.80 kB
✓ built in 161ms
● Live preview
<iframe src="https://5173-a0ffcc3d-7753-4d52-89d0-6c595c60626a.daytonaproxy01.net" /> — live preview of the running Snake game
Done — the Snake game is running here: https://5173-a0ffcc3d-7753-4d52-89d0-6c595c60626a.daytonaproxy01.net
Snake game built and previewed live in the chat by the CopilotKit + Daytona coding agent

On follow-up turns, the agent edits files in place and the iframe reloads instantly with the new changes. For example, asking “Make it red themed” triggers:

✓ wrote /home/daytona/snake-game/src/App.css 1529 B
Done — the app now uses a red theme.

Why this approach works well:
  • Chat UI for better UX and easier introspection. A familiar conversational interface is simpler than a raw shell, and you can follow every step the agent takes right in the chat.
  • Purpose-built UI cards for every tool call. Each agent action renders through a dedicated React card (FileCard, TerminalCard, FileListCard, GrepCard, ReplaceCard, FileInfoCard, PreviewCard), each tailored to that tool’s data so the user always sees the right visual for the action.
  • Live streaming of every tool call. Each step surfaces as a structured card as it happens, so you can introspect exactly what the agent is doing in real time rather than waiting for a final summary.
  • Any hosted process embeds as an <iframe> directly in the chat, with no link-outs or screenshots.
  • Conversational edit loop with HMR, no manual refresh.
  • All code execution happens inside an isolated Daytona sandbox, never on the host running the chat.