Run Gemini CLI Headlessly in Daytona and Stream Its Output

This guide demonstrates how to run Google’s Gemini CLI as a headless coding agent inside a Daytona sandbox. The agent can write code in any language, install dependencies, run scripts, and reason over a project, all inside a secure, isolated, disposable sandbox while its output streams back to your terminal in real time.


When you launch the main module, a Daytona sandbox is created and the Gemini CLI is installed inside it. The agent is driven headlessly with gemini -p "<prompt>" --yolo --output-format stream-json, and its newline-delimited JSON events are parsed and printed as the agent works.

You interact with the main program via a command line chat interface. The program sends your prompts to the Gemini CLI inside the sandbox, which writes code, runs commands, and streams the results back as it works. Each tool call surfaces as a [tool] line, followed by the assistant’s response.

Gemini CLI sessions persist across invocations: the session ID captured from the first run's init event is passed back with -r on every later turn, so context carries across the conversation. You can keep interacting with the agent until you are finished. When you exit the program, the sandbox is deleted automatically.

First, clone the daytona repository and navigate to the example directory:

git clone https://github.com/daytonaio/daytona.git
cd daytona/guides/typescript/gemini/gemini-cli

Get your API keys:

Copy .env.example to .env and add your keys:

DAYTONA_API_KEY=your_daytona_key
SANDBOX_GEMINI_API_KEY=your_gemini_key

Install dependencies:

npm install

Run the agent:

npm run start

The agent will start and wait for your prompt.

Ask the agent to write and run some code. Here it generates an ASCII-art Mandelbrot fractal inside the sandbox and executes it, streaming each tool call and the program output back to your terminal:

$ npm run start
Creating sandbox...
Installing Gemini CLI...
Starting Gemini CLI...
Agent ready. Press Ctrl+C at any time to exit.
User: Write a Python script mandelbrot.py that renders the Mandelbrot set as ASCII art roughly 40 columns by 20 rows, then run it and show the rendered output
[tool] write_file
[tool] run_shell_command
[tool] replace
[tool] run_shell_command
I have successfully created and executed the Python script mandelbrot.py to render the Mandelbrot set as ASCII art.
......-:@...
.......:%+:....
........:*@@*:....
.....+-:--=@@-:::::.
.......:@%@@@@@@@@=#+..
.........==@@@@@@@@@@@+:..
.....-::::::%@@@@@@@@@@@@@%:..
.......:-@*@%--@@@@@@@@@@@@@@%:..
.......::%@@@@@+@@@@@@@@@@@@@@#...
..-:.::+@@@@@@@@@@@@@@@@@@@@@@:...
..-:.::+@@@@@@@@@@@@@@@@@@@@@@:...
.......::%@@@@@+@@@@@@@@@@@@@@#...
.......:-@*@%--@@@@@@@@@@@@@@%:..
.....-::::::%@@@@@@@@@@@@@%:..
.........==@@@@@@@@@@@+:..
.......:@%@@@@@@@@=#+..
.....+-:--=@@-:::::.
........:*@@*:....
.......:%+:....
......-:@...
User:

This example consists of two parts: a main program (src/index.ts) that manages the sandbox and a command-line loop, and a session class (src/session.ts) that drives the Gemini CLI over a PTY and parses its streaming JSON output.

On startup, the script:

  1. Creates a new Daytona sandbox with the Gemini API key injected as an environment variable.
  2. Installs the Gemini CLI globally in the sandbox.
  3. Creates a PTY for streaming output from the Gemini CLI.
  4. Enters a readline loop to send prompts and receive streamed responses.
  5. On Ctrl+C, kills the PTY session, deletes the sandbox, and exits.

The Gemini CLI defaults to interactive browser OAuth, which would hang a headless run. Passing GEMINI_API_KEY as a sandbox environment variable at create time lets the CLI authenticate non-interactively. GEMINI_CLI_TRUST_WORKSPACE bypasses the CLI’s workspace-trust prompt, which otherwise blocks --yolo runs in a fresh sandbox directory:

sandbox = await daytona.create({
  envVars: {
    GEMINI_API_KEY: process.env.SANDBOX_GEMINI_API_KEY,
    GEMINI_CLI_TRUST_WORKSPACE: 'true',
  },
})
const install = await sandbox.process.executeCommand('npm install -g @google/gemini-cli')
if (install.exitCode !== 0) {
  throw new Error('Error installing Gemini CLI: ' + install.result)
}
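
If SANDBOX_GEMINI_API_KEY is missing from .env, the snippet above would pass undefined into the sandbox and the failure would only surface later, inside the CLI. One way to fail earlier is to validate the environment before creating the sandbox. The sketch below is not part of the example's source: requireEnv is a hypothetical helper, and the variable names are the two from .env.example.

```typescript
// Hypothetical helper, not part of the example's source.
// Returns the requested variables, or throws an error naming every missing one.
function requireEnv(
  env: Record<string, string | undefined>,
  names: readonly string[],
): Record<string, string> {
  const missing = names.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  const values: Record<string, string> = {}
  for (const name of names) values[name] = env[name] as string
  return values
}

// Possible usage before daytona.create(...):
// const keys = requireEnv(process.env, ['DAYTONA_API_KEY', 'SANDBOX_GEMINI_API_KEY'])
// ... envVars: { GEMINI_API_KEY: keys.SANDBOX_GEMINI_API_KEY, ... }
```

Reporting every missing name at once saves a second round trip when both keys are absent.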

The session uses a pseudo-terminal (PTY) for streaming output from the Gemini CLI:

async initialize(): Promise<void> {
  this.ptyHandle = await this.sandbox.process.createPty({
    id: `gemini-pty-${Date.now()}`,
    cols: 120,
    rows: 30,
    onData: (data: Uint8Array) => this.handleData(data),
  })
  await this.ptyHandle.waitForConnection()
}

Each prompt is sent as a gemini command in headless mode. -p runs a one-shot non-interactive prompt, --yolo auto-approves tool calls so the run never blocks on a permission prompt, and --output-format stream-json emits newline-delimited JSON events. When a session ID has been captured, -r resumes that session for multi-turn continuity:

async processPrompt(prompt: string): Promise<void> {
  const flags = ['-p', this.shellQuote(prompt), '--yolo', '--output-format', 'stream-json']
  // -r resumes the existing session for multi-turn continuity.
  if (this.sessionId) flags.unshift('-r', this.shellQuote(this.sessionId))
  const command = ['gemini', ...flags].join(' ')
  await this.ptyHandle!.sendInput(`cd ${WORK_DIR} && ${command}\n`)
  await new Promise<void>((resolve) => {
    this.onResponseComplete = resolve
  })
}
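
processPrompt relies on a shellQuote method that is not shown here. Because the prompt is typed into a shell through the PTY, quoting decides whether a prompt containing quotes, $ or backticks reaches gemini intact. The sketch below is one plausible implementation, assuming a POSIX shell inside the sandbox; it is not taken from the example's source. buildGeminiCommand is a hypothetical standalone version of the flag assembly above.

```typescript
// Assumed implementation of shellQuote for a POSIX shell.
// The value is wrapped in single quotes, and each embedded single quote
// is rewritten as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`
}

// Hypothetical standalone version of the command assembly in processPrompt.
function buildGeminiCommand(prompt: string, sessionId?: string): string {
  const flags = ['-p', shellQuote(prompt), '--yolo', '--output-format', 'stream-json']
  // -r is prepended only once a session ID has been captured.
  if (sessionId) flags.unshift('-r', shellQuote(sessionId))
  return ['gemini', ...flags].join(' ')
}

console.log(buildGeminiCommand("print 'hi' and $HOME"))
```

Inside single quotes the shell performs no expansion, so $HOME in the prompt above is passed to the agent literally.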

The Gemini CLI outputs JSON lines that are parsed to track agent activity. The handleData method buffers incoming PTY bytes and processes each complete line, keeping any incomplete line for the next chunk. A stateful TextDecoder is reused across calls so partial multi-byte UTF-8 sequences split across PTY chunks are preserved instead of being corrupted:

private decoder = new TextDecoder('utf-8')

private handleData(data: Uint8Array): void {
  this.buffer += this.decoder.decode(data, { stream: true })
  const lines = this.buffer.split('\n')
  this.buffer = lines.pop() || ''
  for (const line of lines.map((l) => l.trim()).filter(Boolean)) {
    try {
      this.handleEvent(JSON.parse(line) as GeminiStreamEvent)
    } catch {
      debug('non-JSON line:', line)
    }
  }
}
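
To see the buffering on its own, the sketch below extracts the same logic into a standalone class and feeds it a JSON line whose "é" (two bytes in UTF-8) is split across two chunks. LineBuffer is a hypothetical name used only for this illustration.

```typescript
// Hypothetical standalone version of the buffering in handleData.
class LineBuffer {
  // One decoder instance is kept so that bytes of an incomplete
  // multi-byte character are held until the next chunk arrives.
  private decoder = new TextDecoder('utf-8')
  private buffer = ''

  // Returns every complete, non-empty line made available by this chunk.
  push(chunk: Uint8Array): string[] {
    this.buffer += this.decoder.decode(chunk, { stream: true })
    const lines = this.buffer.split('\n')
    // The last element is either '' or an unfinished line: keep it for later.
    this.buffer = lines.pop() ?? ''
    return lines.map((l) => l.trim()).filter(Boolean)
  }
}

const demoBytes = new TextEncoder().encode('{"content":"é"}\n')
const demoBuffer = new LineBuffer()
// Byte 12 is the first half of "é": no complete line yet.
console.log(demoBuffer.push(demoBytes.slice(0, 13)))
// The second half of "é" and the newline arrive: one complete line.
console.log(demoBuffer.push(demoBytes.slice(13)))
```

Creating a new TextDecoder per chunk, or omitting { stream: true }, would decode each half of the character separately and produce replacement characters in place of "é".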

Event types from the Gemini CLI’s streaming JSON (schema reference):

  • init: Session metadata (session_id, model) - captured to resume the session on later turns
  • message: User and assistant message chunks (assistant text is printed live)
  • tool_use: Tool call requests with arguments
  • tool_result: Output from executed tools
  • error: Non-fatal warnings and system errors
  • result: Final outcome with aggregated statistics - signals response completion

private handleEvent(event: GeminiStreamEvent): void {
  switch (event.type) {
    case 'init': {
      const init = event as InitEvent
      if (init.session_id && !this.sessionId) {
        this.sessionId = init.session_id
        debug('captured session_id:', this.sessionId)
      }
      return
    }
    case 'message': {
      const msg = event as MessageEvent
      if (msg.role === 'assistant' && msg.content) {
        process.stdout.write(msg.content)
      }
      return
    }
    case 'tool_use': {
      const tool = event as ToolUseEvent
      // Skip update_topic: an internal Gemini bookkeeping tool, not a user-facing action.
      if (tool.tool_name === 'update_topic') return
      process.stdout.write(`\n[tool] ${tool.tool_name}\n`)
      return
    }
    case 'tool_result': {
      const result = event as ToolResultEvent
      if (result.status === 'error' && result.error) {
        process.stdout.write(`\n[tool error] ${result.error.message}\n`)
      }
      return
    }
    case 'error': {
      const err = event as ErrorEvent
      process.stderr.write(`\n[${err.severity}] ${err.message}\n`)
      return
    }
    case 'result': {
      const res = event as ResultEvent
      if (res.status === 'error' && res.error) {
        process.stderr.write(`\nFailed: ${res.error.message}\n`)
      }
      process.stdout.write('\n')
      this.onResponseComplete?.()
      return
    }
  }
}
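
The event types that handleEvent casts to (InitEvent, MessageEvent, and so on) are not shown in this guide. The sketch below guesses at their shape as a discriminated union, listing only the fields the handler above actually reads; the real schema may carry more. StreamEvent and describeEvent are hypothetical names. With a union keyed on type, TypeScript narrows each case on its own, so the casts become unnecessary.

```typescript
// Assumed shapes: only fields read by the handler above are listed.
type StreamEvent =
  | { type: 'init'; session_id?: string; model?: string }
  | { type: 'message'; role: string; content?: string }
  | { type: 'tool_use'; tool_name: string }
  | { type: 'tool_result'; status?: string; error?: { message: string } }
  | { type: 'error'; severity: string; message: string }
  | { type: 'result'; status?: string; error?: { message: string } }

// Returns the text a handler would print for an event, or null when the
// event is silent. Each case sees the narrowed member of the union.
function describeEvent(event: StreamEvent): string | null {
  switch (event.type) {
    case 'init':
      return null
    case 'message':
      return event.role === 'assistant' && event.content ? event.content : null
    case 'tool_use':
      return event.tool_name === 'update_topic' ? null : `[tool] ${event.tool_name}`
    case 'tool_result':
      return event.status === 'error' && event.error
        ? `[tool error] ${event.error.message}`
        : null
    case 'error':
      return `[${event.severity}] ${event.message}`
    case 'result':
      return event.status === 'error' && event.error ? `Failed: ${event.error.message}` : null
  }
}

console.log(describeEvent({ type: 'tool_use', tool_name: 'write_file' }))
```

Returning a string instead of writing to stdout keeps the formatting logic testable without a sandbox.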

When the result event arrives, onResponseComplete resolves the promise that processPrompt is awaiting, so the readline loop can prompt for the next turn.
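
The hand-off between the sender and the event handler can be isolated into a small class. This is a sketch of the pattern, not the example's code: CompletionSignal is a hypothetical name, and unlike the handler above it clears the stored resolver after use, so a stray second result event does nothing.

```typescript
// Hypothetical helper illustrating the onResponseComplete hand-off.
class CompletionSignal {
  private resolveFn: (() => void) | null = null

  // Called by the sender after the command is written to the PTY.
  // The executor runs synchronously, so the resolver is stored before
  // this method returns.
  wait(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.resolveFn = resolve
    })
  }

  // Called by the event handler when the result event arrives.
  // Returns false when nothing was waiting.
  complete(): boolean {
    if (!this.resolveFn) return false
    const resolve = this.resolveFn
    this.resolveFn = null
    resolve()
    return true
  }
}
```

Note that in both versions the promise settles only when a result event arrives. If the CLI exits without emitting one, the loop waits indefinitely, so a production version would likely want a timeout.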

Key advantages:

  • Secure, isolated execution in Daytona sandboxes
  • Fully headless operation, no browser OAuth and no permission prompts
  • Streaming JSON output (--output-format stream-json) for real-time tool and message activity
  • PTY-based communication for low-latency streaming
  • Session-based conversation continuity across prompts (-r)
  • All agent code execution happens inside the sandbox
  • Automatic cleanup on exit