Run Gemini CLI Headlessly in Daytona and Stream Its Output


This guide demonstrates how to run Google’s Gemini CLI as a headless coding agent inside a Daytona sandbox. The agent can write code in any language, install dependencies, run scripts, and reason over a project, all inside a secure, isolated, disposable sandbox while its output streams back to your terminal in real time.

1. Workflow Overview

When you launch the main module, a Daytona sandbox is created and the Gemini CLI is installed inside it. The agent is driven headlessly with gemini -p "<prompt>" --yolo --output-format stream-json, and its newline-delimited JSON events are parsed and printed as the agent works.

You interact with the main program via a command line chat interface. The program sends your prompts to the Gemini CLI inside the sandbox, which writes code, runs commands, and streams the results back as it works. Each tool call surfaces as a [tool] line, followed by the assistant’s response.

Each prompt runs as a separate gemini invocation. The program captures the session ID from the first run and passes it back with -r on every later turn, so context carries across the conversation. You can keep interacting with the agent until you are finished. When you exit the program, the sandbox is deleted automatically.

2. Project Setup

Clone the Repository

First, clone the daytona repository and navigate to the example directory:

git clone https://github.com/daytonaio/daytona.git
cd daytona/guides/typescript/gemini/gemini-cli

Configure Environment

Get your API keys:

Daytona API key: Daytona Dashboard
Gemini API key: Google AI Studio

Copy .env.example to .env and add your keys:

DAYTONA_API_KEY=your_daytona_key
SANDBOX_GEMINI_API_KEY=your_gemini_key
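
A missing key tends to show up late, as an authentication failure inside the sandbox. A minimal sketch of a startup check follows; `missingEnvVars` and `REQUIRED_ENV_VARS` are hypothetical names, not part of the example repository.

```typescript
// Hypothetical helper (not from the repository): lists required variables
// that are unset or blank in the given environment object.
const REQUIRED_ENV_VARS = ["DAYTONA_API_KEY", "SANDBOX_GEMINI_API_KEY"] as const;

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  // A value counts as missing when it is undefined, empty, or whitespace only.
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}

// Possible usage at startup:
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the check easy to exercise in isolation.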

Local Usage

Install dependencies:

npm install

Run the agent:

npm run start

The agent will start and wait for your prompt.

3. Example Usage

Ask the agent to write and run some code. Here it generates an ASCII-art Mandelbrot fractal inside the sandbox and executes it, streaming each tool call and the program output back to your terminal:

$ npm run start
Creating sandbox...
Installing Gemini CLI...
Starting Gemini CLI...

Agent ready. Press Ctrl+C at any time to exit.

User: Write a Python script mandelbrot.py that renders the Mandelbrot set as ASCII art roughly 40 columns by 20 rows, then run it and show the rendered output
[tool] write_file
[tool] run_shell_command
[tool] replace
[tool] run_shell_command
I have successfully created and executed the Python script mandelbrot.py to render the Mandelbrot set as ASCII art.

               ......-:@...
                .......:%+:....
              ........:*@@*:....
             .....+-:--=@@-:::::.
           .......:@%@@@@@@@@=#+..
        .........==@@@@@@@@@@@+:..
     .....-::::::%@@@@@@@@@@@@@%:..
  .......:-@*@%--@@@@@@@@@@@@@@%:..
 .......::%@@@@@+@@@@@@@@@@@@@@#...
 ..-:.::+@@@@@@@@@@@@@@@@@@@@@@:...
 ..-:.::+@@@@@@@@@@@@@@@@@@@@@@:...
 .......::%@@@@@+@@@@@@@@@@@@@@#...
  .......:-@*@%--@@@@@@@@@@@@@@%:..
     .....-::::::%@@@@@@@@@@@@@%:..
        .........==@@@@@@@@@@@+:..
           .......:@%@@@@@@@@=#+..
             .....+-:--=@@-:::::.
              ........:*@@*:....
                .......:%+:....
                  ......-:@...

User:

4. Understanding the Script

This example consists of two parts: a main program (src/index.ts) that manages the sandbox and a command-line loop, and a session class (src/session.ts) that drives the Gemini CLI over a PTY and parses its streaming JSON output.

Initialization

From startup to exit, the script:

Creates a new Daytona sandbox with the Gemini API key injected as an environment variable.
Installs the Gemini CLI globally in the sandbox.
Creates a PTY for streaming output from the Gemini CLI.
Enters a readline loop to send prompts and receive streamed responses.
On Ctrl+C, kills the PTY session, deletes the sandbox, and exits.
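
The final step in the list above can fire more than once, for example if Ctrl+C is pressed again while the sandbox is still being deleted. One way to keep teardown safe is an idempotent cleanup function. The sketch below is an assumption about how this could be structured, not the repository's code; `createCleanup`, `killPty`, and `deleteSandbox` are hypothetical names.

```typescript
// Hypothetical helper: runs the given cleanup steps once, in order.
// Repeated calls return the same in-flight promise instead of re-running steps.
function createCleanup(steps: Array<() => Promise<void>>): () => Promise<void> {
  let started: Promise<void> | undefined;
  return () => {
    if (!started) {
      started = (async () => {
        for (const step of steps) {
          try {
            await step();
          } catch (err) {
            // Keep going so a failed PTY kill does not leave the sandbox running.
            console.error("cleanup step failed:", err);
          }
        }
      })();
    }
    return started;
  };
}

// Possible wiring (killPty and deleteSandbox stand in for the real teardown calls):
//   const cleanup = createCleanup([killPty, deleteSandbox]);
//   process.once("SIGINT", () => cleanup().finally(() => process.exit(0)));
```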

Creating the Sandbox

The Gemini CLI defaults to interactive browser OAuth, which would hang a headless run. Passing GEMINI_API_KEY as a sandbox environment variable at create time lets the CLI authenticate non-interactively. GEMINI_CLI_TRUST_WORKSPACE bypasses the CLI’s workspace-trust prompt, which otherwise blocks --yolo runs in a fresh sandbox directory:

sandbox = await daytona.create({
  envVars: {
    GEMINI_API_KEY: process.env.SANDBOX_GEMINI_API_KEY,
    GEMINI_CLI_TRUST_WORKSPACE: 'true',
  },
})

const install = await sandbox.process.executeCommand('npm install -g @google/gemini-cli')
if (install.exitCode !== 0) {
  throw new Error('Error installing Gemini CLI: ' + install.result)
}

PTY Communication

The session uses a pseudo-terminal (PTY) for streaming output from the Gemini CLI:

async initialize(): Promise<void> {
  this.ptyHandle = await this.sandbox.process.createPty({
    id: `gemini-pty-${Date.now()}`,
    cols: 120,
    rows: 30,
    onData: (data: Uint8Array) => this.handleData(data),
  })
  await this.ptyHandle.waitForConnection()
}

Running Gemini Commands

Each prompt is sent as a gemini command in headless mode. -p runs a one-shot non-interactive prompt, --yolo auto-approves tool calls so the run never blocks on a permission prompt, and --output-format stream-json emits newline-delimited JSON events. When a session ID has been captured, -r resumes that session for multi-turn continuity:

async processPrompt(prompt: string): Promise<void> {
  const flags = ['-p', this.shellQuote(prompt), '--yolo', '--output-format', 'stream-json']
  // -r resumes the existing session for multi-turn continuity.
  if (this.sessionId) flags.unshift('-r', this.shellQuote(this.sessionId))
  const command = ['gemini', ...flags].join(' ')

  await this.ptyHandle!.sendInput(`cd ${WORK_DIR} && ${command}\n`)
  await new Promise<void>((resolve) => {
    this.onResponseComplete = resolve
  })
}
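
The excerpt calls this.shellQuote without showing it. Because the prompt is typed into a shell through the PTY, it has to be quoted so that spaces, quotes, and `$` are passed to gemini literally. The following is a minimal sketch assuming POSIX single-quote escaping; the repository's own helper may differ, and `buildGeminiCommand` is a hypothetical standalone version of the flag assembly above.

```typescript
// Assumed implementation (the repository's shellQuote may differ):
// wrap the value in single quotes and rewrite each embedded ' as '\''
// (close the quote, emit an escaped quote, reopen the quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical standalone version of the flag assembly in processPrompt.
function buildGeminiCommand(prompt: string, sessionId?: string): string {
  const flags = ["-p", shellQuote(prompt), "--yolo", "--output-format", "stream-json"];
  // -r goes first when a session ID from an earlier turn is available.
  if (sessionId) flags.unshift("-r", shellQuote(sessionId));
  return ["gemini", ...flags].join(" ");
}
```

Inside single quotes a POSIX shell performs no expansion, so the only character that needs special handling is the single quote itself.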

Streaming JSON Messages

The Gemini CLI outputs JSON lines that are parsed to track agent activity. The handleData method buffers incoming PTY bytes and processes each complete line, keeping any incomplete line for the next chunk. A stateful TextDecoder is reused across calls so partial multi-byte UTF-8 sequences split across PTY chunks are preserved instead of being corrupted:

private decoder = new TextDecoder('utf-8')

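private handleData(data: Uint8Array): void {
  // stream: true holds an incomplete multi-byte sequence in the decoder until the next chunk.
  this.buffer += this.decoder.decode(data, { stream: true })
  const lines = this.buffer.split('\n')
  // The last element is an unfinished line (or empty); keep it for the next chunk.
  this.buffer = lines.pop() || ''
  for (const line of lines.map((l) => l.trim()).filter(Boolean)) {
    try {
      this.handleEvent(JSON.parse(line) as GeminiStreamEvent)
    } catch {
      // Lines that are not valid JSON are logged for debugging and skipped.
      debug('non-JSON line:', line)
    }
  }
}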
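
To see why the stateful decoder matters, the buffering logic can be exercised on its own. The sketch below is a standalone illustration, with `LineBuffer` as a hypothetical name. It splits a chunk in the middle of the two-byte UTF-8 encoding of "é" and in the middle of a second JSON line.

```typescript
// Standalone illustration of the buffering technique; LineBuffer is a hypothetical name.
class LineBuffer {
  private decoder = new TextDecoder("utf-8");
  private buffer = "";

  // Returns the complete, non-empty lines made available by this chunk.
  push(chunk: Uint8Array): string[] {
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    return lines.map((l) => l.trim()).filter(Boolean);
  }
}

// One full JSON line followed by the start of another.
const bytes = new TextEncoder().encode('{"type":"message","content":"caf\u00e9"}\n{"type":"res');
// "é" encodes as 0xC3 0xA9; cut between those two bytes.
const splitAt = bytes.indexOf(0xc3) + 1;

const lineBuffer = new LineBuffer();
const firstLines = lineBuffer.push(bytes.slice(0, splitAt)); // no complete line yet
const secondLines = lineBuffer.push(bytes.slice(splitAt));   // first line completes; the partial second line stays buffered
```

Decoding each chunk with a fresh TextDecoder would instead turn the two halves of "é" into replacement characters.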

Event types from the Gemini CLI’s streaming JSON (schema reference):

init: Session metadata (session_id, model) - captured to resume the session on later turns
message: User and assistant message chunks (assistant text is printed live)
tool_use: Tool call requests with arguments
tool_result: Output from executed tools
error: Non-fatal warnings and system errors
result: Final outcome with aggregated statistics - signals response completion
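
One way to model these events in TypeScript is a discriminated union keyed on type, which lets the compiler narrow each case without casts. The shapes below are assumptions that cover only the fields read by the handler in this guide; the real schema carries more, and `StreamEvent` and `describeEvent` are hypothetical names.

```typescript
// Assumed shapes: only fields used in this guide are modeled; the real schema has more.
type StreamEvent =
  | { type: "init"; session_id?: string; model?: string }
  | { type: "message"; role: string; content?: string }
  | { type: "tool_use"; tool_name: string }
  | { type: "tool_result"; status: string; error?: { message: string } }
  | { type: "error"; severity: string; message: string }
  | { type: "result"; status: string; error?: { message: string } };

// Hypothetical helper: returns the text a terminal UI might print for an event,
// or null when the event has nothing to show.
function describeEvent(event: StreamEvent): string | null {
  switch (event.type) {
    case "message":
      return event.role === "assistant" && event.content ? event.content : null;
    case "tool_use":
      return `[tool] ${event.tool_name}`;
    case "tool_result":
      return event.status === "error" && event.error ? `[tool error] ${event.error.message}` : null;
    case "error":
      return `[${event.severity}] ${event.message}`;
    case "result":
      return event.status === "error" && event.error ? `Failed: ${event.error.message}` : null;
    default:
      return null; // init carries metadata only
  }
}
```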

private handleEvent(event: GeminiStreamEvent): void {
  switch (event.type) {
    case 'init': {
      const init = event as InitEvent
      if (init.session_id && !this.sessionId) {
        this.sessionId = init.session_id
        debug('captured session_id:', this.sessionId)
      }
      return
    }
    case 'message': {
      const msg = event as MessageEvent
      if (msg.role === 'assistant' && msg.content) {
        process.stdout.write(msg.content)
      }
      return
    }
    case 'tool_use': {
      const tool = event as ToolUseEvent
      // Skip update_topic: an internal Gemini bookkeeping tool, not a user-facing action.
      if (tool.tool_name === 'update_topic') return
      process.stdout.write(`\n[tool] ${tool.tool_name}\n`)
      return
    }
    case 'tool_result': {
      const result = event as ToolResultEvent
      if (result.status === 'error' && result.error) {
        process.stdout.write(`\n[tool error] ${result.error.message}\n`)
      }
      return
    }
    case 'error': {
      const err = event as ErrorEvent
      process.stderr.write(`\n[${err.severity}] ${err.message}\n`)
      return
    }
    case 'result': {
      const res = event as ResultEvent
      if (res.status === 'error' && res.error) {
        process.stderr.write(`\nFailed: ${res.error.message}\n`)
      }
      process.stdout.write('\n')
      this.onResponseComplete?.()
      return
    }
  }
}

When the result event arrives, onResponseComplete resolves the promise that processPrompt is awaiting, so the readline loop can prompt for the next turn.
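
The hand-off between the event handler and the awaiting caller is a small deferred-promise pattern. It is sketched below in isolation, with `TurnWaiter` as a hypothetical name. This version registers the resolver before any input is sent and clears it after use, which is a design choice of the sketch rather than something the example is known to do.

```typescript
// Hypothetical class illustrating the resolve-on-result pattern.
class TurnWaiter {
  private onResponseComplete?: () => void;

  // Call before sending the prompt so an early result event cannot be missed.
  begin(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.onResponseComplete = resolve;
    });
  }

  // Call from the event handler when a result event arrives.
  complete(): void {
    this.onResponseComplete?.();
    // Clear the resolver so a stray result event cannot affect a later turn.
    this.onResponseComplete = undefined;
  }
}

// Possible usage inside a prompt method:
//   const done = waiter.begin();
//   await sendPrompt();   // hypothetical send step
//   await done;           // resolves when complete() is called
```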

Key advantages:

Secure, isolated execution in Daytona sandboxes
Fully headless operation, no browser OAuth and no permission prompts
Streaming JSON output (--output-format stream-json) for real-time tool and message activity
PTY-based communication for low-latency streaming
Session-based conversation continuity across prompts (-r)
All agent code execution happens inside the sandbox
Automatic cleanup on exit