Computer Use

Computer Use enables programmatic control of desktop environments within sandboxes. It provides mouse, keyboard, screenshot, screen recording, and display operations for automating GUI interactions and testing desktop applications.

Computer Use and VNC work together to enable both manual and automated desktop interactions. VNC provides the visual interface for users to manually interact with the desktop, while Computer Use provides the programmatic API for AI agents to automate operations.

Computer Use is available for Linux. Windows and macOS support is currently in private alpha.

  • GUI application testing: automate interactions with native applications, click buttons, fill forms, and validate UI behavior
  • Visual testing & screenshots: capture screenshots of applications, compare UI states, and perform visual regression testing
  • Desktop automation: automate repetitive desktop tasks, file management through GUI, and complex workflows

Start Computer Use

Start all computer use processes (Xvfb, xfce4, x11vnc, novnc) in the Sandbox.

result = sandbox.computer_use.start()
print("Computer use processes started:", result.message)

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

start (Python SDK)

start (TypeScript SDK)

start (Ruby SDK)

Start Computer Use Processes (API)

Stop Computer Use

Stop all computer use processes in the Sandbox.

result = sandbox.computer_use.stop()
print("Computer use processes stopped:", result.message)

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

stop (Python SDK)

stop (TypeScript SDK)

stop (Ruby SDK)

Stop Computer Use Processes (API)

Get status

Get the status of all computer use processes.

response = sandbox.computer_use.get_status()
print("Computer use status:", response.status)
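
A script may want to wait until the processes have settled before sending input. The following is a minimal sketch of a polling helper, not part of the SDK: `wait_for_status` is a hypothetical name, and the status value to wait for (shown as `"active"` in the comment) is an assumption, since this page does not list the possible values.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    expected: str,
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll get_status() until it returns `expected` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible wiring against a sandbox ("active" is an assumed status value):
# ready = wait_for_status(
#     lambda: sandbox.computer_use.get_status().status, expected="active"
# )
```

Passing the status lookup as a callable keeps the helper independent of the SDK, so it can be exercised without a running sandbox.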

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

get_status (Python SDK)

getStatus (TypeScript SDK)

status (Ruby SDK)

Get Computer Use status (API)

Get process status

Get the status of a specific computer use process, such as xvfb or novnc.

xvfb_status = sandbox.computer_use.get_process_status("xvfb")
novnc_status = sandbox.computer_use.get_process_status("novnc")

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

get_process_status (Python SDK)

getProcessStatus (TypeScript SDK)

get_process_status (Ruby SDK)

Get Process Status (API)

Restart process

Restart a specific computer use process, such as xfce4.

result = sandbox.computer_use.restart_process("xfce4")
print("XFCE4 process restarted:", result.message)
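
Status checks and restarts can be combined into a simple recovery loop. This is a sketch under assumptions rather than an SDK feature: `restart_stopped` is a hypothetical helper, the process names are the four listed under Start Computer Use, and the `running` attribute used in the commented wiring is assumed, since this page does not show the fields of the process status response.

```python
from typing import Callable, Iterable, List

# Process names taken from the Start Computer Use section.
PROCESS_NAMES = ("xvfb", "xfce4", "x11vnc", "novnc")


def restart_stopped(
    is_running: Callable[[str], bool],
    restart: Callable[[str], object],
    names: Iterable[str] = PROCESS_NAMES,
) -> List[str]:
    """Restart every process reported as stopped and return the names restarted."""
    restarted = []
    for name in names:
        if not is_running(name):
            restart(name)
            restarted.append(name)
    return restarted


# Possible wiring against a sandbox (`.running` is an assumed field):
# restart_stopped(
#     lambda n: sandbox.computer_use.get_process_status(n).running,
#     sandbox.computer_use.restart_process,
# )
```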

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

restart_process (Python SDK)

restartProcess (TypeScript SDK)

restart_process (Ruby SDK)

Restart process (API)

Get process logs

Get logs for a specific computer use process, such as novnc.

logs = sandbox.computer_use.get_process_logs("novnc")
print("NoVNC logs:", logs)

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

get_process_logs (Python SDK)

getProcessLogs (TypeScript SDK)

get_process_logs (Ruby SDK)

Get process logs (API)

Get process errors

Get error logs for a specific computer use process, such as x11vnc.

errors = sandbox.computer_use.get_process_errors("x11vnc")
print("X11VNC errors:", errors)

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, and API references:

get_process_errors (Python SDK)

getProcessErrors (TypeScript SDK)

get_process_errors (Ruby SDK)

Get process errors (API)

Mouse operations

Click

Click the mouse at the specified coordinates.

# Single left click
result = sandbox.computer_use.mouse.click(100, 200)
# Double click
double_click = sandbox.computer_use.mouse.click(100, 200, "left", True)
# Right click
right_click = sandbox.computer_use.mouse.click(100, 200, "right")
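
Click coordinates often come from the bounds of a UI element rather than being hard-coded. A minimal sketch of turning a rectangle into a click target, assuming integer pixel coordinates with the origin at the top-left corner; `center_of` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple


def center_of(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Return the pixel at the center of a rectangle given its top-left corner and size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return x + width // 2, y + height // 2


# Possible use with the click call shown in this section:
# cx, cy = center_of(100, 100, 300, 200)
# sandbox.computer_use.mouse.click(cx, cy)
```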

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

click (Python SDK)

click (TypeScript SDK)

Click (Go SDK)

Mouse click (API)

Move

Move the mouse cursor to the specified coordinates.

result = sandbox.computer_use.mouse.move(100, 200)
print(f"Mouse moved to: {result.x}, {result.y}")
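
If a single jump to the target is too abrupt for the application under test, the move can be broken into smaller steps. A minimal sketch, assuming straight-line interpolation is acceptable; `interpolate_path` is a hypothetical helper, not an SDK function.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def interpolate_path(start: Point, end: Point, steps: int) -> List[Point]:
    """Return `steps` evenly spaced points from just after `start` up to and including `end`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        points.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return points


# Possible use with the move call shown in this section:
# for x, y in interpolate_path((0, 0), (100, 200), steps=4):
#     sandbox.computer_use.mouse.move(x, y)
```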

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

move (Python SDK)

move (TypeScript SDK)

Move (Go SDK)

Mouse move (API)

Drag

Drag the mouse from start coordinates to end coordinates.

result = sandbox.computer_use.mouse.drag(50, 50, 150, 150)
print(f"Dragged from {result.from_x},{result.from_y} to {result.to_x},{result.to_y}")

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

drag (Python SDK)

drag (TypeScript SDK)

Drag (Go SDK)

Mouse drag (API)

Scroll

Scroll the mouse wheel at the specified coordinates.

# Scroll up
scroll_up = sandbox.computer_use.mouse.scroll(100, 200, "up", 3)
# Scroll down
scroll_down = sandbox.computer_use.mouse.scroll(100, 200, "down", 5)

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

scroll (Python SDK)

scroll (TypeScript SDK)

Scroll (Go SDK)

Mouse scroll (API)

Get position

Get the current mouse cursor position.

position = sandbox.computer_use.mouse.get_position()
print(f"Mouse is at: {position.x}, {position.y}")

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

get_position (Python SDK)

getPosition (TypeScript SDK)

GetPosition (Go SDK)

Get mouse position (API)

Keyboard operations

Type

Type the specified text.

sandbox.computer_use.keyboard.type("Hello, World!")
# With delay between characters
sandbox.computer_use.keyboard.type("Slow typing", 100)

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

type (Python SDK)

type (TypeScript SDK)

Type (Go SDK)

Keyboard type (API)

Press

Press a key with optional modifiers.

# Press Enter
sandbox.computer_use.keyboard.press("Return")
# Press Ctrl+C
sandbox.computer_use.keyboard.press("c", ["ctrl"])
# Press Ctrl+Shift+T
sandbox.computer_use.keyboard.press("t", ["ctrl", "shift"])

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

press (Python SDK)

press (TypeScript SDK)

Press (Go SDK)

Keyboard press (API)

Hotkey

Press a hotkey combination.

# Copy
sandbox.computer_use.keyboard.hotkey("ctrl+c")
# Paste
sandbox.computer_use.keyboard.hotkey("ctrl+v")
# Alt+Tab
sandbox.computer_use.keyboard.hotkey("alt+tab")
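
The Press and Hotkey examples express the same shortcut in two forms: a key plus a modifier list (`"c", ["ctrl"]`) and a single `+`-separated string (`"ctrl+c"`). The sketch below converts the string form into the key/modifier form. It is a hypothetical helper written for illustration and makes no claim about how the SDK parses hotkey strings; it assumes the last segment is the key and everything before it is a modifier.

```python
from typing import List, Tuple


def split_hotkey(combo: str) -> Tuple[str, List[str]]:
    """Split a '+'-separated combination into (key, modifiers)."""
    parts = [part.strip() for part in combo.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"malformed hotkey: {combo!r}")
    return parts[-1], parts[:-1]


# Possible use with the press call shown earlier on this page:
# key, modifiers = split_hotkey("ctrl+shift+t")
# sandbox.computer_use.keyboard.press(key, modifiers)
```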

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

hotkey (Python SDK)

hotkey (TypeScript SDK)

Hotkey (Go SDK)

Keyboard hotkey (API)

Screenshot operations

Take full screen

Take a screenshot of the entire screen.

screenshot = sandbox.computer_use.screenshot.take_full_screen()
print(f"Screenshot size: {screenshot.width}x{screenshot.height}")
# With cursor visible
with_cursor = sandbox.computer_use.screenshot.take_full_screen(True)

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

take_full_screen (Python SDK)

takeFullScreen (TypeScript SDK)

TakeFullScreen (Go SDK)

Take screenshot (API)

Take region

Take a screenshot of a specific region.

from daytona import ScreenshotRegion
region = ScreenshotRegion(x=100, y=100, width=300, height=200)
screenshot = sandbox.computer_use.screenshot.take_region(region)
print(f"Captured region: {screenshot.region.width}x{screenshot.region.height}")
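
Region capture can also be used to cover the screen in tiles, for example to compare parts of a UI independently. A minimal sketch that computes the tile rectangles; `tile_regions` is a hypothetical helper, and it returns plain dictionaries whose keys match the `ScreenshotRegion` arguments used in this section.

```python
from typing import Dict, List


def tile_regions(width: int, height: int, cols: int, rows: int) -> List[Dict[str, int]]:
    """Split a width x height area into cols x rows rectangles that cover it exactly."""
    if min(width, height, cols, rows) < 1:
        raise ValueError("all arguments must be at least 1")
    tiles = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append({"x": x0, "y": y0, "width": x1 - x0, "height": y1 - y0})
    return tiles


# Possible use with the region call shown in this section:
# for tile in tile_regions(800, 600, cols=2, rows=2):
#     sandbox.computer_use.screenshot.take_region(ScreenshotRegion(**tile))
```

Computing each edge from the running index, rather than from a fixed tile size, keeps the tiles gap-free when the dimensions do not divide evenly.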

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

take_region (Python SDK)

takeRegion (TypeScript SDK)

TakeRegion (Go SDK)

Take Screenshot Region (API)

Take compressed

Take a compressed screenshot of the entire screen.

from daytona import ScreenshotOptions
# Default compression
screenshot = sandbox.computer_use.screenshot.take_compressed()
# High quality JPEG
jpeg = sandbox.computer_use.screenshot.take_compressed(
    ScreenshotOptions(format="jpeg", quality=95, show_cursor=True)
)
# Scaled down PNG
scaled = sandbox.computer_use.screenshot.take_compressed(
    ScreenshotOptions(format="png", scale=0.5)
)

For more information, see the Python SDK, TypeScript SDK, and API references:

take_compressed (Python SDK)

takeCompressed (TypeScript SDK)

Take compressed screenshot (API)
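
When a screenshot is taken with a `scale` below 1, a position picked in the image no longer matches desktop coordinates directly. A minimal sketch of mapping an image pixel back to the screen, assuming `scale` shrinks both axes by the same factor and leaves the origin unchanged; `to_screen_coords` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple


def to_screen_coords(px: int, py: int, scale: float) -> Tuple[int, int]:
    """Map a pixel in a scaled screenshot back to desktop coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(px / scale), round(py / scale)


# Possible use after taking a screenshot with scale=0.5:
# x, y = to_screen_coords(200, 150, scale=0.5)
# sandbox.computer_use.mouse.click(x, y)
```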

Take compressed region

Take a compressed screenshot of a specific region.

from daytona import ScreenshotRegion, ScreenshotOptions
region = ScreenshotRegion(x=0, y=0, width=800, height=600)
screenshot = sandbox.computer_use.screenshot.take_compressed_region(
    region,
    ScreenshotOptions(format="webp", quality=80, show_cursor=True)
)
print(f"Compressed size: {screenshot.size_bytes} bytes")

For more information, see the Python SDK, TypeScript SDK, and API references:

take_compressed_region (Python SDK)

takeCompressedRegion (TypeScript SDK)

Take compressed screenshot region (API)

Screen Recording

Computer Use supports screen recording, so you can capture desktop sessions for debugging, documentation, or automation workflows.

Configure Recording Directory

By default, recordings are saved to ~/.daytona/recordings. You can specify a custom directory by passing the DAYTONA_RECORDINGS_DIR environment variable when creating a sandbox:

from daytona import Daytona, CreateSandboxFromSnapshotParams
daytona = Daytona()
sandbox = daytona.create(
    CreateSandboxFromSnapshotParams(
        snapshot="daytonaio/sandbox:0.6.0",
        name="my-sandbox",
        env_vars={"DAYTONA_RECORDINGS_DIR": "/home/daytona/my-recordings"}
    )
)

Start Recording

Start a new screen recording session with an optional name identifier:

# Start recording with a custom name
recording = sandbox.computer_use.recording.start("test-1")
print(f"Recording started: {recording.id}")
print(f"File path: {recording.file_path}")

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

start (Python SDK)

start (TypeScript SDK)

start (Ruby SDK)

Start (Go SDK)

Start Recording (API)

Stop Recording

Stop an active recording session by providing the recording ID:

# Stop the recording
stopped_recording = sandbox.computer_use.recording.stop(recording.id)
print(f"Recording stopped: {stopped_recording.duration_seconds} seconds")
print(f"Saved to: {stopped_recording.file_path}")
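
Because a recording has to be stopped explicitly, it can help to tie start and stop together so the recording is stopped even when the automated steps raise an exception. The following is a sketch, not an SDK feature: `recorded` is a hypothetical context manager that relies only on the `start(name)` and `stop(id)` calls shown on this page, and `FakeRecordingApi` is a stand-in so the example runs without a sandbox.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def recorded(recording_api, name):
    """Start a recording, yield it to the caller, and always stop it on exit."""
    rec = recording_api.start(name)
    try:
        yield rec
    finally:
        recording_api.stop(rec.id)


class FakeRecordingApi:
    """Stand-in for sandbox.computer_use.recording, used only for this demo."""

    def __init__(self):
        self.stopped = []

    def start(self, name):
        return SimpleNamespace(id=f"rec-{name}", name=name)

    def stop(self, recording_id):
        self.stopped.append(recording_id)


# Against a real sandbox, pass sandbox.computer_use.recording instead:
# with recorded(sandbox.computer_use.recording, "test-1") as rec:
#     sandbox.computer_use.mouse.click(100, 200)
```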

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

stop (Python SDK)

stop (TypeScript SDK)

stop (Ruby SDK)

Stop (Go SDK)

Stop Recording (API)

List Recordings

Get a list of all recordings in the sandbox:

recordings_list = sandbox.computer_use.recording.list()
print(f"Total recordings: {len(recordings_list.recordings)}")
for rec in recordings_list.recordings:
    print(f"- {rec.name}: {rec.duration_seconds}s ({rec.file_size_bytes} bytes)")

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

list (Python SDK)

list (TypeScript SDK)

list (Ruby SDK)

List (Go SDK)

List Recordings (API)
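
The listing can be reduced to a quick storage summary before deciding what to delete. A minimal sketch that uses only the `duration_seconds` and `file_size_bytes` fields shown in the listing example; `summarize` is a hypothetical helper, and the sample values are made up so the code runs without a sandbox.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def summarize(recordings: Iterable) -> Tuple[int, float, int]:
    """Return (count, total duration in seconds, total size in bytes)."""
    count, total_seconds, total_bytes = 0, 0.0, 0
    for rec in recordings:
        count += 1
        total_seconds += rec.duration_seconds
        total_bytes += rec.file_size_bytes
    return count, total_seconds, total_bytes


# Made-up sample data standing in for recordings_list.recordings.
sample = [
    SimpleNamespace(name="test-1", duration_seconds=12.5, file_size_bytes=1_000_000),
    SimpleNamespace(name="test-2", duration_seconds=7.5, file_size_bytes=500_000),
]
count, seconds, size = summarize(sample)
print(f"{count} recordings, {seconds}s, {size} bytes")
```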

Get Recording

Get details about a specific recording:

recording_detail = sandbox.computer_use.recording.get("recording-id")
print(f"Recording: {recording_detail.name}")
print(f"Status: {recording_detail.status}")
print(f"Duration: {recording_detail.duration_seconds}s")

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

get (Python SDK)

get (TypeScript SDK)

get (Ruby SDK)

Get (Go SDK)

Get Recording (API)

Delete Recording

Delete a recording by ID:

sandbox.computer_use.recording.delete("recording-id")
print("Recording deleted successfully")

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

delete (Python SDK)

delete (TypeScript SDK)

delete (Ruby SDK)

Delete (Go SDK)

Delete Recording (API)

Download Recording

Download a recording file from the sandbox to your local machine. The file is streamed efficiently without loading the entire content into memory, making it suitable for large recordings.

# Download recording to local file
sandbox.computer_use.recording.download(recording.id, "local_recording.mp4")
print("Recording downloaded successfully")
# Or with custom path
import os
download_path = os.path.join("recordings", f"recording_{recording.id}.mp4")
sandbox.computer_use.recording.download(recording.id, download_path)

For more information, see the Python SDK, TypeScript SDK, Ruby SDK, Go SDK, and API references:

download (Python SDK)

download (TypeScript SDK)

download (Ruby SDK)

Download (Go SDK)

Download Recording (API)

Recording Dashboard

Every sandbox includes a built-in recording dashboard for managing screen recordings through a web interface. The dashboard allows you to view, download, and delete recordings without writing code.

To access the recording dashboard:

  1. Navigate to your sandboxes in the Daytona Dashboard
  2. Click the action menu (three dots) for your sandbox
  3. Select Screen Recordings from the dropdown menu

The recording dashboard provides:

  • List of all recordings with metadata (name, duration, file size, creation time)
  • Playback controls for reviewing recordings
  • Download functionality to save recordings locally
  • Delete options for managing storage

Display operations

Get info

Get information about the displays.

info = sandbox.computer_use.display.get_info()
print(f"Primary display: {info.primary_display.width}x{info.primary_display.height}")
print(f"Total displays: {info.total_displays}")
for i, display in enumerate(info.displays):
    print(f"Display {i}: {display.width}x{display.height} at {display.x},{display.y}")

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

get_info (Python SDK)

getInfo (TypeScript SDK)

GetInfo (Go SDK)

Get display info (API)
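
Display dimensions are useful for keeping computed coordinates on screen before a mouse call. A minimal sketch, assuming a single display whose top-left corner is at (0, 0); `clamp_to_display` is a hypothetical helper, not an SDK function.

```python
from typing import Tuple


def clamp_to_display(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Clamp (x, y) to the valid pixel range of a width x height display."""
    if width < 1 or height < 1:
        raise ValueError("display dimensions must be at least 1")
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


# Possible use with the display info shown in this section:
# primary = sandbox.computer_use.display.get_info().primary_display
# x, y = clamp_to_display(2000, -5, primary.width, primary.height)
```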

Get windows

Get the list of open windows.

windows = sandbox.computer_use.display.get_windows()
print(f"Found {windows.count} open windows:")
for window in windows.windows:
    print(f"- {window.title} (ID: {window.id})")

For more information, see the Python SDK, TypeScript SDK, Go SDK, and API references:

get_windows (Python SDK)

getWindows (TypeScript SDK)

GetWindows (Go SDK)

Get Windows (API)
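
A common next step is to pick one window out of the list by its title. A minimal sketch using only the `title` and `id` fields shown in the example; `find_windows` is a hypothetical helper, and the sample titles and IDs are made up so the code runs without a sandbox.

```python
from types import SimpleNamespace
from typing import Iterable, List


def find_windows(windows: Iterable, title_part: str) -> List:
    """Return the windows whose title contains title_part, ignoring case."""
    needle = title_part.lower()
    return [w for w in windows if needle in w.title.lower()]


# Made-up sample data standing in for windows.windows.
sample = [
    SimpleNamespace(id="0x1", title="File Manager"),
    SimpleNamespace(id="0x2", title="Terminal - daytona"),
]
for match in find_windows(sample, "term"):
    print(f"Matched {match.title} (ID: {match.id})")
```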