Desktop automation operations for a Sandbox.

Provides a Java facade for computer-use features including desktop session management, screenshots, mouse and keyboard automation, display/window inspection, and screen recording.

public ComputerUseStartResponse start()

Starts the computer-use desktop stack (VNC/noVNC and related processes).

Returns:

  • ComputerUseStartResponse - start response containing process status details
public ComputerUseStopResponse stop()

Stops all computer-use desktop processes.

Returns:

  • ComputerUseStopResponse - stop response containing process status details
public ComputerUseStatusResponse getStatus()

Returns the current computer-use status.

Returns:

  • ComputerUseStatusResponse - overall computer-use status
public AccessibilityTreeResponse getAccessibilityTree()

Fetches the focused AT-SPI accessibility tree.

Returns:

  • AccessibilityTreeResponse - accessibility tree response
public AccessibilityTreeResponse getAccessibilityTree(String scope, Integer pid, Integer maxDepth)

Fetches an AT-SPI accessibility tree.

Parameters:

  • scope String - scope to inspect (focused, pid, or all)
  • pid Integer - process ID when scope is pid
  • maxDepth Integer - max tree depth (0 for root only)

Returns:

  • AccessibilityTreeResponse - accessibility tree response
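
The parameter rules above (scope is one of focused, pid, or all; pid is needed when scope is pid; a maxDepth of 0 means root only) can be checked before making the call. The following is a minimal sketch of such a guard. The class AccessibilityTreeArgs and its validate method are hypothetical and not part of the SDK, and treating a null maxDepth as "no limit requested" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check; not part of the SDK.
final class AccessibilityTreeArgs {
    // Scope values listed in the parameter description.
    private static final List<String> SCOPES = Arrays.asList("focused", "pid", "all");

    /** Returns null when the arguments look consistent, otherwise a short problem description. */
    static String validate(String scope, Integer pid, Integer maxDepth) {
        if (scope == null || !SCOPES.contains(scope)) {
            return "scope must be one of focused, pid, all";
        }
        if (scope.equals("pid") && pid == null) {
            return "pid is required when scope is pid";
        }
        // Assumption: a null maxDepth is allowed and negative values are not.
        if (maxDepth != null && maxDepth < 0) {
            return "maxDepth must be 0 (root only) or greater";
        }
        return null;
    }
}
```

A caller could run validate("pid", 1234, 3) first and only then pass the same three values to getAccessibilityTree.
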
public AccessibilityNodesResponse findAccessibilityNodes()

Finds AT-SPI accessibility nodes without filters.

Returns:

  • AccessibilityNodesResponse - matching accessibility nodes
public AccessibilityNodesResponse findAccessibilityNodes(FindAccessibilityNodesRequest request)

Finds AT-SPI accessibility nodes using a generated toolbox request.

Parameters:

  • request FindAccessibilityNodesRequest - generated accessibility find request

Returns:

  • AccessibilityNodesResponse - matching accessibility nodes
public void focusAccessibilityNode(String id)

Focuses an AT-SPI accessibility node.

Parameters:

  • id String - accessibility node ID
public void invokeAccessibilityNode(String id)

Invokes an AT-SPI accessibility node’s primary action.

Parameters:

  • id String - accessibility node ID
public void invokeAccessibilityNode(String id, String action)

Invokes an AT-SPI accessibility node action.

Parameters:

  • id String - accessibility node ID
  • action String - action name, or null for the primary action
public void setAccessibilityNodeValue(String id, String value)

Sets an AT-SPI accessibility node value.

Parameters:

  • id String - accessibility node ID
  • value String - value to write
public ScreenshotResponse takeScreenshot()

Captures a full-screen screenshot without the cursor.

Returns:

  • ScreenshotResponse - screenshot payload (base64 image and metadata)
public ScreenshotResponse takeScreenshot(boolean showCursor)

Captures a full-screen screenshot.

Parameters:

  • showCursor boolean - whether to render the cursor in the screenshot

Returns:

  • ScreenshotResponse - screenshot payload (base64 image and metadata)
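
Because the screenshot payload carries a base64 image, a caller usually has to decode it before saving or inspecting it. The sketch below shows only that decoding step with the standard library. How the base64 string is read from ScreenshotResponse is not covered in this reference, so the helper takes the string as a plain argument; the class name ScreenshotDecoder is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper; takes the base64 string as input rather than a ScreenshotResponse.
final class ScreenshotDecoder {
    /** Decodes a base64 image string into raw image bytes. */
    static byte[] decode(String base64Image) {
        // The MIME decoder tolerates line breaks that some encoders insert.
        return Base64.getMimeDecoder().decode(base64Image);
    }

    /** Decodes the image and writes it to the given file, returning the same path. */
    static Path save(String base64Image, Path target) {
        try {
            return Files.write(target, decode(base64Image));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```
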
public ScreenshotResponse takeRegionScreenshot(int x, int y, int width, int height)

Captures a screenshot of a rectangular region without the cursor.

Parameters:

  • x int - region top-left X coordinate
  • y int - region top-left Y coordinate
  • width int - region width in pixels
  • height int - region height in pixels

Returns:

  • ScreenshotResponse - region screenshot payload
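
When a point is located inside a region screenshot, it has to be shifted by the region's top-left corner before it can be used with screen-level calls such as click(int x, int y). The sketch below illustrates that arithmetic, plus a clamp that keeps a requested region inside the display. It assumes the region image is unscaled, so pixel (0, 0) of the image corresponds to (x, y) on screen. RegionMath is a hypothetical helper, not an SDK class.

```java
// Hypothetical helper for region arithmetic; works on one axis at a time.
final class RegionMath {
    /** Converts a coordinate measured inside a region screenshot to a screen coordinate. */
    static int toScreen(int regionOrigin, int local) {
        return regionOrigin + local;
    }

    /** Shrinks a requested length so that origin + length stays within the display size. */
    static int clampLength(int origin, int length, int displaySize) {
        int available = Math.max(0, displaySize - Math.max(0, origin));
        return Math.max(0, Math.min(length, available));
    }
}
```

For a region starting at x = 100, a feature found at image column 20 sits at screen column RegionMath.toScreen(100, 20).
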
public ScreenshotResponse takeCompressedScreenshot(String format, int quality, double scale)

Captures a compressed full-screen screenshot.

Parameters:

  • format String - output image format (for example: png, jpeg, webp)
  • quality int - compression quality (typically 1-100, format dependent)
  • scale double - screenshot scale factor (for example: 0.5 for 50%)

Returns:

  • ScreenshotResponse - compressed screenshot payload
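
A screenshot captured with a scale factor below 1 has smaller pixel dimensions than the screen, so coordinates read from the image need to be divided by the same factor before being used for mouse actions. The sketch below shows that conversion. It assumes the scale applies uniformly to both axes, which this reference does not state explicitly; ScaledCoordinates is a hypothetical helper.

```java
// Hypothetical helper; maps scaled-image coordinates back to screen coordinates.
final class ScaledCoordinates {
    /** Converts one image coordinate to the matching screen coordinate for the given scale. */
    static int toScreen(int imageCoordinate, double scale) {
        if (scale <= 0) {
            throw new IllegalArgumentException("scale must be positive");
        }
        // Divide by the scale and round to the nearest whole pixel.
        return (int) Math.round(imageCoordinate / scale);
    }
}
```

With scale 0.5, a point at image position (100, 40) corresponds to screen position (200, 80).
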
public MouseClickResponse click(int x, int y)

Performs a left mouse click at the given coordinates.

Parameters:

  • x int - target X coordinate
  • y int - target Y coordinate

Returns:

  • MouseClickResponse - click response with resulting cursor position
public MouseClickResponse click(int x, int y, String button)

Performs a mouse click at the given coordinates with a specific button.

Parameters:

  • x int - target X coordinate
  • y int - target Y coordinate
  • button String - button type (left, right, middle)

Returns:

  • MouseClickResponse - click response with resulting cursor position
public MouseClickResponse doubleClick(int x, int y)

Performs a double left-click at the given coordinates.

Parameters:

  • x int - target X coordinate
  • y int - target Y coordinate

Returns:

  • MouseClickResponse - click response with resulting cursor position
public MousePositionResponse moveMouse(int x, int y)

Moves the mouse cursor to the given coordinates.

Parameters:

  • x int - target X coordinate
  • y int - target Y coordinate

Returns:

  • MousePositionResponse - new mouse position
public MousePositionResponse getMousePosition()

Returns the current mouse position.

Returns:

  • MousePositionResponse - current mouse cursor coordinates
public MouseDragResponse drag(int startX, int startY, int endX, int endY)

Drags the mouse from one point to another using the left button.

Parameters:

  • startX int - drag start X coordinate
  • startY int - drag start Y coordinate
  • endX int - drag end X coordinate
  • endY int - drag end Y coordinate

Returns:

  • MouseDragResponse - drag response with resulting cursor position
public ScrollResponse scroll(int x, int y, int deltaX, int deltaY)

Scrolls at the given coordinates.

The current toolbox API supports only directional scrolling (up or down) by an amount. This method derives the vertical scroll direction and magnitude from deltaY. If deltaY is 0, deltaX is used as a fallback.

Parameters:

  • x int - anchor X coordinate
  • y int - anchor Y coordinate
  • deltaX int - horizontal delta (used only when deltaY == 0)
  • deltaY int - vertical delta

Returns:

  • ScrollResponse - scroll response indicating operation success
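
The delta handling described above can be pictured as a small mapping from (deltaX, deltaY) to a direction and an amount. The sketch below is an illustration only, not the SDK's implementation. In particular, the sign convention (negative means up) and the treatment of the deltaX fallback as an up/down value are assumptions; ScrollMapping is a hypothetical class.

```java
// Hypothetical illustration of the documented mapping; not the SDK's code.
final class ScrollMapping {
    final String direction; // "up" or "down"
    final int amount;       // non-negative magnitude

    private ScrollMapping(String direction, int amount) {
        this.direction = direction;
        this.amount = amount;
    }

    static ScrollMapping fromDeltas(int deltaX, int deltaY) {
        // deltaY wins; deltaX is consulted only when deltaY is 0.
        int delta = (deltaY != 0) ? deltaY : deltaX;
        // Assumption: negative deltas scroll up, everything else scrolls down.
        String direction = (delta < 0) ? "up" : "down";
        return new ScrollMapping(direction, Math.abs(delta));
    }
}
```

Under these assumptions, fromDeltas(5, 2) ignores deltaX entirely, while fromDeltas(4, 0) falls back to it.
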
public void typeText(String text)

Types text using keyboard automation.

Parameters:

  • text String - text to type
public void pressKey(String key)

Presses a single key.

Parameters:

  • key String - key to press. Canonical names include enter, escape, tab, letters, digits, unshifted punctuation, function keys, and grammar-safe numpad names such as num_plus. Named keys are case-insensitive, and common aliases such as Return and Escape are normalized.
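
The key contract above (case-insensitive named keys, aliases folded to canonical names) can be illustrated with a tiny normalizer. This is a sketch of the documented behavior, not the SDK's code, and the reference does not say where normalization happens. Only the Return alias is modelled, on the assumption that it maps to the canonical name enter; leaving single-character keys untouched is also an assumption. KeyNames is a hypothetical class.

```java
import java.util.Locale;

// Hypothetical illustration of the documented key contract; not the SDK's code.
final class KeyNames {
    static String normalize(String key) {
        // Assumption: single characters (letters, digits, punctuation) pass through unchanged.
        if (key.length() == 1) {
            return key;
        }
        // Named keys are case-insensitive, so compare in lower case.
        String lower = key.toLowerCase(Locale.ROOT);
        // Assumption: the Return alias maps to the canonical name "enter".
        return lower.equals("return") ? "enter" : lower;
    }
}
```
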
public void pressHotkey(String... keys)

Presses a key combination as a hotkey sequence.

Keys are joined with + before being sent (for example, pressHotkey("ctrl", "shift", "t") sends "ctrl+shift+t"). The resulting value is a single atomic chord and uses the same normalized key contract as pressKey(String).

Parameters:

  • keys String... - hotkey parts to combine
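
The joining rule described above is simple enough to show directly. The sketch below reproduces only the documented string construction with the standard String.join; the class HotkeyChord is hypothetical, and rejecting an empty key list is an assumption rather than documented behavior.

```java
// Hypothetical helper that mirrors the documented "join with +" rule.
final class HotkeyChord {
    static String join(String... keys) {
        // Assumption: a chord needs at least one key.
        if (keys == null || keys.length == 0) {
            throw new IllegalArgumentException("at least one key is required");
        }
        return String.join("+", keys);
    }
}
```

HotkeyChord.join("ctrl", "shift", "t") yields the same "ctrl+shift+t" string given in the description.
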
public DisplayInfoResponse getDisplayInfo()

Returns display configuration information.

Returns:

  • DisplayInfoResponse - display information including available displays and their geometry
public WindowsResponse getWindows()

Returns currently open windows.

Returns:

  • WindowsResponse - window list and metadata
public Recording startRecording()

Starts a recording with default options.

Returns:

  • Recording - newly started recording metadata
public Recording startRecording(String label)

Starts a recording with an optional label.

Parameters:

  • label String - optional recording label

Returns:

  • Recording - newly started recording metadata
public Recording stopRecording(String id)

Stops an active recording.

Parameters:

  • id String - recording identifier

Returns:

  • Recording - finalized recording metadata
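
Since a recording stays active until stopRecording(String id) is called, it is usually worth stopping it in a finally block so that a failing automation step does not leave it running. The sketch below shows that pattern against a hypothetical Recorder interface that stands in for the start and stop calls and works with plain ID strings. How the ID is read from the returned Recording object is not shown in this reference, so that part is left out.

```java
// Hypothetical wrapper; Recorder stands in for the SDK's start/stop recording calls.
final class RecordingScope {
    interface Recorder {
        String start(String label); // returns the recording ID
        void stop(String id);
    }

    /** Runs the task while recording and always stops the recording afterwards. */
    static String record(Recorder recorder, String label, Runnable task) {
        String id = recorder.start(label);
        try {
            task.run();
        } finally {
            // Runs even when the task throws, so the recording is not left active.
            recorder.stop(id);
        }
        return id;
    }
}
```

The returned ID can then be passed to calls such as getRecording or downloadRecording.
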
public ListRecordingsResponse listRecordings()

Lists all recordings for the current sandbox session.

Returns:

  • ListRecordingsResponse - recordings list response
public Recording getRecording(String id)

Returns metadata for a specific recording.

Parameters:

  • id String - recording identifier

Returns:

  • Recording - recording details
public File downloadRecording(String id)

Downloads a recording file.

Parameters:

  • id String - recording identifier

Returns:

  • File - local (temporary) file handle for the downloaded recording, as returned by the API client
public void deleteRecording(String id)

Deletes a recording.

Parameters:

  • id String - recording identifier