Runner Recovery & FUSE Optimization
This release improves runner reliability and resource efficiency. Sandboxes marked as recoverable on draining runners are now recovered in place: the disk is expanded by 5% and the sandbox state is reset. This prevents the infinite retry loops that disk-full conditions previously caused during migration.
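The recovery rule above can be sketched as follows. This is a minimal illustration, not the actual API code: the `Sandbox` record, its field names, the `"stopped"` reset state, and rounding up to a whole GB are all assumptions made for the example. Only the 5% expansion and the "recoverable sandbox on a draining runner" condition come from the release notes.

```python
from dataclasses import dataclass

GROWTH_PERCENT = 5  # the 5% disk expansion described in the release notes


@dataclass
class Sandbox:
    """Hypothetical sandbox record; field names are illustrative only."""
    id: str
    disk_gb: int
    state: str
    recoverable: bool


def recover_in_place(sandbox: Sandbox, runner_draining: bool) -> bool:
    """Expand the disk by 5% and reset state if the sandbox qualifies.

    Returns True when recovery was applied, False when the sandbox was left alone.
    """
    if not (runner_draining and sandbox.recoverable):
        return False
    # Integer ceiling of disk_gb * 1.05; rounding up is an assumption.
    sandbox.disk_gb = (sandbox.disk_gb * (100 + GROWTH_PERCENT) + 99) // 100
    # The target state after a reset is assumed here; the notes do not name it.
    sandbox.state = "stopped"
    return True
```

Under these assumptions, a 20 GB recoverable sandbox on a draining runner ends up with a 21 GB disk, while a sandbox that is not marked recoverable is left unchanged.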
FUSE volume mounts now use a single mount per volume, with bind-mounted subdirectories for subpaths. This significantly reduces the number of mount-s3 processes and the idle CPU usage on runners. The runner also replaces fragile log parsing for snapshot pull detection with a thread-safe in-memory tracker.
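For readers unfamiliar with the idea, a thread-safe in-memory tracker for snapshot pulls might look roughly like the sketch below. The class name, method names, and snapshot identifiers are hypothetical and chosen for illustration; the runner's real tracker may be structured differently.

```python
import threading


class SnapshotPullTracker:
    """Hypothetical tracker of snapshot pulls that are currently in progress."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_progress = set()

    def start(self, snapshot: str) -> bool:
        """Record that a pull has started.

        Returns False if a pull for the same snapshot is already running.
        """
        with self._lock:
            if snapshot in self._in_progress:
                return False
            self._in_progress.add(snapshot)
            return True

    def finish(self, snapshot: str) -> None:
        """Record that a pull has ended, whether it succeeded or failed."""
        with self._lock:
            self._in_progress.discard(snapshot)

    def is_pulling(self, snapshot: str) -> bool:
        """Answer 'is this snapshot being pulled right now?' without reading logs."""
        with self._lock:
            return snapshot in self._in_progress
```

Because every read and write goes through the same lock, concurrent sandbox creations can query the tracker directly instead of inferring pull status from log output.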
Additional fixes include permission-gated sandbox creation on the dashboard, Python SDK delete timeouts, reduced API log spam on archive, and documentation for entrypoint session details.
Release Details
Features:
docs: add entrypoint session details (#4163)
Fixes:
api: recover recoverable sandboxes in-place when runner is draining (#4029)
api: reduce log spam on archive (#4152)
api: allow region resource access to sandbox (#4161)
api: prime last activity at (#4094)
api: guard for get single sandbox (#4165)
dashboard: check WRITE_SANDBOXES permission before rendering create sandbox button (#4162)
docs: README.md go example references wrong package for options (#4130)
python-sdk: delete with timeout (#4155)
runner: handle pulling snapshot on create (#4114)
runner: use single fuse mount per volume with bind subdirs for subpaths (#4047)
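The single-mount-per-volume approach from the last runner fix can be sketched as a planning step. This is only an illustration under stated assumptions: `plan_mounts`, `MountPlan`, the request tuple shape, and the `/mnt/volumes` base directory are hypothetical, and the sketch computes paths without performing any mounts.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class MountPlan:
    """Hypothetical result: one FUSE mount per volume plus bind mounts."""
    fuse_mounts: dict = field(default_factory=dict)  # volume id -> base mount path
    bind_mounts: list = field(default_factory=list)  # (source path, target path)


def plan_mounts(requests, base_dir="/mnt/volumes"):
    """Build a mount plan from (volume_id, subpath_or_None, target) tuples.

    Each volume gets exactly one FUSE mount under base_dir (an assumed
    location). Every request, with or without a subpath, is then served
    by a bind mount from that base.
    """
    plan = MountPlan()
    for volume_id, subpath, target in requests:
        # Reuse the volume's base mount if one was already planned.
        base = plan.fuse_mounts.setdefault(
            volume_id, posixpath.join(base_dir, volume_id)
        )
        source = posixpath.join(base, subpath.strip("/")) if subpath else base
        plan.bind_mounts.append((source, target))
    return plan
```

With three requests against two volumes, the plan contains two FUSE mounts and three bind mounts, rather than one FUSE process per request.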
Chores: