Cua

Open-source infrastructure for computer-use agents — sandboxes, SDKs, benchmarks, and background desktop drivers.

Cua AI, Inc. (YC X25) · Open source · Since 2025

Cua is an open-source (MIT) infrastructure platform for building, benchmarking, and deploying computer-use agents. It provides background desktop control via Cua Driver, ephemeral multi-OS sandboxes via Cua Sandbox, standardized benchmarks via Cua Bench, and macOS virtualization via Lume — all accessible through a unified Python/TypeScript SDK, MCP server, and CLI.

+ Pros

First comprehensive open-source infrastructure layer for computer-use agents — fills the gap between vision models and real desktop control
Background desktop control on macOS and Windows without stealing cursor or focus — agents click, type, and verify invisibly
MCP server enables drop-in integration with existing coding agents (Claude Code, Codex, Cursor, Hermes, OpenCode) in minutes
Unified multi-OS sandbox API — one SDK to boot Linux, Windows, macOS, and Android environments locally or in the cloud
Cua Bench provides standardized benchmarks (OSWorld, ScreenSpot, Windows Arena) and RL environments for reproducible evaluation
Lume delivers near-native macOS VM performance on Apple Silicon using Apple's Virtualization.Framework
18,000+ GitHub stars, 3,500+ commits, 521 releases — exceptionally active development with rapid iteration
MIT licensed — fully open-source core with self-hosted and on-prem deployment options
Verified data pipeline delivers human-reviewed trajectory datasets for training computer-use models

− Cons

Cloud pricing is opaque — no public self-serve tiers, only 'request access' for hosted fleets and dedicated infrastructure
Linux Cua Driver is pre-release only — not yet stable for production Linux desktop automation
Early-stage company (founded March 2025) — rapidly evolving APIs and product surface may shift unexpectedly
Complex product surface (Driver, Sandbox, Run, Bench, Lume, Verified Data) can be confusing to navigate for new users
Stable Cua Driver builds require a macOS or Windows host; Linux hosts are limited to the pre-release driver
Some components (Cua Run, Verified Data) are cloud-only with no self-hosted alternative
Heavy dependency on Apple ecosystem for macOS sandboxing — Lume requires Apple Silicon hardware

Pricing

Open Source

MIT-licensed core — Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your own infrastructure.

Cloud

Request

Hosted sandbox fleets on cua.ai with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available.

Verified Data

Request

Curated, human-reviewed trajectory datasets generated from verified rollouts across all four OS families for training computer-use models.

Introduction

Cua is the first comprehensive open-source infrastructure layer purpose-built for computer-use agents: AI systems that see a screen, reason about it, and interact with desktop applications the way a human would. Developed by Cua AI, Inc. (Y Combinator X25) and released under the MIT license, Cua has grown to over 18,000 GitHub stars, 3,500+ commits, and 521 releases since its founding in March 2025.

The platform fills a critical gap in the AI agent stack. While vision-language models have become increasingly capable of understanding screenshots and generating actions, the infrastructure to actually run those actions on real desktops — in the background, across operating systems, at scale — was fragmented or proprietary. Cua provides the missing layer: background desktop drivers, ephemeral sandboxes, standardized benchmarks, and macOS virtualization, all accessible through a unified Python and TypeScript SDK, an MCP server, and a CLI.

Cua is used by over 50,000 engineers at companies including Google, Meta, Apple, and NVIDIA for training, evaluation, and deployment of computer-use agents.

Key Components

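Cua is not a single tool but a platform of interoperable components. You enter through one of four products (Cua Driver, Cua Sandbox, Cua Bench, or Lume), with the Verified Data pipeline offered alongside them: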

Cua Driver — Background Desktop Control

Cua Driver is the entry point for giving an existing coding agent the ability to control a desktop computer. It runs as a background daemon on macOS (stable) and Windows (stable), with Linux in pre-release.

Agents click, type, scroll, inspect accessibility trees, and capture window state — all without stealing the cursor or focus from the user.

The driver exposes a single MCP stdio server and a CLI that call the same driver handlers. This means any MCP-capable client — Claude Code, Codex, Cursor, Hermes, OpenClaw, OpenCode — can wire it up in minutes:

# macOS / Linux
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

# Then connect to Claude Code
claude mcp add --transport stdio cua-driver -- cua-driver mcp

Cua Driver is built on native platform APIs — Accessibility (AX) on macOS, UI Automation on Windows — and uses SkyLight internals on macOS for multi-cursor support. Each agent session gets its own synthetic cursor, and the driver orchestrates per-window element indexes that refresh on every snapshot, so pre-action and post-action window state are part of the contract.
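The idea of element indexes that refresh on every snapshot can be illustrated with a toy model. This is a minimal sketch, not the Cua Driver API: `ToyWindow`, `StaleSnapshotError`, and the rule that stale indexes are rejected are all assumptions made for illustration. It shows why an agent should re-read window state after each action instead of reusing old indexes.

```python
from dataclasses import dataclass, field


class StaleSnapshotError(Exception):
    """Raised when an action targets an index from an outdated snapshot."""


@dataclass
class ToyWindow:
    """Hypothetical stand-in for one window; not the Cua Driver API."""
    labels: list[str]
    snapshot_id: int = 0
    index: dict[int, str] = field(default_factory=dict)

    def snapshot(self) -> tuple[int, dict[int, str]]:
        # Re-number every element from scratch on each snapshot.
        self.snapshot_id += 1
        self.index = dict(enumerate(self.labels))
        return self.snapshot_id, dict(self.index)

    def click(self, snapshot_id: int, element_index: int) -> str:
        # Assumed rule: refuse indexes that came from an older snapshot.
        if snapshot_id != self.snapshot_id:
            raise StaleSnapshotError("take a fresh snapshot before acting")
        return self.index[element_index]


window = ToyWindow(labels=["7", "8", "9", "+"])
snap_id, elements = window.snapshot()
clicked = window.click(snap_id, 3)        # index 3 is "+" in this snapshot

window.labels.insert(0, "AC")             # the UI changed, so indexes shift
new_id, new_elements = window.snapshot()  # post-action snapshot
```

After the UI change, index 3 no longer points at "+", which is why the pre-action and post-action snapshots matter.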

Key capabilities:

launch_app — Start an application by bundle ID or path
click / double_click / right_click — Click by element index or coordinates
type_text — Send keystrokes to a focused element
scroll — Scroll in any direction
get_window_state — Returns accessibility tree + screenshot of the current window
drag / held_button — Mouse drag and held-button interactions (Linux pre-release)
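The tools above can be invoked from a script through the `cua-driver call <tool> '<json>'` form shown elsewhere on this page. The helper below is a minimal sketch that only builds the argument vector; `driver_call_argv` is a hypothetical name, and the parameter names each tool accepts should be checked against the driver's own documentation.

```python
import json
import shlex


def driver_call_argv(tool: str, **params) -> list[str]:
    """Build argv for `cua-driver call <tool> '<json>'` without running it."""
    payload = json.dumps(params, separators=(",", ":"))
    return ["cua-driver", "call", tool, payload]


argv = driver_call_argv("launch_app", bundle_id="com.apple.calculator")
print(shlex.join(argv))
```

On a machine where the driver is installed, the list can be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems in the JSON payload.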

Cua Sandbox — Agent-Ready Environments

Cua Sandbox provides ephemeral desktop environments for agents to run in. One Python or TypeScript API boots Linux, Windows, macOS, or Android environments — locally via QEMU or in the cloud via Cua Cloud:

import asyncio

from cua import Sandbox, Image


async def main():
    # Same API regardless of OS or runtime
    async with Sandbox.ephemeral(Image.linux()) as sb:
        result = await sb.shell.run("echo hello")
        screenshot = await sb.screenshot()
        await sb.mouse.click(100, 200)
        await sb.keyboard.type("Hello from Cua!")
        # Touch gesture between two points (likely only meaningful on Android images)
        await sb.mobile.gesture((100, 500), (100, 200))


asyncio.run(main())

Sandboxes support snapshot-native rollouts — fork known machine states over copy-on-write storage, run parallel episodes, and reproduce failures without rebuilding from scratch. Cua Run extends this to elastic infrastructure: claim pre-booted machines from formally verified warm pools in milliseconds, scaling to zero when idle.
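Forking a known state over copy-on-write storage can be pictured with a small in-memory analogy. This sketch uses Python's `collections.ChainMap` as a stand-in for a disk overlay; it says nothing about how Cua implements snapshots, and the file paths are made up.

```python
from collections import ChainMap

# Base snapshot: a known machine state shared by every episode (toy data).
base_snapshot = {"/etc/hostname": "sandbox", "/home/user/notes.txt": "todo"}


def fork(snapshot: dict) -> ChainMap:
    """Return a view whose writes land in a private overlay, not the base."""
    return ChainMap({}, snapshot)


episode_a = fork(base_snapshot)
episode_b = fork(base_snapshot)

episode_a["/home/user/notes.txt"] = "done"   # written to A's overlay only
episode_b["/tmp/result.json"] = "{}"         # written to B's overlay only
```

Each episode sees the base state plus its own changes, while the base stays untouched. That lets a failed episode be reproduced by forking the same snapshot again.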

Runtime	Linux	Windows	macOS	Android	BYOI
Cloud (cua.ai)	✅	✅	✅	✅	🔜
Local (QEMU)	✅	✅	✅	✅	✅

Cua Bench — Benchmarks & RL Environments

Cua Bench is a standardized evaluation framework for computer-use agents. It supports running agents against established benchmarks — OSWorld, ScreenSpot, Windows Arena — and custom task definitions:

# Clone, install, and create base image
git clone https://github.com/trycua/cua && cd cua/cua-bench
uv tool install -e . && cb image create linux-docker

# Run benchmark with agent
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

Cua Bench also provides RL environments for training computer-use models, with trajectory export for downstream training pipelines. The cuabench.ai registry hosts community-contributed tasks and leaderboards.
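The effect of a flag such as `--max-parallel 4` can be sketched with a semaphore that caps how many episodes run at once. This is an illustration of the general technique, not Cua Bench's scheduler: `run_task` is a hypothetical placeholder that pretends even-numbered tasks succeed.

```python
import asyncio


async def run_task(task_id: int) -> bool:
    """Hypothetical episode: pretend even-numbered tasks succeed."""
    await asyncio.sleep(0)  # stand-in for driving a sandbox
    return task_id % 2 == 0


async def run_dataset(task_ids: list[int], max_parallel: int) -> float:
    """Run every task with at most `max_parallel` in flight; return success rate."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(task_id: int) -> bool:
        async with gate:
            return await run_task(task_id)

    results = await asyncio.gather(*(guarded(t) for t in task_ids))
    return sum(results) / len(results)


success_rate = asyncio.run(run_dataset(list(range(10)), max_parallel=4))
```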

Lume — macOS Virtualization

Lume is a CLI tool for creating and managing macOS and Linux VMs on Apple Silicon with near-native performance, built on Apple's Virtualization.Framework. It predates Apple's Containerization announcement and was Cua's original open-source project:

# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

# Pull & start a macOS VM
lume run macos-sequoia-vanilla:latest

Lume supports restoring from IPSW files, resizable raw disk images, and Lumier, a Docker-compatible interface for orchestrating multiple VMs.

Verified Data

Cua offers a Verified Data pipeline: human-reviewed trajectory datasets generated from verified agent rollouts across all four OS families. Every trajectory is scored against an evaluator, accepted runs pass human QA, and golden trajectories carry step-level annotations for training computer-use models.
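The three gates described above (evaluator score, human QA, step-level annotations) can be written as a simple filter. This is a sketch under stated assumptions: the `Trajectory` fields, the `golden` function, and the score threshold are illustrative and do not reflect Cua's actual data schema.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record; field names are illustrative, not Cua's schema."""
    task: str
    evaluator_score: float   # automatic score in [0, 1]
    human_approved: bool     # outcome of human QA
    step_notes: list[str]    # step-level annotations, if any


def golden(trajectories: list[Trajectory], threshold: float = 1.0) -> list[Trajectory]:
    """Keep runs that pass the evaluator, pass human QA, and carry annotations."""
    return [
        t for t in trajectories
        if t.evaluator_score >= threshold and t.human_approved and t.step_notes
    ]


runs = [
    Trajectory("rename file", 1.0, True, ["open Finder", "press Enter", "type name"]),
    Trajectory("rename file", 1.0, False, []),   # evaluator pass, QA reject
    Trajectory("export csv", 0.4, True, []),     # evaluator fail
]
kept = golden(runs)
```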

Architecture

Cua's architecture is modular by design — each component can be used independently or composed together:

┌─────────────────────────────────────────────────────────────┐
│                      Coding Agents                          │
│   Claude Code · Codex · Cursor · Hermes · OpenClaw · ...   │
└────────────────────────┬────────────────────────────────────┘
                         │ MCP / HTTP
┌────────────────────────▼────────────────────────────────────┐
│                     Cua Driver                              │
│      Background desktop control (macOS / Windows / Linux)    │
│         Accessibility · SkyLight · UI Automation            │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                 Cua Sandbox + Cua Run                        │
│   Ephemeral VMs/containers — Linux · Windows · macOS · Android │
│   Local (QEMU) · Cloud (cua.ai) · BYOC · On-prem             │
│   Warm pools · Snapshots · Copy-on-write                     │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│     Cua Bench          │          Lume                       │
│   Benchmarks + RL      │   macOS VMs on Apple Silicon        │
│   OSWorld · ScreenSpot │   Virtualization.Framework          │
│   Windows Arena        │   Lumier (Docker-compatible)        │
└────────────────────────┴────────────────────────────────────┘

The stack is unified by the Python SDK (pip install cua) and the TypeScript SDK (@trycua/cua), with MCP as the universal protocol layer connecting coding agents to the driver.
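At the protocol level, an MCP client asks a server to run a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such message; it assumes the MCP tool name and arguments mirror the CLI form (`launch_app` with a `bundle_id`), which the page implies but does not spell out. A real client also performs an `initialize` handshake first, omitted here.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(message)


request = tools_call("launch_app", {"bundle_id": "com.apple.calculator"})
```

In practice the coding agent's MCP client generates these messages, so this is only useful for understanding or debugging the traffic.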

Integration with Coding Agents

Cua Driver's MCP server turns any MCP-capable coding agent into a computer-use agent with a single command.

Claude Code

# Standard MCP — agent sees accessibility tree + tools
claude mcp add --transport stdio cua-driver -- cua-driver mcp

# Screenshot-grounded mode — Claude Code computer-use compatible
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat

Codex

codex mcp add cua-driver -- cua-driver mcp

Cursor / OpenCode / Custom MCP Clients

# Generate client-specific config
cua-driver mcp-config --client cursor
cua-driver mcp-config --client opencode

# Generic MCP server shape
cua-driver mcp-config
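Many MCP clients read server definitions from a JSON file with an `mcpServers` object. The sketch below produces an entry in that common layout for the `cua-driver mcp` command; the exact output of `cua-driver mcp-config` is not shown on this page, so treat this shape as an assumption and prefer the generated config where it differs.

```python
import json


def mcp_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build a config fragment in the common `mcpServers` layout (assumed)."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


config = mcp_server_entry("cua-driver", "cua-driver", ["mcp"])
print(json.dumps(config, indent=2))
```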

Hermes

cua-driver mcp-config --client hermes
# paste into ~/.hermes/config.yaml

Any coding agent workflow extends to desktop automation — filling forms in native apps, exporting data from enterprise software, testing GUI applications, and automating multi-step workflows — all from the same chat interface the developer already uses.

Use Cases

Training Computer-Use Models

Cua provides the full pipeline: Cua Sandbox for environment orchestration at scale, Cua Bench for standardized evaluation, and Verified Data for human-reviewed training trajectories. Research teams at Google, Meta, and NVIDIA use Cua to generate training data for their computer-use vision-language models.

Automated GUI Testing

QA teams can deploy Cua Driver in CI/CD pipelines to run GUI tests against native desktop applications. The driver's background execution means tests run without interfering with other work on the same machine, and snapshots provide deterministic replay of failures.

Enterprise Desktop Automation

Cua Driver automates repetitive desktop workflows — data entry across legacy applications, software installation and configuration, file management across network drives — with the reliability of accessibility-tree-grounded interaction rather than fragile pixel coordinates.
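The difference between tree-grounded and coordinate-based interaction can be shown with a toy accessibility tree. This is a minimal sketch with hypothetical `Node`, `walk`, and `find` names; real accessibility trees expose many more attributes, and the driver's own lookup works through element indexes rather than this function.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Toy accessibility node; real trees carry many more attributes."""
    role: str
    label: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, label: str) -> Optional[Node]:
    """Locate an element by what it is, not where it is drawn."""
    return next((n for n in walk(root) if n.role == role and n.label == label), None)


form = Node("window", "Invoice", [
    Node("group", "Header", [Node("text_field", "Customer")]),
    Node("button", "Save"),
])
save_button = find(form, "button", "Save")
```

A lookup like this keeps working when the window is resized or the layout shifts, which is where hard-coded pixel coordinates tend to break.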

Multi-Agent Parallel Workflows

Cua Run's warm pools and session-identity system enable multiple agents to work in parallel on the same or different machines. Each agent gets its own cursor and window session, enabling coordinated multi-agent desktop automation.
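A warm pool can be pictured as a queue of machines that are already booted, with a cold boot as the fallback when the queue is empty. The sketch below is a toy model, not Cua Run's implementation: `WarmPool`, the `vm-N` IDs, and the agent names are all hypothetical.

```python
from collections import deque
from itertools import count


class WarmPool:
    """Toy warm pool: hands out pre-booted machine IDs, cold-boots when empty."""

    def __init__(self, size: int) -> None:
        self._ids = count(1)
        self._ready = deque(self._boot() for _ in range(size))

    def _boot(self) -> str:
        return f"vm-{next(self._ids)}"  # stand-in for a slow boot

    def claim(self) -> tuple[str, bool]:
        """Return (machine_id, was_warm)."""
        if self._ready:
            return self._ready.popleft(), True
        return self._boot(), False

    def release(self, machine_id: str) -> None:
        # A real pool would reset the machine to a snapshot before reuse.
        self._ready.append(machine_id)


pool = WarmPool(size=2)
sessions = {agent: pool.claim() for agent in ("agent-a", "agent-b", "agent-c")}
```

With a pool of two, the third agent falls back to a cold boot. Sizing the pool is therefore a trade-off between claim latency and idle cost.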

macOS CI/CD for Apple Ecosystem

Lume provides near-native macOS VMs for CI/CD pipelines: build and test macOS, iOS, and visionOS applications in disposable VMs on Apple Silicon hosts instead of hand-maintained build machines. Combined with Cua's sandbox API, teams can spin up macOS environments on demand for testing.

Pricing

Tier	Price	Description
Open Source	$0	MIT-licensed core — Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your infrastructure.
Cloud	By request	Hosted sandbox fleets with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available.
Verified Data	By request	Curated, human-reviewed trajectory datasets from verified rollouts across all four OS families.

The entire core platform is MIT-licensed and available on GitHub. Cloud infrastructure and verified data are available by contacting the Cua team, with options for SOC 2-ready deployment, BYOC, and on-premises hosting.

Getting Started

Try Cua Driver (quickest start)

# Install the driver
# macOS / Linux (the Linux driver is pre-release)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

# Windows (PowerShell as admin)
irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex

# Start the daemon (the open command below is macOS-specific)
open -n -g -a CuaDriver --args serve
cua-driver status

# Wire into Claude Code
claude mcp add --transport stdio cua-driver -- cua-driver mcp

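# Launch an app and interact (pid, window_id, and element_index are example values; substitute the ones reported on your machine)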
cua-driver call launch_app '{"bundle_id":"com.apple.calculator"}'
cua-driver call get_window_state '{"pid":844,"window_id":10725}'
cua-driver call click '{"pid":844,"window_id":10725,"element_index":14}'

Try Cua Sandbox (Python)

# Install the SDK (shell)
pip install cua

# Then, in Python:
import asyncio

from cua import Sandbox, Image


async def main():
    async with Sandbox.ephemeral(Image.linux()) as sb:
        print(await sb.shell.run("uname -a"))
        await sb.mouse.click(500, 400)
        await sb.keyboard.type("Hello from Cua!")
        screenshot = await sb.screenshot()
        # screenshot is a PIL Image: save, display, or send to a VLM


asyncio.run(main())

Try Cua Bench

git clone https://github.com/trycua/cua && cd cua/cua-bench
uv tool install -e .
cb image create linux-docker
cb run dataset datasets/cua-bench-basic --agent cua-agent

Try Lume

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
lume run macos-sequoia-vanilla:latest

Version History

cua-computer-server v0.3.41 Jun 15, 2026

Latest — maintenance release with dependency updates

cua-computer-server v0.3.40 Jun 12, 2026

Fix MCP routing at both /mcp and /mcp/ paths, route single printable keypress through type_text

cua-driver-rs v0.5.3 Jun 12, 2026

Rust driver release — Linux background drag and held-button tools, show cursor and type in background terminals

lume v0.3.10 Jun 8, 2026

macOS VM management — bump with stability improvements

cua-driver-rs v0.5.0 Jun 1, 2026

Caller-declared session identity + Streamable-HTTP transport for multi-agent parallelism

Signature Snippet

Install Cua Driver on macOS and connect it to Claude Code via MCP: 'claude mcp add --transport stdio cua-driver -- cua-driver mcp'. Then inside Claude Code, type: 'Open the Calculator app, compute 15% of 340, and paste the result into a new TextEdit document.' Cua Driver drives the desktop apps in the background — clicking, typing, and verifying via accessibility trees and screenshots — without stealing cursor focus or interrupting your workflow.

