Open Source
MIT-licensed core — Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your own infrastructure.
Open-source infrastructure for computer-use agents — sandboxes, SDKs, benchmarks, and background desktop drivers.
Cua is an open-source (MIT) infrastructure platform for building, benchmarking, and deploying computer-use agents. It provides background desktop control via Cua Driver, ephemeral multi-OS sandboxes via Cua Sandbox, standardized benchmarks via Cua Bench, and macOS virtualization via Lume — all accessible through a unified Python/TypeScript SDK, MCP server, and CLI.
Hosted sandbox fleets on cua.ai with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available.
Curated, human-reviewed trajectory datasets generated from verified rollouts across all four OS families for training computer-use models.
Cua is the first comprehensive open-source infrastructure layer purpose-built for computer-use agents: AI systems that see a screen, reason about it, and interact with desktop applications the way a human would. Developed by Cua AI, Inc. (Y Combinator X25) and released under the MIT license, Cua has grown to over 18,000 GitHub stars, 3,500+ commits, and 521 releases since its founding in March 2025.
The platform fills a critical gap in the AI agent stack. While vision-language models have become increasingly capable of understanding screenshots and generating actions, the infrastructure to actually run those actions on real desktops — in the background, across operating systems, at scale — was fragmented or proprietary. Cua provides the missing layer: background desktop drivers, ephemeral sandboxes, standardized benchmarks, and macOS virtualization, all accessible through a unified Python and TypeScript SDK, an MCP server, and a CLI.
Cua is used by over 50,000 engineers at companies including Google, Meta, Apple, and NVIDIA for training, evaluation, and deployment of computer-use agents.
Cua is not a single tool but a platform of interoperable components. You enter through one of four products:
Cua Driver is the entry point for giving an existing coding agent the ability to control a desktop computer. It runs as a background daemon on macOS (stable) and Windows (stable), with Linux in pre-release.
Agents click, type, scroll, inspect accessibility trees, and capture window state — all without stealing the cursor or focus from the user.
The driver exposes a single MCP stdio server and a CLI that call the same driver handlers. This means any MCP-capable client — Claude Code, Codex, Cursor, Hermes, OpenClaw, OpenCode — can wire it up in minutes:
# macOS / Linux
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
# Then connect to Claude Code
claude mcp add --transport stdio cua-driver -- cua-driver mcp
Cua Driver is built on native platform APIs, Accessibility (AX) on macOS and UI Automation on Windows, and uses SkyLight internals on macOS for multi-cursor support. Each agent session gets its own synthetic cursor, and the driver orchestrates per-window element indexes that refresh on every snapshot, so pre-action and post-action window state are part of the contract.
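To make the per-snapshot element index concrete, here is a minimal Python sketch. It is not Cua Driver's implementation: the `Node` type, the depth-first numbering, and the sample trees are all assumptions made for illustration. It only shows why an index taken from one snapshot should not be reused after the window changes.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Hypothetical accessibility-tree node (not Cua Driver's real type)."""
    role: str
    title: str = ""
    children: list["Node"] = field(default_factory=list)


def index_elements(root: Node) -> dict[int, Node]:
    """Assign depth-first indexes to every node of one snapshot."""
    counter = count()
    index: dict[int, Node] = {}

    def visit(node: Node) -> None:
        index[next(counter)] = node
        for child in node.children:
            visit(child)

    visit(root)
    return index


# Snapshot taken before an action.
before = Node("window", "Calculator", [Node("button", "1"), Node("button", "+")])
snapshot_a = index_elements(before)

# After the action a display field appears, so every later index shifts.
after = Node(
    "window",
    "Calculator",
    [Node("text", "1"), Node("button", "1"), Node("button", "+")],
)
snapshot_b = index_elements(after)

print(snapshot_a[1].role, snapshot_b[1].role)
```

In this toy model index 1 is a button in the first snapshot and a text field in the second, which is the reason to re-read window state before acting on a stored index.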
Key capabilities:
- launch_app: Start an application by bundle ID or path
- click / double_click / right_click: Click by element index or coordinates
- type_text: Send keystrokes to a focused element
- scroll: Scroll in any direction
- get_window_state: Returns accessibility tree + screenshot of the current window
- drag / held_button: Mouse drag and held-button interactions (Linux pre-release)
Cua Sandbox provides ephemeral desktop environments for agents to run in. One Python or TypeScript API boots Linux, Windows, macOS, or Android environments, locally via QEMU or in the cloud via Cua Cloud:
import asyncio
from cua import Sandbox, Image

async def main():
    # Same API regardless of OS or runtime
    async with Sandbox.ephemeral(Image.linux()) as sb:
        result = await sb.shell.run("echo hello")
        screenshot = await sb.screenshot()
        await sb.mouse.click(100, 200)
        await sb.keyboard.type("Hello from Cua!")
        await sb.mobile.gesture((100, 500), (100, 200))

asyncio.run(main())
Sandboxes support snapshot-native rollouts: fork known machine states over copy-on-write storage, run parallel episodes, and reproduce failures without rebuilding from scratch. Cua Run extends this to elastic infrastructure: claim pre-booted machines from formally verified warm pools in milliseconds, scaling to zero when idle.
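The copy-on-write idea behind snapshot forking can be illustrated with a toy model. This is only a sketch under stated assumptions: an in-memory dict stands in for a machine snapshot, and the `fork` helper is hypothetical rather than part of the Cua SDK. Each fork layers a private, initially empty write layer over the shared base, so parallel episodes never see each other's changes and the base stays reusable.

```python
from collections import ChainMap

# Toy stand-in for a known machine state (path -> file contents).
base_snapshot = {
    "/etc/hostname": "golden",
    "/home/user/todo.txt": "empty",
}


def fork(snapshot: dict) -> ChainMap:
    """Hypothetical fork: an empty writable layer over the shared base."""
    return ChainMap(snapshot).new_child()


episode_a = fork(base_snapshot)
episode_b = fork(base_snapshot)

# Writes land only in the episode's own top layer.
episode_a["/home/user/todo.txt"] = "written by episode A"

print(episode_a["/home/user/todo.txt"])
print(episode_b["/home/user/todo.txt"])
print(base_snapshot["/home/user/todo.txt"])
```

Reads fall through to the base until a key is written, which is why a fork is cheap to create and why a failed episode can be reproduced by forking the same base again.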
| Runtime | Linux | Windows | macOS | Android | BYOI |
|---|---|---|---|---|---|
| Cloud (cua.ai) | ✅ | ✅ | ✅ | ✅ | 🔜 |
| Local (QEMU) | ✅ | ✅ | ✅ | ✅ | ✅ |
Cua Bench is a standardized evaluation framework for computer-use agents. It supports running agents against established benchmarks — OSWorld, ScreenSpot, Windows Arena — and custom task definitions:
# Clone, install, and create base image
git clone https://github.com/trycua/cua && cd cua/cua-bench
uv tool install -e . && cb image create linux-docker
# Run benchmark with agent
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4
Cua Bench also provides RL environments for training computer-use models, with trajectory export for downstream training pipelines. The cuabench.ai registry hosts community-contributed tasks and leaderboards.
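The `--max-parallel 4` flag in the command above caps how many tasks run at once. How Cua Bench schedules work internally is not described here, so the following is only a generic Python sketch of bounded parallelism using `asyncio.Semaphore`; the `run_all` helper, the task names, and the sleep that stands in for an episode are all assumptions.

```python
import asyncio

MAX_PARALLEL = 4  # mirrors --max-parallel 4 in the command above


async def run_all(task_ids, max_parallel=MAX_PARALLEL):
    """Run every task, never more than max_parallel at the same time."""
    limiter = asyncio.Semaphore(max_parallel)
    running = 0
    peak = 0

    async def run_one(task_id):
        nonlocal running, peak
        async with limiter:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for a real episode
            running -= 1
            return f"{task_id}: done"

    results = await asyncio.gather(*(run_one(t) for t in task_ids))
    return results, peak


results, peak = asyncio.run(run_all([f"task-{i}" for i in range(10)]))
print(len(results), "tasks finished; peak concurrency:", peak)
```

`asyncio.gather` returns results in submission order, so per-task outcomes can be matched back to the dataset regardless of which episode finished first.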
Lume is a CLI tool for creating and managing macOS and Linux VMs on Apple Silicon with near-native performance, built on Apple's Virtualization.Framework. It predates Apple's Containerization announcement and was Cua's original open-source project:
# Install Lume
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Pull & start a macOS VM
lume run macos-sequoia-vanilla:latest
Lume supports restoring from IPSW files and resizable raw disk images, and offers a Docker-compatible interface called Lumier for orchestrating multiple VMs.
Cua offers a Verified Data pipeline: human-reviewed trajectory datasets generated from verified agent rollouts across all four OS families. Every trajectory is scored against an evaluator, accepted runs pass human QA, and golden trajectories carry step-level annotations for training computer-use models.
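The two acceptance gates described above (an evaluator score, then human QA) can be sketched as a simple filter. The `Trajectory` fields, the `select_golden` helper, and the score threshold are hypothetical; the actual pipeline's schema and criteria are not given here.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record for one agent rollout."""
    task_id: str
    evaluator_score: float  # assumed range: 0.0 to 1.0
    human_approved: bool


def select_golden(trajectories, min_score=1.0):
    """Keep runs that pass the evaluator first, then human QA."""
    scored = [t for t in trajectories if t.evaluator_score >= min_score]
    return [t for t in scored if t.human_approved]


rollouts = [
    Trajectory("open-settings", 1.0, True),
    Trajectory("export-csv", 1.0, False),   # passed evaluator, failed QA
    Trajectory("rename-file", 0.4, True),   # failed evaluator
]

golden = select_golden(rollouts)
print([t.task_id for t in golden])
```

Running the automated gate before the human one keeps reviewers focused on rollouts that already meet the evaluator's bar.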
Cua's architecture is modular by design — each component can be used independently or composed together:
┌───────────────────────────────────────────────────────────────┐
│  Coding Agents                                                │
│  Claude Code · Codex · Cursor · Hermes · OpenClaw · ...       │
└────────────────────────┬──────────────────────────────────────┘
                         │ MCP / HTTP
┌────────────────────────▼──────────────────────────────────────┐
│  Cua Driver                                                   │
│  Background desktop control (macOS / Windows / Linux)         │
│  Accessibility · SkyLight · UI Automation                     │
└────────────────────────┬──────────────────────────────────────┘
                         │
┌────────────────────────▼──────────────────────────────────────┐
│  Cua Sandbox + Cua Run                                        │
│  Ephemeral VMs/containers: Linux · Windows · macOS · Android  │
│  Local (QEMU) · Cloud (cua.ai) · BYOC · On-prem               │
│  Warm pools · Snapshots · Copy-on-write                       │
└────────────────────────┬──────────────────────────────────────┘
                         │
┌────────────────────────▼──────────────────────────────────────┐
│  Cua Bench             │  Lume                                │
│  Benchmarks + RL       │  macOS VMs on Apple Silicon          │
│  OSWorld · ScreenSpot  │  Virtualization.Framework            │
│  Windows Arena         │  Lumier (Docker-compatible)          │
└────────────────────────┴──────────────────────────────────────┘
The stack is unified by the Python SDK (pip install cua) and the TypeScript SDK (@trycua/cua), with MCP as the universal protocol layer connecting coding agents to the driver.
Cua Driver's MCP server turns any MCP-capable coding agent into a computer-use agent with a single command.
# Standard MCP — agent sees accessibility tree + tools
claude mcp add --transport stdio cua-driver -- cua-driver mcp
# Screenshot-grounded mode — Claude Code computer-use compatible
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
# Codex
codex mcp add cua-driver -- cua-driver mcp
# Generate client-specific config
cua-driver mcp-config --client cursor
cua-driver mcp-config --client opencode
# Generic MCP server shape
cua-driver mcp-config
# Hermes
cua-driver mcp-config --client hermes
# paste into ~/.hermes/config.yaml
Any coding agent workflow extends to desktop automation (filling forms in native apps, exporting data from enterprise software, testing GUI applications, and automating multi-step workflows), all from the same chat interface the developer already uses.
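For readers curious what travels over the stdio transport, MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests, one JSON message per line. The Python sketch below builds such a request. It assumes the MCP tool name and argument keys match the `click` CLI example on this page, which is plausible since both front ends call the same handlers but is not confirmed here; the `tools_call` helper is hypothetical.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a newline-delimited message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# Assumed tool name and argument keys, copied from the CLI example.
line = tools_call(
    1,
    "click",
    {"pid": 844, "window_id": 10725, "element_index": 14},
)
print(line, end="")
```

In practice the MCP client (Claude Code, Codex, Cursor, and so on) performs the initial handshake and sends these messages itself, so this is only useful for debugging or for writing a custom client.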
Cua provides the full pipeline: Cua Sandbox for environment orchestration at scale, Cua Bench for standardized evaluation, and Verified Data for human-reviewed training trajectories. Research teams at Google, Meta, and NVIDIA use Cua to generate training data for their computer-use vision-language models.
QA teams can deploy Cua Driver in CI/CD pipelines to run GUI tests against native desktop applications. The driver's background execution means tests run without interfering with other work on the same machine, and snapshots provide deterministic replay of failures.
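A CI test step could drive the CLI from a script. The sketch below only builds the command line, using the `cua-driver call <tool> '<json>'` shape shown in the quick start; the `driver_call` helper is hypothetical, and the pid, window ID, and element index are the example values from this page rather than anything a real run would use.

```python
import json
import shlex


def driver_call(tool: str, **arguments) -> list[str]:
    """Build argv for `cua-driver call <tool> '<json arguments>'`."""
    payload = json.dumps(arguments, separators=(",", ":"))
    return ["cua-driver", "call", tool, payload]


argv = driver_call("click", pid=844, window_id=10725, element_index=14)

# shlex.join shows the equivalent shell command with safe quoting.
print(shlex.join(argv))
# prints: cua-driver call click '{"pid":844,"window_id":10725,"element_index":14}'
```

Passing `argv` as a list to `subprocess.run(argv, check=True)` avoids shell quoting problems and makes a non-zero exit status fail the test step.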
Cua Driver automates repetitive desktop workflows — data entry across legacy applications, software installation and configuration, file management across network drives — with the reliability of accessibility-tree-grounded interaction rather than fragile pixel coordinates.
Cua Run's warm pools and session-identity system enable multiple agents to work in parallel on the same or different machines. Each agent gets its own cursor and window session, enabling coordinated multi-agent desktop automation.
Lume provides near-native macOS VMs for CI/CD pipelines — build and test macOS, iOS, and visionOS applications without dedicated Mac hardware. Combined with Cua's sandbox API, teams can spin up macOS environments on demand for testing.
| Tier | Price | Description |
|---|---|---|
| Open Source | $0 | MIT-licensed core — Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your infrastructure. |
| Cloud | By request | Hosted sandbox fleets with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available. |
| Verified Data | By request | Curated, human-reviewed trajectory datasets from verified rollouts across all four OS families. |
The entire core platform is MIT-licensed and available on GitHub. Cloud infrastructure and verified data are available by contacting the Cua team, with options for SOC 2-ready deployment, BYOC, and on-premises hosting.
# Install on macOS or Windows
# macOS / Linux
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
# Windows (PowerShell as admin)
irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex
# Start the daemon
open -n -g -a CuaDriver --args serve
cua-driver status
# Wire into Claude Code
claude mcp add --transport stdio cua-driver -- cua-driver mcp
# Launch an app and interact
cua-driver call launch_app '{"bundle_id":"com.apple.calculator"}'
cua-driver call get_window_state '{"pid":844,"window_id":10725}'
cua-driver call click '{"pid":844,"window_id":10725,"element_index":14}'

# Cua Sandbox: install the Python SDK
pip install cua

import asyncio
from cua import Sandbox, Image

async def main():
    async with Sandbox.ephemeral(Image.linux()) as sb:
        print(await sb.shell.run("uname -a"))
        await sb.mouse.click(500, 400)
        await sb.keyboard.type("Hello from Cua!")
        screenshot = await sb.screenshot()
        # screenshot is a PIL Image: save, display, or send to a VLM

asyncio.run(main())

# Cua Bench: clone, install, create a base image, and run a dataset
git clone https://github.com/trycua/cua && cd cua/cua-bench
uv tool install -e .
cb image create linux-docker
cb run dataset datasets/cua-bench-basic --agent cua-agent

# Lume: install, then pull and start a macOS VM
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
lume run macos-sequoia-vanilla:latest

Latest: maintenance release with dependency updates
Fix MCP routing at both /mcp and /mcp/ paths, route single printable keypress through type_text
Rust driver release: Linux background drag and held-button tools, show cursor and type in background terminals
macOS VM management: bump with stability improvements
Caller-declared session identity + Streamable-HTTP transport for multi-agent parallelism
Install Cua Driver on macOS and connect it to Claude Code via MCP: 'claude mcp add --transport stdio cua-driver -- cua-driver mcp'. Then inside Claude Code, type: 'Open the Calculator app, compute 15% of 340, and paste the result into a new TextEdit document.' Cua Driver drives the desktop apps in the background (clicking, typing, and verifying via accessibility trees and screenshots) without stealing cursor focus or interrupting your workflow.