
    Cua

    Open-source infrastructure for computer-use agents — sandboxes, SDKs, benchmarks, and background desktop drivers.

    Cua AI, Inc. (YC X25) · Open source

    Cua is an open-source (MIT) infrastructure platform for building, benchmarking, and deploying computer-use agents. It provides background desktop control via Cua Driver, ephemeral multi-OS sandboxes via Cua Sandbox, standardized benchmarks via Cua Bench, and macOS virtualization via Lume — all accessible through a unified Python/TypeScript SDK, MCP server, and CLI.

    Pros

    • First comprehensive open-source infrastructure layer for computer-use agents — fills the gap between vision models and real desktop control
    • Background desktop control on macOS and Windows without stealing cursor or focus — agents click, type, and verify invisibly
    • MCP server enables drop-in integration with existing coding agents (Claude Code, Codex, Cursor, Hermes, OpenCode) in minutes
    • Unified multi-OS sandbox API — one SDK to boot Linux, Windows, macOS, and Android environments locally or in the cloud
    • Cua Bench provides standardized benchmarks (OSWorld, ScreenSpot, Windows Arena) and RL environments for reproducible evaluation
    • Lume delivers near-native macOS VM performance on Apple Silicon using Apple's Virtualization.Framework
    • 18,000+ GitHub stars, 3,500+ commits, 521 releases — exceptionally active development with rapid iteration
    • MIT licensed — fully open-source core with self-hosted and on-prem deployment options
    • Verified data pipeline delivers human-reviewed trajectory datasets for training computer-use models

    Cons

    • Cloud pricing is opaque — no public self-serve tiers, only 'request access' for hosted fleets and dedicated infrastructure
    • Linux Cua Driver is pre-release only — not yet stable for production Linux desktop automation
    • Early-stage company (founded March 2025) — rapidly evolving APIs and product surface may shift unexpectedly
    • Complex product surface (Driver, Sandbox, Run, Bench, Lume, Verified Data) can be confusing to navigate for new users
    • Stable Cua Driver builds require a macOS or Windows host; Linux hosts are limited to the pre-release driver
    • Some components (Cua Run, Verified Data) are cloud-only with no self-hosted alternative
    • Heavy dependency on Apple ecosystem for macOS sandboxing — Lume requires Apple Silicon hardware

    Pricing

    Open Source

    $0

    MIT-licensed core — Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your own infrastructure.

    Cloud

    Request

    Hosted sandbox fleets on cua.ai with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available.

    Verified Data

    Request

    Curated, human-reviewed trajectory datasets generated from verified rollouts across all four OS families for training computer-use models.

    Introduction

    Cua is the first comprehensive open-source infrastructure layer purpose-built for computer-use agents: AI systems that see a screen, reason about it, and interact with desktop applications the way a human would. Developed by Cua AI, Inc. (Y Combinator X25) and released under the MIT license, Cua has grown to over 18,000 GitHub stars, 3,500+ commits, and 521 releases since its founding in March 2025.

    The platform fills a critical gap in the AI agent stack. While vision-language models have become increasingly capable of understanding screenshots and generating actions, the infrastructure to actually run those actions on real desktops — in the background, across operating systems, at scale — was fragmented or proprietary. Cua provides the missing layer: background desktop drivers, ephemeral sandboxes, standardized benchmarks, and macOS virtualization, all accessible through a unified Python and TypeScript SDK, an MCP server, and a CLI.

    Cua is used by over 50,000 engineers at companies including Google, Meta, Apple, and NVIDIA for training, evaluation, and deployment of computer-use agents.

    Key Components

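    Cua is not a single tool but a platform of interoperable components. The four open-source products below (Cua Driver, Cua Sandbox, Cua Bench, and Lume) are the main entry points; Verified Data is offered alongside them by request.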

    Cua Driver — Background Desktop Control

    Cua Driver is the entry point for giving an existing coding agent the ability to control a desktop computer. It runs as a background daemon on macOS (stable) and Windows (stable), with Linux in pre-release.

    Agents click, type, scroll, inspect accessibility trees, and capture window state — all without stealing the cursor or focus from the user.

    The driver exposes a single MCP stdio server and a CLI that call the same driver handlers. This means any MCP-capable client — Claude Code, Codex, Cursor, Hermes, OpenClaw, OpenCode — can wire it up in minutes:

    # macOS / Linux
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
    
    # Then connect to Claude Code
    claude mcp add --transport stdio cua-driver -- cua-driver mcp

    Cua Driver is built on native platform APIs — Accessibility (AX) on macOS, UI Automation on Windows — and uses SkyLight internals on macOS for multi-cursor support. Each agent session gets its own synthetic cursor, and the driver orchestrates per-window element indexes that refresh on every snapshot, so pre-action and post-action window state are part of the contract.

    Key capabilities:

    • launch_app — Start an application by bundle ID or path
    • click / double_click / right_click — Click by element index or coordinates
    • type_text — Send keystrokes to a focused element
    • scroll — Scroll in any direction
    • get_window_state — Returns accessibility tree + screenshot of the current window
    • drag / held_button — Mouse drag and held-button interactions (Linux pre-release)
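    The tools listed above are invoked elsewhere in this article as cua-driver call <tool> '<json>'. The following is a minimal Python sketch of that calling shape, not an official client. It assumes cua-driver is on PATH; the helper names build_call_argv and call_tool are hypothetical, and nothing is assumed about the format of the driver's output, so stdout is returned as raw text.

```python
import json
import subprocess
from typing import Any


def build_call_argv(tool: str, params: dict[str, Any]) -> list[str]:
    """Build the argv for `cua-driver call <tool> '<json>'` (hypothetical helper)."""
    # Compact separators keep the payload on one line, matching the CLI examples.
    payload = json.dumps(params, separators=(",", ":"))
    return ["cua-driver", "call", tool, payload]


def call_tool(tool: str, params: dict[str, Any]) -> str:
    """Run one driver tool and return its raw stdout (output format not assumed)."""
    result = subprocess.run(
        build_call_argv(tool, params),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Example values only: pid, window_id, and element_index come from the driver's own output.
    print(build_call_argv("click", {"pid": 844, "window_id": 10725, "element_index": 14}))
```

    Keeping argv construction separate from process execution means the command shape can be checked without a driver installed, and passing a list to subprocess.run avoids shell-quoting problems with the JSON payload.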

    Cua Sandbox — Agent-Ready Environments

    Cua Sandbox provides ephemeral desktop environments for agents to run in. One Python or TypeScript API boots Linux, Windows, macOS, or Android environments — locally via QEMU or in the cloud via Cua Cloud:

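    import asyncio

    from cua import Sandbox, Image

    async def main():
        # Same API regardless of OS or runtime
        async with Sandbox.ephemeral(Image.linux()) as sb:
            result = await sb.shell.run("echo hello")
            screenshot = await sb.screenshot()
            await sb.mouse.click(100, 200)
            await sb.keyboard.type("Hello from Cua!")
            # Gesture between two points via the mobile API
            await sb.mobile.gesture((100, 500), (100, 200))

    # `async with` needs a running event loop, so wrap the calls in a coroutine
    asyncio.run(main())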

    Sandboxes support snapshot-native rollouts — fork known machine states over copy-on-write storage, run parallel episodes, and reproduce failures without rebuilding from scratch. Cua Run extends this to elastic infrastructure: claim pre-booted machines from formally verified warm pools in milliseconds, scaling to zero when idle.
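    The copy-on-write idea behind snapshot forking can be illustrated with a toy model. This sketch uses Python's collections.ChainMap as a stand-in for layered storage: each fork reads through to a shared snapshot and writes into its own private overlay. It illustrates the concept only and is not how Cua Sandbox or Cua Run implement snapshots; the fork helper and the file paths are made up for the example.

```python
from collections import ChainMap


def fork(snapshot: dict) -> ChainMap:
    """Return a writable view over `snapshot`; writes land in a private overlay."""
    return ChainMap({}, snapshot)


# A known machine state captured once (toy stand-in for a disk snapshot).
base = {"/etc/hostname": "sandbox", "/home/user/notes.txt": "todo"}

# Two parallel episodes fork the same snapshot without copying it.
episode_a = fork(base)
episode_b = fork(base)

episode_a["/home/user/notes.txt"] = "edited by A"  # write goes to A's overlay only
episode_b["/tmp/result.json"] = "{}"               # new file is visible only in B

print(episode_a["/home/user/notes.txt"])  # A sees its own write
print(episode_b["/home/user/notes.txt"])  # B still sees the snapshot value
print(base)                               # the shared snapshot is unchanged
```

    Because the base is never mutated, a failed episode can be reproduced by forking the same snapshot again rather than rebuilding the machine state.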

    [Table: runtime support. Columns: Linux, Windows, macOS, Android, BYOI. Rows: Cloud (cua.ai), Local (QEMU). Only one cell value survives: 🔜 in the Cloud (cua.ai) row.]

    Cua Bench — Benchmarks & RL Environments

    Cua Bench is a standardized evaluation framework for computer-use agents. It supports running agents against established benchmarks — OSWorld, ScreenSpot, Windows Arena — and custom task definitions:

    # Clone, install, and create base image
    git clone https://github.com/trycua/cua && cd cua/cua-bench
    uv tool install -e . && cb image create linux-docker
    
    # Run benchmark with agent
    cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

    Cua Bench also provides RL environments for training computer-use models, with trajectory export for downstream training pipelines. The cuabench.ai registry hosts community-contributed tasks and leaderboards.
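    To make "trajectory export" concrete, here is a hedged sketch of what an exported record and a summary metric could look like. The schema (Step, Trajectory, and their fields) is hypothetical and is not Cua Bench's actual export format; JSON Lines is used because it is a common interchange format for training pipelines.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    action: str          # e.g. "click" or "type_text"
    params: dict         # arguments passed to the action
    reward: float = 0.0  # per-step reward, if the environment provides one


@dataclass
class Trajectory:
    task_id: str
    success: bool
    steps: list[Step] = field(default_factory=list)


def to_jsonl(trajectories: list[Trajectory]) -> str:
    """Serialize one trajectory per line (JSON Lines)."""
    return "\n".join(json.dumps(asdict(t), sort_keys=True) for t in trajectories)


def success_rate(trajectories: list[Trajectory]) -> float:
    """Fraction of episodes that ended in success (0.0 for an empty list)."""
    if not trajectories:
        return 0.0
    return sum(t.success for t in trajectories) / len(trajectories)


# Made-up episodes for illustration only.
runs = [
    Trajectory("open-calculator", True, [Step("click", {"element_index": 14}, 1.0)]),
    Trajectory("rename-file", False, [Step("type_text", {"text": "report.txt"})]),
]
print(to_jsonl(runs))
print(success_rate(runs))
```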

    Lume — macOS Virtualization

    Lume is a CLI tool for creating and managing macOS and Linux VMs on Apple Silicon with near-native performance, built on Apple's Virtualization.Framework. It predates Apple's Containerization announcement and was Cua's original open-source project:

    # Install Lume
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
    
    # Pull & start a macOS VM
    lume run macos-sequoia-vanilla:latest

    Lume supports IPSW restores, resizable raw disk images, and Lumier, a Docker-compatible interface for orchestrating multiple VMs.

    Verified Data

    Cua offers a Verified Data pipeline: human-reviewed trajectory datasets generated from verified agent rollouts across all four OS families. Every trajectory is scored against an evaluator, accepted runs pass human QA, and golden trajectories carry step-level annotations for training computer-use models.
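    The two-stage acceptance described above (automatic evaluator first, human QA second) can be sketched as a simple filter. This is an illustration under assumptions, not Cua's pipeline: the Rollout fields, the select_golden helper, and the score threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    rollout_id: str
    evaluator_score: float  # automatic score, assumed here to lie in [0, 1]
    human_approved: bool    # outcome of the human QA pass


def select_golden(rollouts: list[Rollout], min_score: float = 1.0) -> list[Rollout]:
    """Keep rollouts that pass the evaluator and then human QA."""
    accepted = [r for r in rollouts if r.evaluator_score >= min_score]  # stage 1: evaluator
    return [r for r in accepted if r.human_approved]                    # stage 2: human QA


# Made-up rollouts for illustration only.
candidates = [
    Rollout("r1", 1.0, True),   # passes both stages
    Rollout("r2", 1.0, False),  # evaluator pass, rejected by human QA
    Rollout("r3", 0.4, True),   # fails the evaluator, never reaches QA
]
print([r.rollout_id for r in select_golden(candidates)])
```

    Running the evaluator first keeps the more expensive human review limited to runs that already meet the automatic bar.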

    Architecture

    Cua's architecture is modular by design — each component can be used independently or composed together:

    ┌─────────────────────────────────────────────────────────────┐
    │                      Coding Agents                          │
    │   Claude Code · Codex · Cursor · Hermes · OpenClaw · ...   │
    └────────────────────────┬────────────────────────────────────┘
                             │ MCP / HTTP
    ┌────────────────────────▼────────────────────────────────────┐
    │                     Cua Driver                              │
    │      Background desktop control (macOS / Windows / Linux)    │
    │         Accessibility · SkyLight · UI Automation            │
    └────────────────────────┬────────────────────────────────────┘
                             │
    ┌────────────────────────▼────────────────────────────────────┐
    │                 Cua Sandbox + Cua Run                        │
    │   Ephemeral VMs/containers — Linux · Windows · macOS · Android │
    │   Local (QEMU) · Cloud (cua.ai) · BYOC · On-prem             │
    │   Warm pools · Snapshots · Copy-on-write                     │
    └────────────────────────┬────────────────────────────────────┘
                             │
    ┌────────────────────────▼────────────────────────────────────┐
    │     Cua Bench          │          Lume                       │
    │   Benchmarks + RL      │   macOS VMs on Apple Silicon        │
    │   OSWorld · ScreenSpot │   Virtualization.Framework          │
    │   Windows Arena        │   Lumier (Docker-compatible)        │
    └────────────────────────┴────────────────────────────────────┘

    The stack is unified by the Python SDK (pip install cua) and the TypeScript SDK (@trycua/cua), with MCP as the universal protocol layer connecting coding agents to the driver.

    Integration with Coding Agents

    Cua Driver's MCP server turns any MCP-capable coding agent into a computer-use agent with a single command.

    Claude Code

    # Standard MCP — agent sees accessibility tree + tools
    claude mcp add --transport stdio cua-driver -- cua-driver mcp
    
    # Screenshot-grounded mode — Claude Code computer-use compatible
    claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat

    Codex

    codex mcp add cua-driver -- cua-driver mcp

    Cursor / OpenCode / Custom MCP Clients

    # Generate client-specific config
    cua-driver mcp-config --client cursor
    cua-driver mcp-config --client opencode
    
    # Generic MCP server shape
    cua-driver mcp-config
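    For clients without a dedicated --client option, many MCP clients read stdio servers from a JSON file with a top-level mcpServers object. The sketch below builds such an entry for the cua-driver mcp command used throughout this article. Treat it as an assumption about the general shape: the actual output of cua-driver mcp-config may differ, and the stdio_server_entry helper is hypothetical.

```python
import json


def stdio_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build an `mcpServers` entry for a stdio MCP server (common client convention)."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Same command the article registers with Claude Code: `cua-driver mcp`.
config = stdio_server_entry("cua-driver", "cua-driver", ["mcp"])
print(json.dumps(config, indent=2))
```

    Prefer the output of cua-driver mcp-config where available, since it reflects what the installed driver version expects.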

    Hermes

    cua-driver mcp-config --client hermes
    # paste into ~/.hermes/config.yaml

    Any coding agent workflow extends to desktop automation — filling forms in native apps, exporting data from enterprise software, testing GUI applications, and automating multi-step workflows — all from the same chat interface the developer already uses.

    Use Cases

    Training Computer-Use Models

    Cua provides the full pipeline: Cua Sandbox for environment orchestration at scale, Cua Bench for standardized evaluation, and Verified Data for human-reviewed training trajectories. Research teams at Google, Meta, and NVIDIA use Cua to generate training data for their computer-use vision-language models.

    Automated GUI Testing

    QA teams can deploy Cua Driver in CI/CD pipelines to run GUI tests against native desktop applications. The driver's background execution means tests run without interfering with other work on the same machine, and snapshots provide deterministic replay of failures.

    Enterprise Desktop Automation

    Cua Driver automates repetitive desktop workflows — data entry across legacy applications, software installation and configuration, file management across network drives — with the reliability of accessibility-tree-grounded interaction rather than fragile pixel coordinates.
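    The difference between element-grounded and pixel-grounded interaction is easiest to see in code. In this sketch an agent looks up a button by role and title in a window snapshot and acts on its element index, so the action still targets the right control if the window moves or is resized. The snapshot structure (a flat list of dicts with index, role, and title keys) is a hypothetical simplification, not the driver's actual get_window_state output.

```python
from typing import Optional


def find_element_index(elements: list[dict], role: str, title: str) -> Optional[int]:
    """Return the index of the first element matching role and title, else None."""
    for element in elements:
        if element.get("role") == role and element.get("title") == title:
            return element["index"]
    return None


# Hypothetical flattened snapshot of a calculator window.
snapshot = [
    {"index": 12, "role": "AXButton", "title": "7"},
    {"index": 13, "role": "AXButton", "title": "8"},
    {"index": 14, "role": "AXButton", "title": "9"},
]

print(find_element_index(snapshot, "AXButton", "9"))  # prints 14
print(find_element_index(snapshot, "AXButton", "="))  # prints None
```

    Returning None for a missing element lets the caller take a fresh snapshot and retry instead of clicking a stale location.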

    Multi-Agent Parallel Workflows

    Cua Run's warm pools and session-identity system enable multiple agents to work in parallel on the same or different machines. Each agent gets its own cursor and window session, enabling coordinated multi-agent desktop automation.
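    Running several agents side by side usually needs a cap on how many hold a machine at once. The sketch below shows that pattern with asyncio: each agent carries its own session identifier, and a semaphore bounds concurrency. It does not call Cua Run or the driver; run_agent, the session IDs, and the sleep placeholder are hypothetical stand-ins for real episodes.

```python
import asyncio


async def run_agent(session_id: str, limit: asyncio.Semaphore) -> str:
    """Stand-in for one agent episode; a real one would drive a sandbox or desktop."""
    async with limit:              # at most `max_parallel` episodes run at once
        await asyncio.sleep(0.01)  # placeholder for real work
        return f"{session_id}: done"


async def run_all(session_ids: list[str], max_parallel: int) -> list[str]:
    limit = asyncio.Semaphore(max_parallel)
    # gather preserves input order, so results line up with session_ids
    return await asyncio.gather(*(run_agent(s, limit) for s in session_ids))


results = asyncio.run(run_all(["agent-1", "agent-2", "agent-3"], max_parallel=2))
print(results)
```

    The max_parallel argument plays the same role as the --max-parallel flag shown in the Cua Bench example: it caps concurrent episodes, not the total number of episodes.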

    macOS CI/CD for Apple Ecosystem

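    Lume provides near-native macOS VMs for CI/CD pipelines, so teams can build and test macOS, iOS, and visionOS applications in disposable VMs rather than on a hand-maintained build machine. The host must still be Apple Silicon hardware, because Lume runs on Apple's Virtualization.Framework. Combined with Cua's sandbox API, teams can spin up macOS environments on demand for testing.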

    Pricing

    Tier          | Price      | Description
    Open Source   | $0         | MIT-licensed core: Cua Driver, Cua Sandbox SDK, Cua Bench, and Lume. Self-hosted on your infrastructure.
    Cloud         | By request | Hosted sandbox fleets with warm pools, multi-OS support (Linux, Windows, macOS, Android), and snapshot-based parallelism. BYOC and on-prem available.
    Verified Data | By request | Curated, human-reviewed trajectory datasets from verified rollouts across all four OS families.

    The entire core platform is MIT-licensed and available on GitHub. Cloud infrastructure and verified data are available by contacting the Cua team, with options for SOC 2-ready deployment, BYOC, and on-premises hosting.

    Getting Started

    Try Cua Driver (quickest start)

    # Install (macOS / Linux shell)
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
    
    # Windows (PowerShell as admin)
    irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex
    
    # Start the daemon (macOS) and check its status
    open -n -g -a CuaDriver --args serve
    cua-driver status
    
    # Wire into Claude Code
    claude mcp add --transport stdio cua-driver -- cua-driver mcp
    
    # Launch an app and interact (pid, window_id, and element_index are example values; use the ones the driver reports)
    cua-driver call launch_app '{"bundle_id":"com.apple.calculator"}'
    cua-driver call get_window_state '{"pid":844,"window_id":10725}'
    cua-driver call click '{"pid":844,"window_id":10725,"element_index":14}'

    Try Cua Sandbox (Python)

    # Install first (shell): pip install cua
    import asyncio

    from cua import Sandbox, Image

    async def main():
        async with Sandbox.ephemeral(Image.linux()) as sb:
            print(await sb.shell.run("uname -a"))
            await sb.mouse.click(500, 400)
            await sb.keyboard.type("Hello from Cua!")
            screenshot = await sb.screenshot()
            # screenshot is a PIL Image: save, display, or send to a VLM

    asyncio.run(main())

    Try Cua Bench

    git clone https://github.com/trycua/cua && cd cua/cua-bench
    uv tool install -e .
    cb image create linux-docker
    cb run dataset datasets/cua-bench-basic --agent cua-agent

    Try Lume

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
    lume run macos-sequoia-vanilla:latest

    Version History

    • cua-computer-server v0.3.41 (latest): maintenance release with dependency updates
    • cua-computer-server v0.3.40: fix MCP routing at both /mcp and /mcp/ paths; route single printable keypress through type_text
    • cua-driver-rs v0.5.3: Rust driver release with Linux background drag and held-button tools; show cursor and type in background terminals
    • lume v0.3.10: macOS VM management release with stability improvements
    • cua-driver-rs v0.5.0: caller-declared session identity and Streamable-HTTP transport for multi-agent parallelism

    Signature Snippet
    Install Cua Driver on macOS and connect it to Claude Code via MCP: 'claude mcp add --transport stdio cua-driver -- cua-driver mcp'. Then inside Claude Code, type: 'Open the Calculator app, compute 15% of 340, and paste the result into a new TextEdit document.' Cua Driver drives the desktop apps in the background — clicking, typing, and verifying via accessibility trees and screenshots — without stealing cursor focus or interrupting your workflow.
