Cosine

The Production-First AI Coding Agent

Cosine AI Closed source Since 2025

AI coding agent powered by the proprietary Lumen model family. Top benchmark performer on Niche-Bench, Vibe-Bench, and Slop-Bench, and a cornerstone of the UK sovereign AI strategy.

+ Pros

Top benchmark scores — leads Niche-Bench (53.9%) and Slop-Bench (25.4%), competitive on Vibe-Bench (29.4%)
Built-in enterprise deployment options spanning public cloud, managed single-tenant VPC, and fully air-gapped environments
Lumen models trained exclusively on production code via RL, reducing code slop and prioritizing maintainability over quick fixes
Most cost-efficient among frontier coding agents at $7.90 per successful task — 3.6x cheaper than GPT-5.5
UK sovereign AI backing through a $675M government fund ensures long-term strategic investment and national infrastructure access

− Cons

Closed-source with no public repository, community edition, or self-hosted option for individual developers
Credit-based pricing (Cosine Credits) makes monthly cost forecasting non-trivial for variable usage patterns
Sovereign tier (maximum reasoning model) is still in development with no confirmed release date
Relatively new entrant compared to established competitors like Claude Code, GPT-5.5, and Gemini 3.1 Pro
Smaller context window than some competitors, leading to higher average token consumption per task

Pricing

Hobby

$20/seat/mo

5M Cosine Credits per seat/month for solo developers and side projects. Credit top-ups available at $20 per additional 5M credits.

Professional

$200/seat/mo

60M Cosine Credits per seat/month for growing teams shipping at scale. Credit top-ups available at $200 per additional 60M credits.

Enterprise

Custom

Cloud, dedicated single-tenant VPC, or fully air-gapped deployment with custom model weights, zero data egress, and dedicated support.

Introduction

Cosine is an AI coding agent built on a radical premise: that a model trained exclusively on real production code will outperform generalist models at the one task developers actually need — writing exceptional software. Powered by Cosine’s proprietary Lumen model family, Cosine delivers autonomous engineering across CLI and cloud environments, with benchmark results that position it alongside (and in several categories ahead of) GPT-5.5, Gemini 3.1 Pro, and Kimi K2.6.

Cosine originally launched in June 2025 and introduced its proprietary Lumen model family in early 2026, starting with Scout and followed by the flagship Outpost model in May 2026.

Beyond raw benchmarks, Cosine has emerged as a strategic asset in the UK’s sovereign AI ambitions. In April 2026, the UK government named Cosine a cornerstone of its $675 million Sovereign AI Fund, providing national GPU infrastructure support for the upcoming Lumen Sovereign model. This makes Cosine one of the best-capitalised AI coding agents in the market, with institutional backing that extends far beyond typical VC funding.

The Lumen Model Family

Cosine’s Lumen family consists of three models, each optimised for a different layer of the development workflow:

Lumen Scout

Post-trained from Devstral 123B, Scout is the infrastructure layer. It runs cheaply, operates on-device, and handles the heavy lifting around the coding loop: mapping millions of lines of code, indexing repositories, and powering retrieval systems. It requires a minimum of 4 H100 GPUs and is designed for large-scale support workflows where fast, low-cost intelligence matters more than deep reasoning.

Lumen Outpost

Post-trained from Kimi K2.6, Outpost is the core of the Lumen family and currently Cosine’s flagship model. It combines strong coding performance (53.9% on Niche-Bench, 25.4% on Slop-Bench, 29.4% on Vibe-Bench) with economics that make everyday AI-assisted development viable for entire engineering organizations. Outpost requires 8 H100 GPUs and delivers the best cost-per-successful-task ratio in its class at $7.90.

Lumen Outpost achieves 53.9% on Niche-Bench, 25.4% on Slop-Bench, and 29.4% on Vibe-Bench at $7.90 per successful task — the most cost-efficient frontier coding agent.

Lumen Sovereign

Coming soon, Sovereign sets the quality bar for maximum reasoning. Designed for complex architectural decisions, deep reasoning tasks, and large-scale system transformations, Sovereign is being built with GPU support from the UK government’s Sovereign AI Fund. It represents Cosine’s bet that the post-training techniques proven on Outpost can scale to frontier-level reasoning without sacrificing the production-code discipline that defines the Lumen family.

Benchmark Performance

Cosine publishes results across three internal benchmarks designed to measure what matters in production engineering:

Niche-Bench (53.9% — First Place)

Niche-Bench evaluates performance across 13 programming languages including Fortran, ABAP, Java, Rust, C, MATLAB, Verilog, and COBOL — a deliberate departure from Python-centric benchmarks that dominate the industry. Lumen Outpost leads all tested models with a 53.9% Pass@3 score, outperforming GPT-5.5 (47.4%), Kimi K2.6 (48.3%), and Gemini 3.1 Pro (44.9%). The model shows particular strength in functional and high-context environments, with gains of +18.1pp in Rust and +12.9pp in Java over its base model.
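The article does not say how Cosine computes Pass@3. One common convention, assumed here purely for illustration, is the unbiased pass@k estimator used by HumanEval-style benchmarks: with n sampled solutions of which c pass, it estimates the probability that at least one of k randomly chosen samples passes. The function name and the sample counts below are hypothetical and are not taken from Niche-Bench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated for a task, c: samples that passed, k: draw size.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 10 samples generated, 2 passed, scored at k = 3.
example = pass_at_k(n=10, c=2, k=3)
```

A benchmark-level score under this assumption would be the mean of the per-task estimates across all tasks and languages.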

Vibe-Bench (29.4% — Second Place)

Vibe-Bench measures behavioral qualities: conciseness, honesty, scope discipline, and appropriate planning. Lumen Outpost scores 29.4%, trailing only GPT-5.5 (31.9%) but ahead of GPT-5.4 (27.9%), Kimi K2.6 (22.7%), and Gemini 3.1 Pro (20.3%). It achieves 96.3% action alignment and 96.0% scope discipline — parity with GPT-5.5 on staying on task — while being significantly more concise in its updates (66.9%).

Slop-Bench (25.4% — First Place)

Cosine’s most distinctive benchmark, Slop-Bench measures the reduction of low-quality code changes (slop) introduced by AI models. Lumen Outpost leads with 25.4%, just ahead of GPT-5.5 (25.0%) and well clear of its base model Kimi K2.6 (19.8%). This validates Cosine’s thesis that RL training on production code — rather than synthetic textbook examples — produces models that write cleaner, more maintainable code that respects existing codebase architecture.
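To make the comparisons above easier to check, here is a small sketch that tabulates the scores quoted in this article and derives Outpost's rank and its gap over the Kimi K2.6 base model on each benchmark. Only the numbers come from the article; the helper names are illustrative, and Slop-Bench lists only the three models the article reports.

```python
# Scores in percent, exactly as quoted in this article.
SCORES = {
    "Niche-Bench": {"Lumen Outpost": 53.9, "GPT-5.5": 47.4,
                    "Kimi K2.6": 48.3, "Gemini 3.1 Pro": 44.9},
    "Vibe-Bench": {"Lumen Outpost": 29.4, "GPT-5.5": 31.9, "GPT-5.4": 27.9,
                   "Kimi K2.6": 22.7, "Gemini 3.1 Pro": 20.3},
    "Slop-Bench": {"Lumen Outpost": 25.4, "GPT-5.5": 25.0, "Kimi K2.6": 19.8},
}


def gap_pp(bench: str, model: str, baseline: str) -> float:
    """Difference between two models in percentage points, to one decimal."""
    return round(SCORES[bench][model] - SCORES[bench][baseline], 1)


def rank(bench: str, model: str) -> int:
    """1-based rank of a model among those listed for the benchmark."""
    ordered = sorted(SCORES[bench], key=SCORES[bench].get, reverse=True)
    return ordered.index(model) + 1


summary = {
    bench: (rank(bench, "Lumen Outpost"),
            gap_pp(bench, "Lumen Outpost", "Kimi K2.6"))
    for bench in SCORES
}
```

Under these figures, Outpost improves on its base model by roughly 5.6 to 6.7 percentage points on every benchmark, while its lead over GPT-5.5 on Slop-Bench is only 0.4 points.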

Cost Efficiency: $7.90 per Successful Task

Cosine’s most compelling commercial metric is its cost of $7.90 per successful task, the lowest among the frontier coding agents compared here. GPT-5.5 costs $28.41 per successful task (3.6x more) and Gemini 3.1 Pro costs $17.79, although the base model Kimi K2.6 is close behind at $8.20. This cost efficiency comes from lower output-token pricing combined with a higher effective success rate, partially offsetting a smaller context window that requires more turns per complex task.
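The multiples behind these figures are simple to reproduce. The sketch below uses only the per-task costs quoted in this article. The article does not define "cost per successful task", so the `cost_per_successful_task` helper reflects an assumed definition (total spend divided by the number of tasks that succeeded).

```python
# Cost per successful task in USD, as quoted in this article.
COST_PER_SUCCESS = {
    "Lumen Outpost": 7.90,
    "Kimi K2.6": 8.20,
    "Gemini 3.1 Pro": 17.79,
    "GPT-5.5": 28.41,
}


def cost_multiples(baseline: str = "Lumen Outpost") -> dict[str, float]:
    """Return each model's cost as a multiple of the baseline's cost."""
    base = COST_PER_SUCCESS[baseline]
    return {model: round(cost / base, 2)
            for model, cost in COST_PER_SUCCESS.items()}


def cost_per_successful_task(total_spend: float, successes: int) -> float:
    """Assumed definition: total spend divided by tasks that succeeded."""
    if successes <= 0:
        raise ValueError("need at least one successful task")
    return total_spend / successes


multiples = cost_multiples()
```

With these inputs GPT-5.5 comes out at about 3.6 times Outpost's cost and Gemini 3.1 Pro at about 2.25 times, while Kimi K2.6 is only about 4% more expensive.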

Key Features

CLI-First Architecture

Cosine installs via a single Homebrew command (brew install CosineAI/tap/cos) and runs natively on macOS, Linux, and Windows. The CLI provides a terminal-native workflow with local-to-remote execution, MCP tool access, project context awareness, and multi-agent orchestration. Engineers who live in the terminal can stay there, using Cosine as an autonomous collaborator that reads, plans, writes, and tests code without leaving the command line.

Desktop App

Cosine also offers a native Desktop application that provides a dedicated workspace outside the browser. It connects to the same cloud environment, allowing persistent sessions, system-level integrations, and a focused interface for long-running agent tasks without tying up a terminal.

Cloud Workspace

The Cosine Cloud platform provides a collaborative workspace where engineers, product managers, and stakeholders share a single environment to plan, review, and ship work. Tasks run in parallel, long-running operations continue asynchronously, and the cloud interface gives non-terminal users visibility into agent activity. Work hands off seamlessly between CLI and Cloud: a session started on the command line can be reviewed or continued in the browser.

Enterprise Deployment

Cosine offers three enterprise deployment tiers:

Public Cloud — Fully managed multi-tenant SaaS with instant access to the latest models and features. Ideal for rapid evaluation and teams whose requirements fit standard cloud controls.
Managed Single-Tenant VPC — A private Cosine environment operated for the organisation with dedicated isolation, network connectivity, identity integration, and policy controls. The platform is still managed by Cosine, but data boundaries are private.
Fully Air-Gapped — Cosine runs entirely inside the organisation’s own security perimeter with zero external dependencies and no data egress. Option to post-train or fine-tune models for internal codebases, frameworks, and legacy languages. Designed for strict isolation, regulatory, and classified-network requirements.

Training Philosophy

Cosine is not trained on customer data. Its capabilities are built exclusively from publicly available open-source repositories. The company’s proprietary data pipeline transforms real production code into verifiable training trajectories that teach the model to write clean, disciplined code — not just code that passes tests, but code that remains maintainable six months later.

Pricing and Plans

Plan	Price	Credits	Best For
Hobby	$20/seat/month	5M Cosine Credits/seat/month	Solo developers and side projects
Professional	$200/seat/month	60M Cosine Credits/seat/month	Growing teams shipping at scale
Enterprise	Custom	Custom	Regulated industries requiring VPC or air-gapped deployment

Additional seats add proportional credits to the team pool. Credit top-ups are available at the same per-block rate. Unused monthly credits do not roll over. When the credit balance is exhausted, agent inference pauses until top-up or the next billing cycle.
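Because credits are pooled per team and top-ups are sold in blocks, a rough monthly forecast can be sketched as follows. The plan figures come from the pricing above. The assumptions that top-ups can only be bought as whole blocks and that usage is known in advance are mine, and the class and function names are illustrative rather than part of any Cosine API.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Plan:
    seat_price: int     # USD per seat per month
    seat_credits: int   # Cosine Credits included per seat per month
    topup_price: int    # USD per top-up block
    topup_credits: int  # Cosine Credits per top-up block


HOBBY = Plan(20, 5_000_000, 20, 5_000_000)
PROFESSIONAL = Plan(200, 60_000_000, 200, 60_000_000)


def monthly_cost(plan: Plan, seats: int, credits_used: int) -> int:
    """Estimate one month's bill in USD for a team's pooled credit usage."""
    pool = seats * plan.seat_credits           # seats add to a shared pool
    shortfall = max(0, credits_used - pool)    # unused credits do not roll over
    blocks = ceil(shortfall / plan.topup_credits)  # assumption: whole blocks only
    return seats * plan.seat_price + blocks * plan.topup_price


# Hypothetical month: a 3-seat Professional team that burns 200M credits.
team_bill = monthly_cost(PROFESSIONAL, seats=3, credits_used=200_000_000)
```

In this hypothetical, the 180M pooled credits fall 20M short, so one 60M top-up block is needed on top of the three seats. Running the same function across a low and a high usage estimate gives a simple range for budgeting.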

Who Is It For?

Cosine targets production engineering teams who maintain complex, long-lived codebases — exactly the environments where general-purpose AI models tend to produce slop. Its emphasis on niche languages (Fortran, COBOL, Verilog, ABAP, MATLAB, R) makes it particularly valuable for:

Enterprise IT organisations maintaining legacy systems across multiple generations of technology
Fintech, defense, and regulated industries that need air-gapped AI without sacrificing capability
Engineering teams shipping in Rust, Java, and C++ who want an agent that understands architectural discipline
UK government and sovereign AI initiatives building national AI infrastructure
Platform engineering teams that need multi-agent orchestration and collaborative review workflows

Pros and Cons

Pros

Benchmark leadership — Lumen Outpost leads Niche-Bench and Slop-Bench, demonstrating real-world coding quality, not just Python performance
Enterprise-ready from day one — Cloud, VPC, and air-gapped deployment with zero data egress, designed for regulated environments
Production-code training — RL on real open-source repositories, not synthetic data, producing cleaner, more maintainable code
Cost efficient — $7.90 per successful task is the lowest among frontier models, making team-scale deployment economically viable
Strategic backing — UK sovereign AI fund provides long-term capital and national GPU infrastructure access

Cons

Closed source — No public repository or community edition, and no self-hosted option outside the Enterprise tier; individual developers and small teams depend fully on Cosine’s infrastructure
Credit system complexity — Cosine Credits and per-seat pools add overhead to cost forecasting compared to simpler per-seat pricing
Sovereign tier not yet available — The maximum-reasoning model that justifies the UK government investment is still in development
New entrant risk — Less proven in production than Claude Code, GPT-5.5 agents, or Gemini-based tools
Smaller context window — Higher average token consumption per task partially offsets the pricing advantage

Conclusion

Cosine is one of the most distinctive AI coding agents to emerge in 2025-2026. Its thesis, that models trained exclusively on production code outperform generalists at engineering work, is supported by strong results on Cosine’s own benchmarks and a rapidly growing enterprise customer base. The Lumen model family’s performance on Niche-Bench and Slop-Bench suggests that Cosine isn’t just competing on standard metrics; it’s redefining what quality means for AI-generated code.

The UK sovereign AI backing provides a moat that few competitors can match: guaranteed GPU access, national infrastructure support, and a strategic mandate that extends beyond quarterly VC cycles. For engineering teams managing complex codebases, especially those involving niche languages or enterprise compliance requirements, Cosine offers a compelling combination of benchmark leadership, enterprise maturity, and cost efficiency.

The caveat is maturity — Cosine is new, closed-source, and its credit-based pricing requires careful monitoring. But for teams that value code quality over speed and need an agent that treats maintainability as a first-class output, Cosine deserves serious evaluation.

Version History

Lumen Outpost May 13, 2026

Post-trained from Kimi K2.6 — core model with 53.9% on Niche-Bench, 25.4% on Slop-Bench, $7.90 per successful task

Lumen Scout Apr 1, 2026

Post-trained from Devstral 123B — on-device utility model for codebase mapping, indexing, and retrieval

Lumen Sovereign Dec 31, 2026

Coming soon — maximum reasoning model for complex software transformations

Signature Snippet

A developer working on a large Java monolith asks Cosine to migrate a legacy module to a modern architecture. Cosine maps the codebase, plans the refactor, writes the migration across 30+ files, and produces clean diffs that match the team's existing coding style — all without hallucinating libraries or leaving dead code behind.
