AI coding agent powered by the proprietary Lumen model family. Top performer on Niche-Bench and Slop-Bench, runner-up on Vibe-Bench, and a cornerstone of the UK sovereign AI strategy.
Hobby: 5M Cosine Credits per seat/month for solo developers and side projects. Credit top-ups available at $20 per additional 5M credits.
Professional: 60M Cosine Credits per seat/month for growing teams shipping at scale. Credit top-ups available at $200 per additional 60M credits.
Enterprise: Cloud, dedicated single-tenant VPC, or fully air-gapped deployment with custom model weights, zero data egress, and dedicated support.
Cosine is an AI coding agent built on a radical premise: that a model trained exclusively on real production code will outperform generalist models at the one task developers actually need — writing exceptional software. Powered by Cosine’s proprietary Lumen model family, Cosine delivers autonomous engineering across CLI and cloud environments, with benchmark results that position it alongside (and in several categories ahead of) GPT-5.5, Gemini 3.1 Pro, and Kimi K2.6.
Cosine originally launched in June 2025 and introduced its proprietary Lumen model family in early 2026, starting with Scout and followed by the flagship Outpost model in May 2026.
Beyond raw benchmarks, Cosine has emerged as a strategic asset in the UK’s sovereign AI ambitions. In April 2026, the UK government named Cosine a cornerstone of its $675 million Sovereign AI Fund, providing national GPU infrastructure support for the upcoming Lumen Sovereign model. This makes Cosine one of the best-capitalised AI coding agents in the market, with institutional backing that extends far beyond typical VC funding.
Cosine’s Lumen family consists of three models, each optimised for a different layer of the development workflow:
Post-trained from Devstral 123B, Scout is the infrastructure layer. It runs cheaply, operates on-device, and handles the heavy lifting around the coding loop: mapping millions of lines of code, indexing repositories, and powering retrieval systems. It requires a minimum of 4 H100 GPUs and is designed for large-scale support workflows where fast, low-cost intelligence matters more than deep reasoning.
Post-trained from Kimi K2.6, Outpost is the core of the Lumen family and currently Cosine’s flagship model. It combines strong coding performance (53.9% on Niche-Bench, 25.4% on Slop-Bench, 29.4% on Vibe-Bench) with economics that make everyday AI-assisted development viable for entire engineering organizations. Outpost requires 8 H100 GPUs and delivers the lowest cost per successful task in its class, at $7.90.
Sovereign, which is coming soon, is intended to set the quality bar for maximum reasoning. Designed for complex architectural decisions, deep reasoning tasks, and large-scale system transformations, Sovereign is being built with GPU support from the UK government’s Sovereign AI Fund. It represents Cosine’s bet that the post-training techniques proven on Outpost can scale to frontier-level reasoning without sacrificing the production-code discipline that defines the Lumen family.
Cosine publishes results across three internal benchmarks designed to measure what matters in production engineering:
Niche-Bench evaluates performance across 13 programming languages including Fortran, ABAP, Java, Rust, C, MATLAB, Verilog, and COBOL — a deliberate departure from Python-centric benchmarks that dominate the industry. Lumen Outpost leads all tested models with a 53.9% Pass@3 score, outperforming GPT-5.5 (47.4%), Kimi K2.6 (48.3%), and Gemini 3.1 Pro (44.9%). The model shows particular strength in functional and high-context environments, with gains of +18.1pp in Rust and +12.9pp in Java over its base model.
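Pass@3 counts a task as solved if at least one of three attempts passes. The source does not say exactly how Cosine computes it; a common approach, assumed here, is the unbiased pass@k estimator introduced with the HumanEval benchmark: draw n ≥ k samples per task, count the c that pass, and estimate the probability that a random subset of k samples contains at least one pass. The function names and the sample results below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task.

    n: samples generated, c: samples that passed, k: attempt budget.
    Returns 1 - C(n-c, k) / C(n, k), the chance that k samples drawn
    without replacement from the n include at least one passing sample.
    """
    if n < k:
        raise ValueError("need at least k samples per task")
    if n - c < k:
        # Fewer than k failing samples exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 3) -> float:
    """Average per-task pass@k over (n, c) pairs, expressed as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical per-task results: (samples drawn, samples passed).
tasks = [(3, 0), (3, 1), (5, 2), (10, 10)]
print(f"Pass@3: {benchmark_pass_at_k(tasks):.1f}%")  # Pass@3: 72.5%
```

With exactly three samples per task (n = k = 3), the estimator reduces to "did any of the three attempts pass", which matches the plain reading of Pass@3; drawing more samples per task simply lowers the variance of the estimate.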
Vibe-Bench measures behavioral qualities: conciseness, honesty, scope discipline, and appropriate planning. Lumen Outpost scores 29.4%, trailing only GPT-5.5 (31.9%) but ahead of GPT-5.4 (27.9%), Kimi K2.6 (22.7%), and Gemini 3.1 Pro (20.3%). It achieves 96.3% action alignment and 96.0% scope discipline — parity with GPT-5.5 on staying on task — while being significantly more concise in its updates (66.9%).
Cosine’s most distinctive benchmark, Slop-Bench measures the reduction of low-quality code changes (slop) introduced by AI models. Lumen Outpost leads with 25.4%, just ahead of GPT-5.5 (25.0%) and well clear of its base model Kimi K2.6 (19.8%). Cosine presents this as evidence for its thesis that RL training on production code, rather than on synthetic textbook examples, produces models that write cleaner, more maintainable code that respects existing codebase architecture.
Cosine’s most compelling commercial metric is its cost of $7.90 per successful task, the lowest among the frontier coding agents tested. GPT-5.5 costs $28.41 per successful task (3.6x more) and Gemini 3.1 Pro costs $17.79, while the base model Kimi K2.6 comes close at $8.20. This cost efficiency comes from lower output-token pricing combined with a higher effective success rate; the savings are partially offset by a smaller context window that requires more turns per complex task.
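The source does not define "cost per successful task" precisely. A natural reading, assumed here, is total spend divided by the number of tasks solved, which equals the average cost per attempt divided by the success rate; that is why a higher success rate can compensate for extra turns per task. The sketch below checks the relative-cost ratios against the published figures; the `cost_per_success` helper and its inputs are hypothetical.

```python
def cost_per_success(total_cost_usd: float, tasks_solved: int) -> float:
    """Assumed definition: total spend divided by the number of tasks solved."""
    if tasks_solved == 0:
        raise ValueError("no successful tasks; cost per success is undefined")
    return total_cost_usd / tasks_solved


# Published cost per successful task (USD), from the figures above.
published = {
    "Lumen Outpost": 7.90,
    "Kimi K2.6": 8.20,
    "Gemini 3.1 Pro": 17.79,
    "GPT-5.5": 28.41,
}

baseline = published["Lumen Outpost"]
for model, cost in sorted(published.items(), key=lambda kv: kv[1]):
    # How many times more each model costs per solved task than Outpost.
    print(f"{model:15s} ${cost:6.2f}  {cost / baseline:.2f}x")

# Hypothetical run: 100 attempts at $4.00 each with 50 solved gives $8.00 per success.
print(cost_per_success(100 * 4.00, 50))
```

The ratios come out to roughly 1.04x for Kimi K2.6, 2.25x for Gemini 3.1 Pro, and 3.60x for GPT-5.5, so the gap to the base model is narrow even though the gap to GPT-5.5 is large.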
Cosine installs via a single Homebrew command (`brew install CosineAI/tap/cos`) and runs natively on macOS, Linux, and Windows. The CLI provides a terminal-native workflow with local-to-remote execution, MCP tool access, project context awareness, and multi-agent orchestration. Engineers who live in the terminal can stay there, using Cosine as an autonomous collaborator that reads, plans, writes, and tests code without leaving the command line.
Cosine also offers a native Desktop application that provides a dedicated workspace outside the browser. It connects to the same cloud environment, allowing persistent sessions, system-level integrations, and a focused interface for long-running agent tasks without tying up a terminal.
The Cosine Cloud platform provides a collaborative workspace where engineers, product managers, and stakeholders share a single environment to plan, review, and ship work. Tasks run in parallel, long-running operations continue asynchronously, and the cloud interface gives non-terminal users visibility into agent activity. Work hands off seamlessly between CLI and Cloud: a session started on the command line can be reviewed or continued in the browser.
Cosine offers three enterprise deployment tiers: cloud, dedicated single-tenant VPC, and fully air-gapped deployment.
Cosine is not trained on customer data. Its own training data is drawn exclusively from publicly available open-source repositories. The company’s proprietary data pipeline transforms real production code into verifiable training trajectories that teach the model to write clean, disciplined code: not just code that passes tests, but code that remains maintainable six months later.
| Plan | Price | Credits | Best For |
|---|---|---|---|
| Hobby | $20/seat/month | 5M Cosine Credits/seat/month | Solo developers and side projects |
| Professional | $200/seat/month | 60M Cosine Credits/seat/month | Growing teams shipping at scale |
| Enterprise | Custom | Custom | Regulated industries requiring VPC or air-gapped deployment |
Additional seats add proportional credits to the team pool. Credit top-ups are available at each plan’s per-block rate ($20 per 5M credits on Hobby, $200 per 60M credits on Professional). Unused monthly credits do not roll over. When the credit balance is exhausted, agent inference pauses until the team buys a top-up or the next billing cycle begins.
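The billing rules above can be modeled as a small state machine. This is a hypothetical sketch, not Cosine’s implementation: the `CreditPool` class and its method names are invented for illustration, and it assumes credits are deducted per unit of agent usage. Plan figures come from the pricing above (Hobby: 5M credits per seat, $20 per 5M top-up; Professional: 60M credits per seat, $200 per 60M top-up).

```python
from dataclasses import dataclass

# Per-seat monthly allowance and top-up block, taken from the pricing above.
PLANS = {
    "hobby": {"credits_per_seat": 5_000_000, "topup_credits": 5_000_000, "topup_usd": 20},
    "professional": {"credits_per_seat": 60_000_000, "topup_credits": 60_000_000, "topup_usd": 200},
}


@dataclass
class CreditPool:
    """Hypothetical team credit pool; class and method names are illustrative only."""
    plan: str
    seats: int
    balance: int = 0
    topup_spend_usd: int = 0

    def start_billing_cycle(self) -> None:
        # Unused credits do not roll over: the balance is reset, not added to.
        self.balance = self.seats * PLANS[self.plan]["credits_per_seat"]

    def top_up(self, blocks: int = 1) -> None:
        # Top-ups are bought in fixed blocks at the plan's per-block rate.
        p = PLANS[self.plan]
        self.balance += blocks * p["topup_credits"]
        self.topup_spend_usd += blocks * p["topup_usd"]

    def consume(self, credits: int) -> bool:
        """Deduct credits; return False (inference pauses) if the pool cannot cover them."""
        if credits > self.balance:
            return False
        self.balance -= credits
        return True


pool = CreditPool(plan="professional", seats=3)
pool.start_billing_cycle()          # 3 seats x 60M = 180M credits
print(pool.consume(170_000_000))    # True, 10M credits left
print(pool.consume(20_000_000))     # False: agent inference pauses
pool.top_up()                       # +60M credits for $200
print(pool.consume(20_000_000), pool.balance)  # True 50000000
pool.start_billing_cycle()          # next month: leftover credits are discarded
print(pool.balance)                 # 180000000
```

The source does not say whether a request larger than the remaining balance is rejected outright or partially served; this sketch rejects it, which keeps the "pause when exhausted" rule easy to reason about.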
Cosine targets production engineering teams that maintain complex, long-lived codebases, exactly the environments where general-purpose AI models tend to produce slop. Its emphasis on niche languages (Fortran, COBOL, Verilog, ABAP, MATLAB, R) makes it particularly valuable for teams maintaining code in those languages.
Cosine is one of the most distinctive AI coding agents to emerge in 2025–2026. Its thesis, that models trained exclusively on production code outperform generalists at engineering work, is supported by strong results on its own internal benchmarks and a rapidly growing enterprise customer base. The Lumen model family’s performance on Niche-Bench and Slop-Bench demonstrates that Cosine isn’t just competing on standard metrics; it’s redefining what quality means for AI-generated code.
The UK sovereign AI backing provides a moat that few competitors can match: guaranteed GPU access, national infrastructure support, and a strategic mandate that extends beyond quarterly VC cycles. For engineering teams managing complex codebases, especially those involving niche languages or enterprise compliance requirements, Cosine offers a compelling combination of benchmark leadership, enterprise maturity, and cost efficiency.
The caveat is maturity — Cosine is new, closed-source, and its credit-based pricing requires careful monitoring. But for teams that value code quality over speed and need an agent that treats maintainability as a first-class output, Cosine deserves serious evaluation.
A developer working on a large Java monolith asks Cosine to migrate a legacy module to a modern architecture. Cosine maps the codebase, plans the refactor, writes the migration across 30+ files, and produces clean diffs that match the team's existing coding style, all without hallucinating libraries or leaving dead code behind.