# Cosine | Artificialus



The Production-First AI Coding Agent

Cosine AI

Closed source

Since 2025

AI coding agent powered by the proprietary Lumen model family. Top benchmark performer on Niche-Bench, Vibe-Bench, and Slop-Bench, and a cornerstone of the UK sovereign AI strategy.

## Pros
- Top benchmark scores — leads Niche-Bench (53.9%) and Slop-Bench (25.4%), competitive on Vibe-Bench (29.4%)
- Built-in enterprise deployment options spanning public cloud, managed single-tenant VPC, and fully air-gapped environments
- Lumen models trained exclusively on production code via RL, eliminating code slop and prioritizing maintainability over quick fixes
- Most cost-efficient among frontier coding agents at $7.90 per successful task — 3.6x cheaper than GPT-5.5
- UK sovereign AI backing through a $675M government fund ensures long-term strategic investment and national infrastructure access

## Cons
- Closed-source with no public repository, community edition, or self-hosted option for individual developers
- Credit-based pricing (Cosine Credits) makes monthly cost forecasting non-trivial for variable usage patterns
- Sovereign tier (maximum reasoning model) is still in development with no confirmed release date
- Relatively new entrant compared to established competitors like Claude Code, GPT-5.5, and Gemini 3.1 Pro
- Smaller context window than some competitors, leading to higher average token consumption per task

## Pricing

### Hobby

$20/seat/mo

5M Cosine Credits per seat/month for solo developers and side projects. Credit top-ups available at $20 per additional 5M credits.

### Professional

$200/seat/mo

60M Cosine Credits per seat/month for growing teams shipping at scale. Credit top-ups available at $200 per additional 60M credits.

### Enterprise

Custom

Cloud, dedicated single-tenant VPC, or fully air-gapped deployment with custom model weights, zero data egress, and dedicated support.

## Introduction

Cosine is an AI coding agent built on a radical premise: that a model trained exclusively on real production code will outperform generalist models at the one task developers actually need — writing exceptional software. Powered by Cosine’s proprietary Lumen model family, Cosine delivers autonomous engineering across CLI and cloud environments, with benchmark results that position it alongside (and in several categories ahead of) GPT-5.5, Gemini 3.1 Pro, and Kimi K2.6.

Cosine originally launched in June 2025 and introduced its proprietary Lumen model family in early 2026, starting with Scout and followed by the flagship Outpost model in May 2026.

Beyond raw benchmarks, Cosine has emerged as a strategic asset in the UK’s sovereign AI ambitions. In April 2026, the UK government named Cosine a cornerstone of its $675 million Sovereign AI Fund, providing national GPU infrastructure support for the upcoming Lumen Sovereign model. This makes Cosine one of the best-capitalised AI coding agents in the market, with institutional backing that extends far beyond typical VC funding.

## The Lumen Model Family

Cosine’s Lumen family consists of three models, each optimised for a different layer of the development workflow:

### Lumen Scout

Post-trained from Devstral 123B, Scout is the infrastructure layer. It runs cheaply, operates on-device, and handles the heavy lifting around the coding loop: mapping millions of lines of code, indexing repositories, and powering retrieval systems. With a minimum of 4 H100 GPUs, it is designed for large-scale support workflows where fast, low-cost intelligence matters more than deep reasoning.

### Lumen Outpost

Post-trained from Kimi K2.6, Outpost is the core of the Lumen family and currently Cosine’s flagship model. It combines strong coding performance (53.9% on Niche-Bench, 25.4% on Slop-Bench, 29.4% on Vibe-Bench) with economics that make everyday AI-assisted development viable for entire engineering organizations. Outpost requires 8 H100 GPUs and delivers the best cost-per-successful-task ratio in its class at $7.90.

> Lumen Outpost achieves 53.9% on Niche-Bench, 25.4% on Slop-Bench, and 29.4% on Vibe-Bench at $7.90 per successful task, making it the most cost-efficient frontier coding agent.

### Lumen Sovereign

Coming soon, Sovereign sets the quality bar for maximum reasoning. Designed for complex architectural decisions, deep reasoning tasks, and large-scale system transformations, Sovereign is being built with GPU support from the UK government’s Sovereign AI Fund. It represents Cosine’s bet that the post-training techniques proven on Outpost can scale to frontier-level reasoning without sacrificing the production-code discipline that defines the Lumen family.

## Benchmark Performance

Cosine publishes results across three internal benchmarks designed to measure what matters in production engineering:

### Niche-Bench (53.9%, First Place)

Niche-Bench evaluates performance across 13 programming languages including Fortran, ABAP, Java, Rust, C, MATLAB, Verilog, and COBOL — a deliberate departure from Python-centric benchmarks that dominate the industry. Lumen Outpost leads all tested models with a 53.9% Pass@3 score, outperforming GPT-5.5 (47.4%), Kimi K2.6 (48.3%), and Gemini 3.1 Pro (44.9%). The model shows particular strength in functional and high-context environments, with gains of +18.1pp in Rust and +12.9pp in Java over its base model.
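The text above does not say how Niche-Bench computes its Pass@3 score. As a minimal sketch, assuming the widely used unbiased pass@k estimator (n attempts per task, c of which pass), a benchmark-level score could be computed as below. The function names and the sample task data are hypothetical and are not taken from Cosine's report.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn without replacement
    from n recorded attempts of which c passed, is a passing attempt."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to pick k attempts that all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks; each entry is (attempts, passing attempts)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: three tasks, three attempts each.
sample = [(3, 1), (3, 0), (3, 3)]
score = benchmark_score(sample, k=3)
```

With three attempts per task and k = 3, a task counts as solved if any attempt passes, so the score is simply the fraction of tasks with at least one passing attempt.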

### Vibe-Bench (29.4%, Second Place)

Vibe-Bench measures behavioral qualities: conciseness, honesty, scope discipline, and appropriate planning. Lumen Outpost scores 29.4%, trailing only GPT-5.5 (31.9%) but ahead of GPT-5.4 (27.9%), Kimi K2.6 (22.7%), and Gemini 3.1 Pro (20.3%). It achieves 96.3% action alignment and 96.0% scope discipline — parity with GPT-5.5 on staying on task — while being significantly more concise in its updates (66.9%).

### Slop-Bench (25.4%, First Place)

Cosine’s most distinctive benchmark, Slop-Bench measures the reduction of low-quality code changes (slop) introduced by AI models. Lumen Outpost leads with 25.4%, just ahead of GPT-5.5 (25.0%) and well clear of its base model Kimi K2.6 (19.8%). This validates Cosine’s thesis that RL training on production code — rather than synthetic textbook examples — produces models that write cleaner, more maintainable code that respects existing codebase architecture.

### Cost Efficiency: $7.90 per Successful Task

Cosine’s most compelling commercial metric is $7.90 per successful task, the lowest among the frontier coding agents compared. The margin over its base model Kimi K2.6 ($8.20) is narrow, but GPT-5.5 costs $28.41 per successful task (3.6x more) and Gemini 3.1 Pro costs $17.79. This cost efficiency comes from lower output-token pricing combined with a higher effective success rate. The smaller context window, which requires more turns per complex task, partially erodes that advantage.
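The multiples quoted above can be checked directly from the per-task figures. The sketch below uses only the four dollar amounts stated in this section; the dictionary and function names are illustrative.

```python
# Cost per successful task in USD, as quoted in this section.
COST_PER_SUCCESS_USD = {
    "Lumen Outpost": 7.90,
    "Kimi K2.6": 8.20,
    "Gemini 3.1 Pro": 17.79,
    "GPT-5.5": 28.41,
}


def cost_ratios(costs: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every model's cost as a multiple of the baseline model's cost."""
    base = costs[baseline]
    return {name: round(cost / base, 2) for name, cost in costs.items()}


ratios = cost_ratios(COST_PER_SUCCESS_USD, "Lumen Outpost")
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```

The GPT-5.5 figure works out to roughly 3.6 times the Outpost cost, matching the claim, while Kimi K2.6 sits only about 4% above it.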

## Key Features

### CLI-First Architecture

Cosine installs via a single Homebrew command, `brew install CosineAI/tap/cos`, and runs natively on macOS, Linux, and Windows. The CLI provides a terminal-native workflow with local-to-remote execution, MCP tool access, project context awareness, and multi-agent orchestration. Engineers who live in the terminal can stay there, using Cosine as an autonomous collaborator that reads, plans, writes, and tests code without leaving the command line.

### Desktop App

Cosine also offers a native Desktop application that provides a dedicated workspace outside the browser. It connects to the same cloud environment, allowing persistent sessions, system-level integrations, and a focused interface for long-running agent tasks without tying up a terminal.

### Cloud Workspace

The Cosine Cloud platform provides a collaborative workspace where engineers, product managers, and stakeholders share a single environment to plan, review, and ship work. Tasks run in parallel, long-running operations continue asynchronously, and the cloud interface gives non-terminal users visibility into agent activity. Work hands off seamlessly between CLI and Cloud: a session started on the command line can be reviewed or continued in the browser.

### Enterprise Deployment

Cosine offers three enterprise deployment tiers:
- Public Cloud — Fully managed multi-tenant SaaS with instant access to the latest models and features. Ideal for rapid evaluation and teams whose requirements fit standard cloud controls.
- Managed Single-Tenant VPC — A private Cosine environment operated for the organisation with dedicated isolation, network connectivity, identity integration, and policy controls. The platform is still managed by Cosine, but data boundaries are private.
- Fully Air-Gapped — Cosine runs entirely inside the organisation’s own security perimeter with zero external dependencies and no data egress. Option to post-train or fine-tune models for internal codebases, frameworks, and legacy languages. Designed for strict isolation, regulatory, and classified-network requirements.

### Training Philosophy

Cosine is not trained on customer data. Its capabilities are built exclusively from publicly available open-source repositories. The company’s proprietary data pipeline transforms real production code into verifiable training trajectories that teach the model to write clean, disciplined code — not just code that passes tests, but code that remains maintainable six months later.

## Pricing and Plans

| Plan | Price | Credits | Best For |
| --- | --- | --- | --- |
| Hobby | $20/seat/month | 5M Cosine Credits/seat/month | Solo developers and side projects |
| Professional | $200/seat/month | 60M Cosine Credits/seat/month | Growing teams shipping at scale |
| Enterprise | Custom | Custom | Regulated industries requiring VPC or air-gapped deployment |

Additional seats add proportional credits to the team pool. Credit top-ups are available at the same per-block rate. Unused monthly credits do not roll over. When the credit balance is exhausted, agent inference pauses until top-up or the next billing cycle.
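Because usage beyond the pooled allowance is billed in top-up blocks, a rough forecast helps when usage varies month to month. The sketch below uses the listed Hobby and Professional prices and assumes that top-ups are bought in whole blocks and that a block costs the same as one seat's monthly allowance, as the plan descriptions state. The `Plan` class and `monthly_cost` function are hypothetical names, not part of any Cosine tool.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Plan:
    price_per_seat_usd: int  # also the price of one top-up block
    credits_per_seat: int    # also the size of one top-up block


PLANS = {
    "Hobby": Plan(price_per_seat_usd=20, credits_per_seat=5_000_000),
    "Professional": Plan(price_per_seat_usd=200, credits_per_seat=60_000_000),
}


def monthly_cost(plan_name: str, seats: int, credits_used: int) -> int:
    """Estimate one month's bill in USD for a team that keeps topping up."""
    plan = PLANS[plan_name]
    pool = seats * plan.credits_per_seat        # seats add credits to a shared pool
    overage = max(0, credits_used - pool)       # credits needed beyond the pool
    blocks = ceil(overage / plan.credits_per_seat)  # assumption: whole blocks only
    return (seats + blocks) * plan.price_per_seat_usd


# Three Professional seats pool 180M credits; 200M of usage needs one top-up.
estimate = monthly_cost("Professional", seats=3, credits_used=200_000_000)
```

At list price the Hobby plan works out to $4.00 per million credits and Professional to about $3.33 per million, so heavy Hobby users may reach the point where upgrading is cheaper than repeated top-ups.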

## Who Is It For?

Cosine targets production engineering teams who maintain complex, long-lived codebases — exactly the environments where general-purpose AI models tend to produce slop. Its emphasis on niche languages (Fortran, COBOL, Verilog, ABAP, MATLAB, R) makes it particularly valuable for:
- Enterprise IT organisations maintaining legacy systems across multiple generations of technology
- Fintech, defense, and regulated industries that need air-gapped AI without sacrificing capability
- Engineering teams shipping in Rust, Java, and C++ who want an agent that understands architectural discipline
- UK government and sovereign AI initiatives building national AI infrastructure
- Platform engineering teams that need multi-agent orchestration and collaborative review workflows

## Pros and Cons

### Pros
- Benchmark leadership: Lumen Outpost leads Niche-Bench and Slop-Bench, Cosine’s own benchmarks for coding quality beyond Python performance
- Enterprise-ready from day one: Cloud, VPC, and air-gapped deployment with zero data egress, designed for regulated environments
- Production-code training: RL on real open-source repositories, not synthetic data, aimed at cleaner, more maintainable code
- Cost efficient: $7.90 per successful task is the lowest among the frontier models compared, making team-scale deployment economically viable
- Strategic backing: UK sovereign AI fund provides long-term capital and national GPU infrastructure access

### Cons
- Closed source: No public repository or community edition, and no self-hosted option outside the Enterprise air-gapped tier, so most users depend fully on Cosine’s infrastructure
- Credit system complexity: Cosine Credits and per-seat pools add overhead to cost forecasting compared to simpler per-seat pricing
- Sovereign tier not yet available: The maximum-reasoning model that justifies the UK government investment is still in development
- New entrant risk: Less proven in production than Claude Code, GPT-5.5 agents, or Gemini-based tools
- Smaller context window: Higher average token consumption per task partially offsets the pricing advantage

## Conclusion

Cosine is one of the most distinctive AI coding agents to emerge in 2025-2026. Its thesis, that models trained exclusively on production code outperform generalists at engineering work, is supported by strong results on Cosine’s own benchmarks and a rapidly growing enterprise customer base. The Lumen model family’s performance on Niche-Bench and Slop-Bench suggests that Cosine is not just competing on standard metrics; it is trying to redefine what quality means for AI-generated code.

The UK sovereign AI backing provides a moat that few competitors can match: guaranteed GPU access, national infrastructure support, and a strategic mandate that extends beyond quarterly VC cycles. For engineering teams managing complex codebases, especially those involving niche languages or enterprise compliance requirements, Cosine offers a compelling combination of benchmark leadership, enterprise maturity, and cost efficiency.

The caveat is maturity — Cosine is new, closed-source, and its credit-based pricing requires careful monitoring. But for teams that value code quality over speed and need an agent that treats maintainability as a first-class output, Cosine deserves serious evaluation.

## Version History

| Model | Date | Notes |
| --- | --- | --- |
| Lumen Outpost | May 13, 2026 | Post-trained from Kimi K2.6; core model with 53.9% on Niche-Bench, 25.4% on Slop-Bench, $7.90 per successful task |
| Lumen Scout | Apr 1, 2026 | Post-trained from Devstral 123B; on-device utility model for codebase mapping, indexing, and retrieval |
| Lumen Sovereign | Dec 31, 2026 | Coming soon; maximum reasoning model for complex software transformations |

- Best for: Production engineering, enterprise codebases, and teams maintaining legacy systems across niche languages
- Capability: Lumen model family with RL trained exclusively on production code; top benchmarks on Niche-Bench and Slop-Bench; cost-efficient at $7.90 per successful task; UK sovereign AI backing with $675M fund; enterprise deployment from cloud to fully air-gapped
- Runs on: CLI · Cloud · Desktop · VS Code

Signature Snippet

```
A developer working on a large Java monolith asks Cosine to migrate a legacy module to a modern architecture. Cosine maps the codebase, plans the refactor, writes the migration across 30+ files, and produces clean diffs that match the team's existing coding style, all without hallucinating libraries or leaving dead code behind.
```
