# Codebuff | Artificialus



# Codebuff

A multi-agent coding assistant that coordinates specialized AI agents to understand, plan, edit, and review your codebase.

Codebuff (YC Fall 2024)

Open source

Since 2025


Codebuff is an open-source, multi-agent coding assistant that coordinates specialized AI sub-agents — File Picker, Planner, Editor, Reviewer, Thinker, and Basher — to understand, plan, edit, and review your codebase from the terminal. Built on a deep agent framework and backed by Y Combinator (Fall 2024), it beats single-model approaches like Claude Code on complex coding tasks, scoring 61% vs 53% across 175+ real-world evals in BuffBench.

## Pros

- Innovative multi-agent architecture with specialized sub-agents (File Picker, Planner, Editor, Reviewer, Thinker, Basher) that work together for superior code understanding and modification
- Outperforms Claude Code on BuffBench, scoring 61% vs 53% across 175+ real-world coding tasks from open-source repositories
- Tree-based file discovery that indexes the entire codebase in ~2 seconds using tree-sitter, then uses Gemini Flash to identify and summarize relevant files
- Flexible pricing with a free ad-supported tier (FreeBuff), usage-based credits at 1¢ each, and subscription plans starting at $100/mo
- SDK (@codebuff/sdk) for embedding coding agent capabilities into applications, CI/CD pipelines, and custom workflows
- Four operation modes — Default (standard), Max (parallel best-of-N editing), Plan (spec-only, no writes), Lite (Kimi K2.6, fast and cheap)
- Custom agent framework with TypeScript generators, agent spawning with arbitrary nesting depth, and inherited context support
- Automatic code review after every change — catches bugs, dead code, and quality issues before you see the result
- Backed by Y Combinator (Fall 2024), with active development and a growing community

## Cons

- Full-feature access requires a $100/mo Strong subscription; the FreeBuff tier is limited in model quality and shows ads
- Multi-agent orchestration adds latency vs single-model tools on simple tasks (overhead of spawning and coordinating sub-agents)
- Smaller community and ecosystem than established alternatives like Claude Code, Cursor, or GitHub Copilot
- CLI-focused without native IDE extension — relies on terminal usage inside VS Code or Cursor terminals
- Pricing complexity with multiple tiers (subscription, credits, ad-supported free tier) can be confusing to navigate

## Pricing

### FreeBuff (Free)

$0

Ad-supported free tier. No subscription, no credits, no configuration. Uses optimized models with built-in web research and browser capabilities.

### Strong (1x)

$100/mo

Full access to all modes (Default, Max, Plan, Lite) with standard usage limits. Multi-agent orchestration with Claude Opus 4.7, GPT-5.1, Kimi K2.6.

### Strong (2.5x)

$200/mo

Higher usage limits for teams and power users.

### Strong (7x)

$500/mo

Highest usage tier for heavy usage and teams.

### Pay-as-you-go

1¢/credit

500 free credits on signup. Credits consumed based on task complexity. 500 credits ≈ a few hours of coding.
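
To make the pay-as-you-go arithmetic concrete, here is a minimal sketch that converts credits to dollars at the listed rate of 1¢ per credit. The function and constant names are illustrative and are not part of Codebuff.

```typescript
// Listed pay-as-you-go rate from the pricing section: 1 cent per credit.
const CENTS_PER_CREDIT = 1;

// Convert a credit count into a dollar amount.
function creditsToDollars(credits: number): number {
  if (!Number.isFinite(credits) || credits < 0) {
    throw new RangeError("credits must be a non-negative number");
  }
  return (credits * CENTS_PER_CREDIT) / 100;
}

// The 500 signup credits are worth $5 at this rate.
console.log(creditsToDollars(500)); // 5
```

By the same arithmetic, $100 spent on credits buys 10,000 of them. How that compares with a Strong subscription depends on usage limits the pricing table does not spell out.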

## Introduction

Codebuff is an open-source, multi-agent coding assistant that doesn’t just throw one model at your code: it coordinates a team of specialized AI agents to understand, plan, edit, and review your codebase. Launched in June 2025 by a Y Combinator-backed team (Fall 2024 batch) and hosted on GitHub under an Apache-2.0 license, Codebuff has amassed over 6,100 stars and 6,700+ commits.

The core insight behind Codebuff is simple but powerful: different parts of a coding task benefit from different models and different agent strategies. Instead of using one LLM for everything — file discovery, planning, editing, reviewing — Codebuff spawns purpose-built agents for each role. A File Picker Agent (powered by Gemini 2.0 Flash) scans your codebase and identifies relevant files. A Planner Agent maps out the changes needed. An Editor Agent (running Claude Opus 4.7, GPT-5.1, or Kimi K2.6) makes precise edits. A Reviewer Agent catches issues before you see the result. And in Max mode, multiple editors run in parallel with different strategies, and a selector picks the best output.

This multi-agent approach is validated by BuffBench, Codebuff’s custom eval suite that tests configurations across 175+ real implementation tasks from open-source repos. Codebuff scores 61% against Claude Code’s 53% on these evals and completes tasks more than 100 seconds faster on average. In one real-world test, a feature that took Claude Code 19 minutes and 37 seconds was completed by Codebuff in 6 minutes and 45 seconds.

## Key Features

### Multi-Agent Architecture

Codebuff’s defining feature is its orchestrator-driven multi-agent system. The main orchestrator agent — named “Buffy” and running on Claude Opus 4.7 — reads your prompt, gathers context, and spawns specialized sub-agents:

| Agent | Model | Role |
| --- | --- | --- |
| File Picker | Gemini 2.0 Flash | Scans codebase, finds relevant files |
| Code Searcher | - | Grep-style pattern matching |
| Researcher | Gemini 3.1 Flash Lite | Web and documentation lookup |
| Thinker | Claude Opus 4.7, GPT-5.4 | Works through hard problems |
| Editor | Claude Opus 4.7, GPT-5.1, Kimi K2.6 | Writes and modifies code |
| Reviewer | Claude Opus 4.7, Kimi K2.6 | Catches bugs and style issues |
| Basher | Gemini 3.1 Flash Lite | Runs terminal commands, tests, typechecks |

Each sub-agent has a narrow, focused toolset and purpose. The orchestrator keeps its own context clean by only incorporating the final output from spawned agents. Agents can spawn sub-agents with arbitrary nesting depth — unlike Claude Code, which only supports one level of sub-agents.
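
The orchestration pattern described above (nested spawning, with each parent seeing only a child's final output) can be sketched as a small recursive runner. The `Agent` and `AgentResult` shapes, the agent names, and `runAgent` are hypothetical and are not the Codebuff SDK types. Real agents would call models, where these stubs return fixed strings.

```typescript
// Hypothetical shapes for illustration; not the Codebuff SDK types.
interface AgentResult {
  agent: string;
  output: string;
}

interface Agent {
  name: string;
  // Sub-agents this agent may spawn (nesting depth is unbounded).
  children: Agent[];
  // Does the agent's own work, given the final outputs of its children.
  run(task: string, childOutputs: string[]): string;
}

// Run an agent tree depth-first. Each parent receives only the final output
// of each child, never the child's intermediate steps.
function runAgent(agent: Agent, task: string): AgentResult {
  const childOutputs = agent.children.map((child) => runAgent(child, task).output);
  return { agent: agent.name, output: agent.run(task, childOutputs) };
}

const filePicker: Agent = {
  name: "file-picker",
  children: [],
  run: (task) => `files relevant to "${task}"`,
};

const editor: Agent = {
  name: "editor",
  children: [filePicker], // nested spawn: orchestrator -> editor -> file-picker
  run: (_task, outputs) => `edit using [${outputs.join("; ")}]`,
};

const orchestrator: Agent = {
  name: "orchestrator",
  children: [editor],
  run: (_task, outputs) => outputs.join("\n"),
};

console.log(runAgent(orchestrator, "add logging").output);
// edit using [files relevant to "add logging"]
```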

### Tree-Based File Discovery

Traditional coding agents like Claude Code spend minutes grep-ing and reading file excerpts one at a time. Codebuff takes a fundamentally different approach:
1. Parse your entire codebase: tree-sitter scans all source files and extracts function names, class names, and type names
2. Build a code tree: a compact tree of all directories, files, and symbols in your project
3. Gemini Flash scans the tree: identifies up to 12 relevant files in seconds
4. Gemini Flash summarizes: those 12 files are read and summarized
5. Main agent reads multiple files at once: with summaries, it knows exactly what to read

The entire process takes just a few seconds. Codebuff often understands your project better after 2 seconds of scanning than a single-model tool does after 5 minutes of exploration.
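
The first three steps above can be approximated in a few lines. This is only a sketch under loud assumptions: a regular expression stands in for tree-sitter, and simple keyword matching stands in for the Gemini Flash selection step. All names (`extractSymbols`, `buildCodeTree`, `pickRelevantFiles`) are hypothetical.

```typescript
interface FileEntry {
  path: string;
  symbols: string[];
}

// Stand-in for tree-sitter: pull declared names out of source text with a
// regex. A real parser handles far more syntax than this.
function extractSymbols(text: string): string[] {
  const pattern = /\b(?:function|class|interface|type)\s+([A-Za-z_$][\w$]*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    names.push(match[1]);
  }
  return names;
}

// Build a compact "code tree": one entry per file with its symbol names.
function buildCodeTree(sources: Record<string, string>): FileEntry[] {
  return Object.keys(sources).map((path) => ({
    path,
    symbols: extractSymbols(sources[path]),
  }));
}

// Stand-in for the model-driven pick: rank files by how many query words
// appear in the path or symbol names, and keep at most `limit` files.
function pickRelevantFiles(tree: FileEntry[], query: string, limit = 12): string[] {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  return tree
    .map((entry) => {
      const haystack = [entry.path, ...entry.symbols].join(" ").toLowerCase();
      const score = words.filter((w) => haystack.includes(w)).length;
      return { path: entry.path, score };
    })
    .filter((e) => e.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((e) => e.path);
}

const tree = buildCodeTree({
  "src/auth/login.ts": "export function loginUser() {}",
  "src/db/pool.ts": "class ConnectionPool {}",
});
console.log(pickRelevantFiles(tree, "fix login"));
```

The point of the design is that the selector only ever sees paths and symbol names, which is far smaller than the files themselves.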

### BuffBench: Research-Driven Evals

Codebuff’s development is guided by BuffBench, a custom eval suite that tests agent configurations across 175+ real implementation tasks from open-source repositories. Unlike benchmarks like SWE Bench that pass predefined tests, BuffBench challenges agents to reimplement real git commits through multi-turn conversations. An AI judge scores implementations on completion, efficiency, code quality, and overall correctness — comparing against the ground truth commit.

This data-driven approach means every agent configuration change is measured against real-world performance. Only the highest-scoring, fastest, most cost-effective configurations ship to users.
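
As a rough illustration of how per-task judge scores might roll up into a headline percentage, the sketch below averages an overall score across tasks. The source does not specify the judge's scale or aggregation method, so the 0 to 1 scale, the `JudgeScore` shape, and the use of a plain mean are all assumptions.

```typescript
// Hypothetical judge output for one task. Field names mirror the four
// criteria named above; the 0-1 scale is an assumption.
interface JudgeScore {
  completion: number;
  efficiency: number;
  codeQuality: number;
  overall: number;
}

// Mean of the `overall` field across tasks, expressed as a percentage.
function meanOverallPercent(scores: JudgeScore[]): number {
  if (scores.length === 0) {
    throw new Error("need at least one scored task");
  }
  const total = scores.reduce((sum, s) => sum + s.overall, 0);
  return (total / scores.length) * 100;
}

const sample: JudgeScore[] = [
  { completion: 0.5, efficiency: 0.5, codeQuality: 0.5, overall: 0.5 },
  { completion: 1, efficiency: 1, codeQuality: 1, overall: 1 },
];
console.log(meanOverallPercent(sample)); // 75
```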

### Four Modes of Operation

Codebuff provides four modes, switchable mid-session with `Shift+Tab` or `/mode:` commands:
- Default — Standard mode with Claude Opus 4.7. Spawns file pickers and code searchers, uses the editor agent for changes, runs code review, and validates with typechecks and tests.
- Max — Best-of-N selection. Reads 12-20+ files per task, spawns multiple editor agents in parallel with different strategies, and a selector picks the best output. Multiple reviewers analyze from different angles. Runs full-project typechecks and tests.
- Plan — Spec-only mode. Gathers context, asks clarifying questions, and outputs a plan wrapped in `<PLAN>` tags. No file writes. Use to scope work before implementing.
- Lite — Powered by Kimi K2.6. Faster and cheaper for everyday coding tasks.
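
The `/mode:` commands listed above map naturally onto a small union type. The parser below is a hypothetical sketch of how such input could be handled, not Codebuff's actual implementation. The only behavior taken from the text is that Plan mode performs no file writes.

```typescript
type Mode = "default" | "max" | "plan" | "lite";

const MODES: readonly Mode[] = ["default", "max", "plan", "lite"];

// Parse input such as "/mode:max". Returns null for anything else.
function parseModeCommand(input: string): Mode | null {
  const match = /^\/mode:(\w+)$/.exec(input.trim());
  if (match === null) return null;
  const name = match[1].toLowerCase();
  return MODES.find((m) => m === name) ?? null;
}

// Plan mode is described as spec-only, so file writes are disallowed there.
function allowsFileWrites(mode: Mode): boolean {
  return mode !== "plan";
}

console.log(parseModeCommand("/mode:max")); // max
```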

### FreeBuff: The Free Tier

FreeBuff (`npm install -g freebuff`) is Codebuff’s ad-supported free variant — no subscription, no credits, no configuration. Just install and start coding. It uses models optimized for fast, high-quality assistance and includes built-in web research and browser capabilities. Ads appear above the input box, and each impression earns you credits you can spend on more usage. Turn ads off at any time in settings.

### SDK for Production Integration

Codebuff’s agent framework is exposed through the `@codebuff/sdk` npm package, letting you embed coding agent capabilities into your own applications. The same code that powers Codebuff powers your custom agents:

```
import { CodebuffClient } from '@codebuff/sdk'

const client = new CodebuffClient({
  apiKey: 'your-api-key',
  cwd: '/path/to/your/project',
  onError: (error) => console.error('Codebuff error:', error.message),
})

// Run a coding task
const result = await client.run({
  agent: 'base',
  prompt: 'Add error handling to all API endpoints',
  handleEvent: (event) => {
    console.log('Progress', event)
  },
})
```

You can define custom agents with TypeScript generators, create custom tools, and integrate with CI/CD pipelines.

### Custom Agent Framework

Codebuff provides a full framework for creating and publishing your own agents. Running `/init` inside the CLI generates a project structure with agent definition files, TypeScript type definitions, and tool configurations. Agents are defined as TypeScript objects with:
- `id` and `displayName` for identification
- `model` selection (any model on OpenRouter)
- `toolNames` for allowed tool access
- `instructionsPrompt` for system instructions
- `handleSteps()` generator function for programmatic control

Agents can compose other published agents from the Agent Store at codebuff.com/store, creating reusable, composable workflows.
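
To show how the listed fields fit together, here is a minimal sketch of an agent definition with a generator-based `handleSteps`. The `AgentDefinition` and `Step` interfaces are local, hypothetical types that only mirror the field names above. The real type definitions are the ones `/init` generates, and the tool names and model ID here are placeholders.

```typescript
// Hypothetical local types mirroring the fields listed above.
interface Step {
  toolName: string;
  input: Record<string, unknown>;
}

interface AgentDefinition {
  id: string;
  displayName: string;
  model: string;
  toolNames: string[];
  instructionsPrompt: string;
  handleSteps: (prompt: string) => Generator<Step, void, string>;
}

const todoFinder: AgentDefinition = {
  id: "todo-finder",
  displayName: "TODO Finder",
  model: "example/model-id", // placeholder, not a real model identifier
  toolNames: ["code_search", "read_files"], // illustrative tool names
  instructionsPrompt: "Find TODO comments and report them with file paths.",
  *handleSteps(prompt: string): Generator<Step, void, string> {
    // Each `yield` hands a tool call to the runtime and receives its result.
    const hits: string = yield { toolName: "code_search", input: { pattern: "TODO" } };
    yield { toolName: "read_files", input: { paths: hits.split("\n"), prompt } };
  },
};

// Minimal driver: feeds a stubbed tool result back into the generator and
// rejects any tool the agent did not declare in `toolNames`.
function collectSteps(agent: AgentDefinition, prompt: string): string[] {
  const called: string[] = [];
  const steps = agent.handleSteps(prompt);
  let next = steps.next();
  while (!next.done) {
    const step = next.value;
    if (!agent.toolNames.includes(step.toolName)) {
      throw new Error(`tool not allowed: ${step.toolName}`);
    }
    called.push(step.toolName);
    next = steps.next("src/a.ts\nsrc/b.ts"); // stubbed tool output
  }
  return called;
}

console.log(collectSteps(todoFinder, "find todos"));
```

A generator suits this role because the agent can pause at each tool call and resume with the result, which keeps multi-step control flow in ordinary code.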

### Invisible Context Management

Codebuff eliminates context window anxiety. After the prompt cache expires (5 minutes idle), the conversation is automatically compacted into non-lossy summaries that preserve 10-20 roundtrips with full details. After compaction, Codebuff re-reads any relevant files it needs. You never think about context limits — it just works.
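
One way to picture the compaction step is shown below. It is a sketch under stated assumptions: the newest roundtrips are kept verbatim, older ones are folded into a summary, and nothing happens until the 5-minute idle window has passed. The data shapes, the `summarize` callback, and the default of 10 kept roundtrips are all hypothetical choices, not Codebuff's implementation.

```typescript
interface Roundtrip {
  user: string;
  assistant: string;
}

interface Conversation {
  summary: string; // compacted older history
  recent: Roundtrip[]; // kept in full detail
}

const IDLE_LIMIT_MS = 5 * 60 * 1000; // the 5-minute idle window from the text

// Compact only once the idle window has passed. Keep the newest `keep`
// roundtrips verbatim and fold older ones into the summary.
function compactIfIdle(
  convo: Conversation,
  idleMs: number,
  summarize: (older: Roundtrip[]) => string,
  keep = 10,
): Conversation {
  if (idleMs < IDLE_LIMIT_MS || convo.recent.length <= keep) {
    return convo;
  }
  const cut = convo.recent.length - keep;
  const older = convo.recent.slice(0, cut);
  const summary = [convo.summary, summarize(older)].filter(Boolean).join("\n");
  return { summary, recent: convo.recent.slice(cut) };
}
```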

## Architecture — How the Multi-Agent System Works

Codebuff runs as a three-tier architecture: the CLI client, a stateless server, and the model providers.

The Pipeline:
1. Project Analysis: tree-sitter scans your repository and builds a code map of all files, functions, classes, and types. This happens in ~2 seconds for most projects.
2. File Discovery: The File Picker agent (Gemini 2.0 Flash) receives the code tree and identifies up to 12 relevant files. Gemini Flash (3.1 Flash Lite) reads and summarizes them. This replaces the slow, sequential grep-based approach used by other tools.
3. Problem Analysis: If the task is complex, the orchestrator spawns a Thinker agent (Claude Opus 4.7 or GPT-5.4) to work through the problem architecture before any code is written.
4. Code Editing: Editor agents (Claude Opus 4.7, GPT-5.1, Kimi K2.6) make precise, surgical edits. In Max mode, multiple editors run in parallel with different strategies, sharing the cached conversation history, so you only pay once for reading files.
5. Review & Validation: A Reviewer agent automatically spawns to catch bugs, dead code, and quality issues. The Basher agent runs terminal commands, typechecks, and tests. In Max mode, multiple reviewers analyze code from different angles.
6. Result: The final, reviewed, and tested code is presented to you.

The server is stateless: it streams requests to model providers (Anthropic, OpenAI, Google, xAI) over WebSockets. Your code stays local; only relevant context is sent to the APIs.

Key architectural innovation: Subagents can optionally inherit conversation history from their parent. Unlike Claude Code’s subagents (which always start with blank context), Codebuff agents can pick up where their parent left off. Combined with arbitrary nesting depth and the orchestrator pattern (an agent whose only tool is spawning other agents), this creates a uniquely flexible architecture.
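
The Max-mode editing step in the pipeline can be read as a best-of-N selection: run several editors concurrently, then let a selector keep one result. The sketch below shows that pattern in isolation. The `Candidate` and `Editor` types and the scoring callback are hypothetical, and a real selector would be a model call in place of a numeric score.

```typescript
interface Candidate {
  strategy: string;
  patch: string;
}

type Editor = (task: string) => Promise<Candidate>;

// Pick the highest-scoring candidate; the earliest one wins ties.
function selectBest(candidates: Candidate[], score: (c: Candidate) => number): Candidate {
  if (candidates.length === 0) {
    throw new Error("no candidates to select from");
  }
  return candidates.reduce((best, c) => (score(c) > score(best) ? c : best));
}

// Run every editor concurrently, then let the selector choose one result.
async function bestOfN(
  task: string,
  editors: Editor[],
  score: (c: Candidate) => number,
): Promise<Candidate> {
  const candidates = await Promise.all(editors.map((edit) => edit(task)));
  return selectBest(candidates, score);
}
```

Because the editors run under `Promise.all`, total latency is roughly that of the slowest editor, not the sum of all of them.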

## Installation & Setup

### Prerequisites
- Node.js (includes npm) — Download from nodejs.org
- A project directory you want Codebuff to work on

### Install Codebuff

```
npm install -g codebuff

# Verify installation
codebuff --version
```

### Install FreeBuff (free tier)

```
# No subscription, no credits, no configuration
npm install -g freebuff
```

### Install the SDK (for programmatic use)

```
# Install as a dependency in your project
npm install @codebuff/sdk
```

### Quick Start

```
# Navigate to your project
cd /path/to/your-project

# Launch Codebuff
codebuff

# On first launch, you'll be guided through authentication
# Then just describe what you want to build
```

### Initialize Project Context (Optional)

```
# Inside Codebuff's CLI, run:
/init
```

This creates project-specific configuration files including `knowledge.md` (project context for Codebuff) and the `.agents/` directory structure for custom agent definitions.

## Usage & Commands

### Starting Codebuff

```
# Launch in the current directory
codebuff

# Launch with a specific mode
codebuff --mode max

# Launch with debug logging
codebuff --debug
```

### Key Controls

| Action | Input |
| --- | --- |
| Switch modes | `Shift+Tab` or `/mode:default`, `/mode:max`, `/mode:plan`, `/mode:lite` |
| Initialize project | `/init` |
| Suggest follow-ups | Click on suggested prompts after each response |

### Example Prompts

Once inside Codebuff, just describe what you want in natural language:

```
> "Add authentication to my API"
> "Fix the SQL injection vulnerability in user registration"
> "Add rate limiting to all API endpoints"
> "Refactor the database connection code for better performance"
> "Convert the entire codebase from JavaScript to TypeScript"
> "Set up a CI/CD pipeline with GitHub Actions"
```

Codebuff handles the rest — file discovery, planning, editing, running tests, and reviewing.

### Working with Modes

Switch modes mid-session depending on the task:
- `/mode:plan` — “What’s the best way to add WebSocket support to this app?” (no code changes)
- `/mode:max` — “Refactor the entire payment processing pipeline” (best-of-N editing)
- `/mode:lite` — “Fix this typo in the error message” (fast and cheap)
- `/mode:default` — Back to standard mode for general development

### Using FreeBuff

```
# Just install and run
npm install -g freebuff
cd your-project
freebuff
```

FreeBuff works identically to Codebuff but uses more affordable models and shows contextual ads above the input box.

## Comparison

Codebuff occupies a unique position in the coding agent landscape, differentiated by its multi-agent architecture and research-driven approach.

| Dimension | Codebuff | Claude Code | Aider | Cursor |
| --- | --- | --- | --- | --- |
| Architecture | Multi-agent orchestration | Single-model + sub-processes | Single-model | Single-model |
| File Discovery | Tree-based (~2s full scan) | Sequential grep + read | Manual file specification | Editor-integrated |
| Code Review | Automatic per-prompt | None | None | None |
| Max Mode | Best-of-N parallel editors | N/A | N/A | Composer |
| Model Choice | Any OpenRouter model | Claude only | Any (via config) | Claude + GPT + Custom |
| IDE Integration | CLI (works in any terminal) | CLI | CLI / VS Code plugin | Full IDE |
| Custom Agents | Full TypeScript framework | Basic sub-agent support | Limited | Limited |
| Pricing | $100/mo or 1¢/credit + free tier | $20/mo Pro + API costs | Free (BYO keys) | $20/mo Pro |
| SDK | ✅ `@codebuff/sdk` | ❌ | ❌ | ❌ |
| Open Source | ✅ Apache-2.0 | ❌ Proprietary | ✅ Apache-2.0 | ❌ Proprietary |
| Evals | BuffBench (175+ tasks) | SWE-Bench | SWE-Bench | Internal |

### Codebuff vs Claude Code

Codebuff’s direct benchmark comparison shows meaningful advantages across the board:
- BuffBench score: 61% Codebuff vs 53% Claude Code
- Speed: ~100 seconds faster per task on average; real-world features completed in 1/3 the time
- Code review: Automatic review after every change (Claude Code has none)
- Model flexibility: Any model on OpenRouter vs locked into Anthropic
- Custom agents: Full TypeScript SDK with programmatic control vs basic sub-agent support

Choose Codebuff over Claude Code when you want faster edits, lower cost per task, automatic code review, and the ability to define custom agent workflows. Choose Claude Code when you need enterprise controls (SSO, RBAC, compliance programs) or direct Anthropic procurement.

### Codebuff vs Aider

Codebuff and Aider both run in the terminal and support multi-model backends, but diverge significantly:
- Architecture: Codebuff uses multi-agent orchestration; Aider uses a single model with edit formats
- File handling: Codebuff automatically discovers relevant files via tree scanning; Aider requires you to specify which files to add to the chat
- Review: Codebuff reviews every change automatically; Aider has no built-in review
- Customization: Codebuff’s TypeScript agent framework is far more flexible than Aider’s edit formats

Choose Codebuff for complex, multi-file refactoring tasks where automatic file discovery and code review save significant time. Choose Aider for simpler, focused edits where you want to minimize overhead and cost.

### Codebuff vs Cursor

Cursor is a full IDE with AI features; Codebuff is a CLI agent:
- Surface: Codebuff lives in the terminal; Cursor is a VS Code fork with integrated AI
- Architecture: Codebuff’s multi-agent orchestration is more sophisticated than Cursor’s Composer
- Extensibility: Codebuff’s SDK and custom agent framework enable CI/CD and production integration that Cursor can’t match
- Pricing: Codebuff’s free tier (FreeBuff) offers a no-cost entry point; Cursor’s Pro plan costs $20/mo

Choose Codebuff if you prefer terminal-centric workflows, need programmable agents for automation, or want a free tier. Choose Cursor if you want a polished IDE experience with inline completions and visual diff views.

## Conclusion

Codebuff represents a genuine architectural shift in AI coding assistants. Where most tools (Claude Code, Cursor, Aider, GitHub Copilot) rely on a single LLM to handle everything from file discovery to code editing to quality assurance, Codebuff orchestrates a team of specialized agents, each purpose-built for its role.

The results so far are encouraging: a 61% BuffBench score against Claude Code’s 53%, tasks completed more than 100 seconds faster on average, automatic code review on every change, and a custom agent framework that lets you define, compose, and publish your own agent workflows. The tree-based file discovery alone, which indexes your entire codebase in ~2 seconds, removes one of the most frustrating bottlenecks in AI-assisted coding: watching your tool slowly explore your project file by file.

Codebuff isn’t without trade-offs. The multi-agent architecture adds overhead on trivial tasks. The pricing model is more complex than a flat subscription (tiers, credits, ads, and a free tier). There’s no native IDE integration — you use it in a terminal, even if that terminal is inside VS Code or Cursor. And with a smaller community than Claude Code or Copilot, you’ll find fewer tutorials, blog posts, and community extensions.

For developers who work on complex, multi-file projects and want a coding assistant that thinks architecturally rather than operating file-by-file, Codebuff is a compelling choice. The agent framework alone opens up possibilities that single-model tools can’t match — automated refactoring pipelines, CI/CD-integrated code review, custom agents for domain-specific tasks. And with FreeBuff, there’s zero cost to try it.

The broader implication is clear: the future of AI coding assistants isn’t better single models — it’s better orchestration of multiple models working together. Codebuff is betting on that future and, based on the evidence so far, it’s a bet worth watching.

## Version History

- v1.0.0 (Jun 1, 2025): Initial public launch with multi-agent architecture and Default, Max, Plan, and Lite modes
- v0.9.0 (May 15, 2025): BuffBench eval suite, FreeBuff free tier, SDK release
- v0.8.0 (Apr 20, 2025): Tree-sitter based file discovery, multi-agent orchestrator

Best for: Developers who want a multi-agent architecture that beats single-model approaches on complex coding tasks

Capability: Multi-agent orchestration (File Picker + Planner + Editor + Reviewer + Thinker + Basher) · Beats Claude Code 61% vs 53% on BuffBench · Tree-based file discovery indexes codebase in ~2s · CLI + SDK (@codebuff/sdk) · Four modes: Default, Max, Plan, Lite · FreeBuff ad-supported free tier · YC-backed · Apache-2.0 open source

Runs on: CLI · SDK

Signature Snippet


```
# Install Codebuff globally
npm install -g codebuff

# Navigate to your project
cd /path/to/your-project

# Launch Codebuff
codebuff

# Example prompts inside Codebuff:
# > "Add authentication to my API"
# > "Fix the SQL injection vulnerability in user registration"
# > "Refactor the database connection code for better performance"

# Switch modes mid-session with Shift+Tab or /mode:max
# > /mode:max
# > "Add rate limiting to all API endpoints"

# Use FreeBuff (free tier, no subscription)
npm install -g freebuff && freebuff
```
