Codebuff

A multi-agent coding assistant that coordinates specialized AI agents to understand, plan, edit, and review your codebase.

Codebuff (YC Fall 2024) Open source Since 2025


Codebuff is an open-source, multi-agent coding assistant that coordinates specialized AI sub-agents — File Picker, Planner, Editor, Reviewer, Thinker, and Basher — to understand, plan, edit, and review your codebase from the terminal. Built on a deep agent framework and backed by Y Combinator (Fall 2024), it beats single-model approaches like Claude Code on complex coding tasks, scoring 61% vs 53% across 175+ real-world evals in BuffBench.

+ Pros

Innovative multi-agent architecture with specialized sub-agents (File Picker, Planner, Editor, Reviewer, Thinker, Basher) that work together for superior code understanding and modification
Outperforms Claude Code on BuffBench, scoring 61% vs 53% across 175+ real-world coding tasks from open-source repositories
Tree-based file discovery that indexes the entire codebase in ~2 seconds using tree-sitter, then uses Gemini Flash to identify and summarize relevant files
Flexible pricing with a free ad-supported tier (FreeBuff), usage-based credits at 1¢ each, and subscription plans starting at $100/mo
SDK (@codebuff/sdk) for embedding coding agent capabilities into applications, CI/CD pipelines, and custom workflows
Four operation modes — Default (standard), Max (parallel best-of-N editing), Plan (spec-only, no writes), Lite (Kimi K2.6, fast and cheap)
Custom agent framework with TypeScript generators, agent spawning with arbitrary nesting depth, and inherited context support
Automatic code review after every change — catches bugs, dead code, and quality issues before you see the result
Backed by Y Combinator (Fall 2024) with active development and a growing community

− Cons

Full-feature access requires $100/mo Strong subscription — FreeBuff tier is limited in model quality and shows ads
Multi-agent orchestration adds latency vs single-model tools on simple tasks (overhead of spawning and coordinating sub-agents)
Smaller community and ecosystem than established alternatives like Claude Code, Cursor, or GitHub Copilot
CLI-focused without native IDE extension — relies on terminal usage inside VS Code or Cursor terminals
Pricing complexity with multiple tiers (subscription, credits, ad-supported free tier) can be confusing to navigate

Pricing

FreeBuff (Free)

Ad-supported free tier. No subscription, no credits, no configuration. Uses optimized models with built-in web research and browser capabilities.

Strong (1x)

$100/mo

Full access to all modes (Default, Max, Plan, Lite) with standard usage limits. Multi-agent orchestration with Claude Opus 4.7, GPT-5.1, Kimi K2.6.

Strong (2.5x)

$200/mo

Higher usage limits for teams and power users.

Strong (7x)

$500/mo

Highest usage limits, for heavy workloads and teams.

Pay-as-you-go

1¢/credit

500 free credits on signup. Credits consumed based on task complexity. 500 credits ≈ a few hours of coding.
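As a quick check on the pay-as-you-go figures, the sketch below converts a credit count to dollars at the stated rate of 1¢ per credit. The helper name `creditsToDollars` is hypothetical and exists only to show the arithmetic.

```typescript
// Price per credit in cents, as stated in the pricing section (1¢/credit).
const CENTS_PER_CREDIT = 1;

// Hypothetical helper: converts a credit count to a formatted dollar amount.
function creditsToDollars(credits: number): string {
  const cents = credits * CENTS_PER_CREDIT;
  return `$${(cents / 100).toFixed(2)}`;
}

// The 500 signup credits are worth $5.00 at this rate.
console.log(creditsToDollars(500));
```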

Introduction

Codebuff is an open-source, multi-agent coding assistant that doesn’t just throw one model at your code: it coordinates a team of specialized AI agents to understand, plan, edit, and review your codebase. Launched in June 2025 by a Y Combinator-backed team (Fall 2024) and hosted on GitHub under an Apache-2.0 license, Codebuff has accumulated over 6,100 stars and 6,700+ commits.

The core insight behind Codebuff is simple but powerful: different parts of a coding task benefit from different models and different agent strategies. Instead of using one LLM for everything — file discovery, planning, editing, reviewing — Codebuff spawns purpose-built agents for each role. A File Picker Agent (powered by Gemini 2.0 Flash) scans your codebase and identifies relevant files. A Planner Agent maps out the changes needed. An Editor Agent (running Claude Opus 4.7, GPT-5.1, or Kimi K2.6) makes precise edits. A Reviewer Agent catches issues before you see the result. And in Max mode, multiple editors run in parallel with different strategies, and a selector picks the best output.

This multi-agent approach is backed by BuffBench, Codebuff’s custom eval suite that tests configurations across 175+ real implementation tasks from open-source repos. Codebuff scores 61% vs 53% for Claude Code on these evals while completing tasks 100+ seconds faster on average. In one real-world comparison, a feature that took Claude Code 19 minutes and 37 seconds was completed by Codebuff in 6 minutes and 45 seconds.

Key Features

Multi-Agent Architecture

Codebuff’s defining feature is its orchestrator-driven multi-agent system. The main orchestrator agent — named “Buffy” and running on Claude Opus 4.7 — reads your prompt, gathers context, and spawns specialized sub-agents:

Agent	Model	Role
File Picker	Gemini 2.0 Flash	Scans codebase, finds relevant files
Code Searcher	—	Grep-style pattern matching
Researcher	Gemini 3.1 Flash Lite	Web and documentation lookup
Thinker	Claude Opus 4.7, GPT-5.4	Works through hard problems
Editor	Claude Opus 4.7, GPT-5.1, Kimi K2.6	Writes and modifies code
Reviewer	Claude Opus 4.7, Kimi K2.6	Catches bugs and style issues
Basher	Gemini 3.1 Flash Lite	Runs terminal commands, tests, typechecks

Each sub-agent has a narrow, focused toolset and purpose. The orchestrator keeps its own context clean by only incorporating the final output from spawned agents. Agents can spawn sub-agents with arbitrary nesting depth — unlike Claude Code, which only supports one level of sub-agents.

Tree-Based File Discovery

Traditional coding agents like Claude Code spend minutes grep-ing and reading file excerpts one at a time. Codebuff takes a fundamentally different approach:

1. Parse your entire codebase: tree-sitter scans all source files and extracts function names, class names, and type names
2. Build a code tree: a compact tree of all directories, files, and symbols in your project
3. Gemini Flash scans the tree: it identifies up to 12 relevant files in seconds
4. Gemini Flash summarizes: those 12 files are read and summarized
5. Main agent reads multiple files at once: with summaries, it knows exactly what to read

The entire process takes just a few seconds. Codebuff often understands your project better after 2 seconds of scanning than a single-model tool does after 5 minutes of exploration.
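To make the "code tree" idea concrete, here is a minimal sketch of rendering directories, files, and symbols as a compact indented tree. It does not call tree-sitter; the symbol lists are hard-coded, and `FileSymbols` and `renderCodeTree` are hypothetical names for illustration, not Codebuff's actual implementation.

```typescript
// Hypothetical shape for the symbols extracted from one source file.
// In Codebuff this data comes from tree-sitter; here it is hard-coded.
interface FileSymbols {
  path: string;      // e.g. "src/api/users.ts"
  symbols: string[]; // function, class, and type names found in the file
}

// Render a compact, indented tree of directories, files, and symbols.
function renderCodeTree(files: FileSymbols[]): string {
  const lines: string[] = [];
  const seenDirs = new Set<string>();
  // Plain code-unit ordering keeps every path under a directory contiguous.
  const sorted = [...files].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));

  for (const file of sorted) {
    const parts = file.path.split("/");
    const fileName = parts.pop() as string;

    // Emit each directory once, indented by its depth.
    parts.forEach((dir, depth) => {
      const key = parts.slice(0, depth + 1).join("/");
      if (!seenDirs.has(key)) {
        seenDirs.add(key);
        lines.push(`${"  ".repeat(depth)}${dir}/`);
      }
    });

    // Emit the file with its symbols on a single line.
    lines.push(`${"  ".repeat(parts.length)}${fileName}: ${file.symbols.join(", ")}`);
  }
  return lines.join("\n");
}

const tree = renderCodeTree([
  { path: "src/api/users.ts", symbols: ["getUser", "createUser"] },
  { path: "src/api/auth.ts", symbols: ["login", "AuthToken"] },
  { path: "src/db/pool.ts", symbols: ["ConnectionPool"] },
]);
console.log(tree);
```

A tree like this is far smaller than the source itself, which is presumably why a fast model can scan it to shortlist files before anything is read in full.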

BuffBench: Research-Driven Evals

Codebuff’s development is guided by BuffBench, a custom eval suite that tests agent configurations across 175+ real implementation tasks from open-source repositories. Unlike benchmarks like SWE Bench that pass predefined tests, BuffBench challenges agents to reimplement real git commits through multi-turn conversations. An AI judge scores implementations on completion, efficiency, code quality, and overall correctness — comparing against the ground truth commit.

This data-driven approach means every agent configuration change is measured against real-world performance. Only the highest-scoring, fastest, most cost-effective configurations ship to users.
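As a rough illustration of how per-task judge scores could roll up into a single percentage, the sketch below averages an overall score across tasks. The 0-10 scale, the `JudgeScore` shape, and the aggregation rule are all assumptions made for this example; the source does not describe BuffBench's actual formula, and the inputs are toy values rather than real results.

```typescript
// Hypothetical judge output for one task; the 0-10 scale is an assumption.
interface JudgeScore {
  completion: number;
  efficiency: number;
  codeQuality: number;
  overall: number;
}

// Average the overall score across tasks and express it as a percentage.
function meanOverallPercent(scores: JudgeScore[]): number {
  if (scores.length === 0) return 0;
  const total = scores.reduce((sum, s) => sum + s.overall, 0);
  return (total / scores.length) * 10; // 0-10 scale mapped to 0-100%
}

// Toy inputs, not real BuffBench results.
const toyScores: JudgeScore[] = [
  { completion: 8, efficiency: 6, codeQuality: 7, overall: 7 },
  { completion: 5, efficiency: 5, codeQuality: 6, overall: 5 },
];
console.log(meanOverallPercent(toyScores)); // 60
```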

Four Modes of Operation

Codebuff provides four modes, switchable mid-session with Shift+Tab or a /mode:<name> command:

Default — Standard mode with Claude Opus 4.7. Spawns file pickers and code searchers, uses the editor agent for changes, runs code review, and validates with typechecks and tests.
Max — Best-of-N selection. Reads 12-20+ files per task, spawns multiple editor agents in parallel with different strategies, and a selector picks the best output. Multiple reviewers analyze from different angles. Runs full-project typechecks and tests.
Plan — Spec-only mode. Gathers context, asks clarifying questions, and outputs a plan wrapped in <PLAN> tags. No file writes. Use to scope work before implementing.
Lite — Powered by Kimi K2.6. Faster and cheaper for everyday coding tasks.
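The best-of-N pattern behind Max mode can be sketched in a few lines: run several editors in parallel, then let a selector choose one result. Everything here (`Candidate`, `Editor`, `Selector`, `bestOfN`, and the stand-in editors) is hypothetical scaffolding; a real editor would call a model and a real selector would judge quality rather than length.

```typescript
// A candidate edit produced by one editor strategy (hypothetical shape).
interface Candidate {
  strategy: string;
  patch: string;
}

type Editor = (task: string) => Promise<Candidate>;
type Selector = (candidates: Candidate[]) => Candidate;

// Run every editor in parallel, then let the selector pick one result.
async function bestOfN(task: string, editors: Editor[], select: Selector): Promise<Candidate> {
  if (editors.length === 0) throw new Error("at least one editor is required");
  const candidates = await Promise.all(editors.map((edit) => edit(task)));
  return select(candidates);
}

// Stand-in editors; real ones would call a model.
const minimalEditor: Editor = async (task) => ({
  strategy: "minimal",
  patch: `// ${task}`,
});
const thoroughEditor: Editor = async (task) => ({
  strategy: "thorough",
  patch: `// ${task}\n// plus tests and error handling`,
});

// Stand-in selector: prefers the longest patch.
const pickLongest: Selector = (candidates) =>
  candidates.reduce((best, c) => (c.patch.length > best.patch.length ? c : best));

bestOfN("add rate limiting", [minimalEditor, thoroughEditor], pickLongest).then((winner) => {
  console.log(winner.strategy); // "thorough"
});
```

Because the editors run concurrently, wall-clock time is bounded by the slowest editor rather than the sum of all of them; the cost is paid in extra model calls.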

FreeBuff: The Free Tier

FreeBuff (npm install -g freebuff) is Codebuff’s ad-supported free variant — no subscription, no credits, no configuration. Just install and start coding. It uses models optimized for fast, high-quality assistance and includes built-in web research and browser capabilities. Ads appear above the input box, and each impression earns you credits you can spend on more usage. Turn ads off at any time in settings.

SDK for Production Integration

Codebuff’s agent framework is exposed through the @codebuff/sdk npm package, letting you embed coding agent capabilities into your own applications. The same code that powers Codebuff powers your custom agents:

import { CodebuffClient } from '@codebuff/sdk'

const client = new CodebuffClient({
  apiKey: 'your-api-key',
  cwd: '/path/to/your/project',
  onError: (error) => console.error('Codebuff error:', error.message),
})

// Run a coding task
const result = await client.run({
  agent: 'base',
  prompt: 'Add error handling to all API endpoints',
  handleEvent: (event) => {
    console.log('Progress', event)
  },
})

You can define custom agents with TypeScript generators, create custom tools, and integrate with CI/CD pipelines.

Custom Agent Framework

Codebuff provides a full framework for creating and publishing your own agents. Running /init inside the CLI generates a project structure with agent definition files, TypeScript type definitions, and tool configurations. Agents are defined as TypeScript objects with:

id and displayName for identification
model selection (any model on OpenRouter)
toolNames for allowed tool access
instructionsPrompt for system instructions
handleSteps() generator function for programmatic control

Agents can compose other published agents from the Agent Store at codebuff.com/store, creating reusable, composable workflows.
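The fields listed above might fit together roughly as follows. This is a sketch with locally declared stand-in types so that it runs on its own; the real type definitions come from the project generated by /init. The `Step` shape, the `"STEP_ALL"` marker, the tool name `run_terminal_command`, and the model id are assumptions for illustration.

```typescript
// Local stand-in types (assumed shapes, not the SDK's actual definitions).
type Step = { tool: string; input: Record<string, unknown> } | "STEP_ALL";

interface AgentDefinition {
  id: string;
  displayName: string;
  model: string;
  toolNames: string[];
  instructionsPrompt: string;
  handleSteps: () => Generator<Step, void, unknown>;
}

const testRunner: AgentDefinition = {
  id: "test-runner",
  displayName: "Test Runner",
  model: "provider/model-name", // placeholder for an OpenRouter model id
  toolNames: ["run_terminal_command"], // assumed tool name
  instructionsPrompt: "Run the test suite and report failures concisely.",
  *handleSteps() {
    // First take one explicit, programmatic step...
    yield { tool: "run_terminal_command", input: { command: "npm test" } };
    // ...then hand control back to the model for the remaining steps.
    yield "STEP_ALL";
  },
};

// Drive the generator by hand to see the steps it would request.
const steps = Array.from(testRunner.handleSteps());
console.log(steps.length); // 2
```

The generator form lets an agent mix deterministic steps (always run the tests first) with model-driven ones, which is harder to express with a prompt alone.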

Invisible Context Management

Codebuff eliminates context window anxiety. After the prompt cache expires (5 minutes idle), the conversation is automatically compacted into non-lossy summaries that preserve 10-20 roundtrips with full details. After compaction, Codebuff re-reads any relevant files it needs. You never think about context limits — it just works.
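One plausible reading of this behavior, sketched below, is that compaction triggers once the session has been idle for the 5-minute cache window, keeps the most recent roundtrips verbatim, and replaces older ones with a summary. The function names, the `keepRecent` parameter, and the placeholder summarizer are assumptions; the source does not describe the actual algorithm.

```typescript
interface Roundtrip {
  prompt: string;
  response: string;
}

interface CompactedHistory {
  summary: string;     // stands in for a model-written summary of older turns
  recent: Roundtrip[]; // most recent turns, kept verbatim
}

// Idle time after which the prompt cache expires (5 minutes, per the text).
const PROMPT_CACHE_TTL_MS = 5 * 60 * 1000;

function shouldCompact(idleMs: number): boolean {
  return idleMs >= PROMPT_CACHE_TTL_MS;
}

// Keep the last `keepRecent` roundtrips in full and summarize the rest.
function compactHistory(turns: Roundtrip[], keepRecent: number): CompactedHistory {
  const cut = Math.max(0, turns.length - keepRecent);
  const older = turns.slice(0, cut);
  const recent = turns.slice(cut);
  // Placeholder summarizer: a real one would ask a model to summarize `older`.
  const summary = older.map((r, i) => `${i + 1}. ${r.prompt}`).join("\n");
  return { summary, recent };
}

const turns: Roundtrip[] = Array.from({ length: 25 }, (_, i) => ({
  prompt: `prompt ${i + 1}`,
  response: `response ${i + 1}`,
}));
const compacted = compactHistory(turns, 10);
console.log(shouldCompact(6 * 60 * 1000), compacted.recent.length); // true 10
```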

Architecture — How the Multi-Agent System Works

Codebuff runs as a three-tier architecture: the CLI client, a stateless server, and the model providers.

The Pipeline:

1. Project Analysis: tree-sitter scans your repository and builds a code map of all files, functions, classes, and types. This happens in ~2 seconds for most projects.
2. File Discovery: The File Picker agent (Gemini 2.0 Flash) receives the code tree and identifies up to 12 relevant files. Gemini Flash (3.1 Flash Lite) reads and summarizes them. This replaces the slow, sequential grep-based approach used by other tools.
3. Problem Analysis: If the task is complex, the orchestrator spawns a Thinker agent (Claude Opus 4.7 or GPT-5.4) to work through the problem architecture before any code is written.
4. Code Editing: Editor agents (Claude Opus 4.7, GPT-5.1, Kimi K2.6) make precise, surgical edits. In Max mode, multiple editors run in parallel with different strategies, sharing the cached conversation history, so you only pay once for reading files.
5. Review & Validation: A Reviewer agent automatically spawns to catch bugs, dead code, and quality issues. The Basher agent runs terminal commands, typechecks, and tests. In Max mode, multiple reviewers analyze code from different angles.
6. Result: The final, reviewed, and tested code is presented to you.

The server is stateless — it streams requests to model providers (Anthropic, OpenAI, Google, xAI) over WebSockets. Your code stays local; only relevant context is sent to the APIs.

Key architectural innovation: Subagents can optionally inherit conversation history from their parent. Unlike Claude Code’s subagents (which always start with blank context), Codebuff agents can pick up where their parent left off. Combined with arbitrary nesting depth and the orchestrator pattern (an agent whose only tool is spawning other agents), this creates a uniquely flexible architecture.
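The two ideas in this paragraph, optional history inheritance and arbitrary nesting, can be sketched with a small data model. `AgentNode`, `ChatMessage`, and `spawn` are hypothetical names used only to illustrate the pattern; they are not part of Codebuff's API.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

interface AgentNode {
  name: string;
  depth: number;          // 0 for the orchestrator, +1 per nesting level
  history: ChatMessage[]; // conversation the agent starts with
}

// Spawn a child agent. With inherit=true the child starts from a copy of the
// parent's history; otherwise it starts with a blank context.
function spawn(parent: AgentNode, name: string, inherit: boolean): AgentNode {
  return {
    name,
    depth: parent.depth + 1,
    history: inherit ? [...parent.history] : [],
  };
}

const root: AgentNode = {
  name: "orchestrator",
  depth: 0,
  history: [{ role: "user", content: "Add rate limiting to all API endpoints" }],
};

// The editor inherits the conversation; the reviewer, spawned by the editor,
// starts blank and sits two levels below the orchestrator.
const editor = spawn(root, "editor", true);
const reviewer = spawn(editor, "reviewer", false);
console.log(editor.history.length, reviewer.history.length, reviewer.depth); // 1 0 2
```

Copying the history rather than sharing it means a child can append to its own context without polluting the parent's, which matches the description of the orchestrator only incorporating a sub-agent's final output.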

Installation & Setup

Prerequisites

Node.js (includes npm) — Download from nodejs.org
A project directory you want Codebuff to work on

Install Codebuff

npm install -g codebuff

# Verify installation
codebuff --version

Install FreeBuff (free tier)

# No subscription, no credits, no configuration
npm install -g freebuff

Install the SDK (for programmatic use)

# Install as a dependency in your project
npm install @codebuff/sdk

Quick Start

# Navigate to your project
cd /path/to/your-project

# Launch Codebuff
codebuff

# On first launch, you'll be guided through authentication
# Then just describe what you want to build

Initialize Project Context (Optional)

# Inside Codebuff's CLI, run:
/init

This creates project-specific configuration files including knowledge.md (project context for Codebuff) and the .agents/ directory structure for custom agent definitions.

Usage & Commands

Starting Codebuff

# Launch in the current directory
codebuff

# Launch with a specific mode
codebuff --mode max

# Launch with debug logging
codebuff --debug

Key Controls

Action	Input
Switch modes	`Shift+Tab` or `/mode:default`, `/mode:max`, `/mode:plan`, `/mode:lite`
Initialize project	`/init`
Suggest follow-ups	Click on suggested prompts after each response

Example Prompts

Once inside Codebuff, just describe what you want in natural language:

> "Add authentication to my API"
> "Fix the SQL injection vulnerability in user registration"
> "Add rate limiting to all API endpoints"
> "Refactor the database connection code for better performance"
> "Convert the entire codebase from JavaScript to TypeScript"
> "Set up a CI/CD pipeline with GitHub Actions"

Codebuff handles the rest — file discovery, planning, editing, running tests, and reviewing.

Working with Modes

Switch modes mid-session depending on the task:

/mode:plan — “What’s the best way to add WebSocket support to this app?” (no code changes)
/mode:max — “Refactor the entire payment processing pipeline” (best-of-N editing)
/mode:lite — “Fix this typo in the error message” (fast and cheap)
/mode:default — Back to standard mode for general development

Using FreeBuff

# Just install and run
npm install -g freebuff
cd your-project
freebuff

FreeBuff works identically to Codebuff but uses more affordable models and shows contextual ads above the input box.

Comparison

Codebuff occupies a unique position in the coding agent landscape, differentiated by its multi-agent architecture and research-driven approach.

Dimension	Codebuff	Claude Code	Aider	Cursor
Architecture	Multi-agent orchestration	Single-model + sub-processes	Single-model	Single-model
File Discovery	Tree-based (~2s full scan)	Sequential grep + read	Manual file specification	Editor-integrated
Code Review	Automatic per-prompt	None	None	None
Max Mode	Best-of-N parallel editors	N/A	N/A	Composer
Model Choice	Any OpenRouter model	Claude only	Any (via config)	Claude + GPT + Custom
IDE Integration	CLI (works in any terminal)	CLI	CLI / VS Code plugin	Full IDE
Custom Agents	Full TypeScript framework	Basic sub-agent support	Limited	Limited
Pricing	$100/mo or 1¢/credit + free tier	$20/mo Pro + API costs	Free (BYO keys)	$20/mo Pro
SDK	✅ `@codebuff/sdk`	❌	❌	❌
Open Source	✅ Apache-2.0	❌ Proprietary	✅ Apache-2.0	❌ Proprietary
Evals	BuffBench (175+ tasks)	SWE-Bench	SWE-Bench	Internal

Codebuff vs Claude Code

Codebuff’s direct benchmark comparison shows meaningful advantages across the board:

Score: 61% Codebuff vs 53% Claude Code on BuffBench
Speed: ~100 seconds faster per task on average; in one cited example, a feature was completed in about a third of the time (6:45 vs 19:37)
Code review: Automatic review after every change (Claude Code has none)
Model flexibility: Any model on OpenRouter vs locked into Anthropic
Custom agents: Full TypeScript SDK with programmatic control vs basic sub-agent support

Choose Codebuff over Claude Code when you want faster edits, lower cost per task, automatic code review, and the ability to define custom agent workflows. Choose Claude Code when you need enterprise controls (SSO, RBAC, compliance programs) or direct Anthropic procurement.

Codebuff vs Aider

Codebuff and Aider both run in the terminal and support multi-model backends, but diverge significantly:

Architecture: Codebuff uses multi-agent orchestration; Aider uses a single model with edit formats
File handling: Codebuff automatically discovers relevant files via tree scanning; Aider requires you to specify which files to add to the chat
Review: Codebuff reviews every change automatically; Aider has no built-in review
Customization: Codebuff’s TypeScript agent framework is far more flexible than Aider’s edit formats

Choose Codebuff for complex, multi-file refactoring tasks where automatic file discovery and code review save significant time. Choose Aider for simpler, focused edits where you want to minimize overhead and cost.

Codebuff vs Cursor

Cursor is a full IDE with AI features; Codebuff is a CLI agent:

Surface: Codebuff lives in the terminal; Cursor is a VS Code fork with integrated AI
Architecture: Codebuff’s multi-agent orchestration is more sophisticated than Cursor’s Composer
Extensibility: Codebuff’s SDK and custom agent framework enable CI/CD and production integration that Cursor can’t match
Pricing: Codebuff’s free tier (FreeBuff) offers a no-cost entry point; Cursor’s Pro plan costs $20/mo

Choose Codebuff if you prefer terminal-centric workflows, need programmable agents for automation, or want a free tier. Choose Cursor if you want a polished IDE experience with inline completions and visual diff views.

Conclusion

Codebuff represents a genuine architectural leap in AI coding assistants. Where most tools (Claude Code, Cursor, Aider, GitHub Copilot) rely on a single LLM to handle everything from file discovery to code editing to quality assurance, Codebuff orchestrates a team of specialized agents, each purpose-built for its role.

The results are concrete: a 61% score against Claude Code’s 53% on BuffBench, tasks completed 100+ seconds faster on average, automatic code review on every change, and a custom agent framework that lets you define, compose, and publish your own agent workflows. The tree-based file discovery alone, which indexes your entire codebase in ~2 seconds, eliminates one of the most frustrating bottlenecks in AI-assisted coding: watching your tool slowly explore your project file by file.

Codebuff isn’t without trade-offs. The multi-agent architecture adds overhead on trivial tasks. The pricing model is more complex than a flat subscription (tiers, credits, ads, and a free tier). There’s no native IDE integration — you use it in a terminal, even if that terminal is inside VS Code or Cursor. And with a smaller community than Claude Code or Copilot, you’ll find fewer tutorials, blog posts, and community extensions.

For developers who work on complex, multi-file projects and want a coding assistant that thinks architecturally rather than operating file-by-file, Codebuff is a compelling choice. The agent framework alone opens up possibilities that single-model tools can’t match — automated refactoring pipelines, CI/CD-integrated code review, custom agents for domain-specific tasks. And with FreeBuff, there’s zero cost to try it.

The broader implication is clear: the future of AI coding assistants isn’t better single models — it’s better orchestration of multiple models working together. Codebuff is betting on that future and, based on the evidence so far, it’s a bet worth watching.

Version History

v1.0.0 Jun 1, 2025

Initial public launch — multi-agent architecture with Default, Max, Plan, and Lite modes

v0.9.0 May 15, 2025

BuffBench eval suite, FreeBuff free tier, SDK release

v0.8.0 Apr 20, 2025

Tree-sitter based file discovery, multi-agent orchestrator

Signature Snippet

# Install Codebuff globally
npm install -g codebuff

# Navigate to your project
cd /path/to/your-project

# Launch Codebuff
codebuff

# Example prompts inside Codebuff:
# > "Add authentication to my API"
# > "Fix the SQL injection vulnerability in user registration"
# > "Refactor the database connection code for better performance"

# Switch modes mid-session with Shift+Tab or /mode:max
# > /mode:max
# > "Add rate limiting to all API endpoints"

# Use FreeBuff (free tier, no subscription)
npm install -g freebuff && freebuff
