Tabby

Self-hosted, open-source AI coding assistant with code completion, chat, and an autonomous agent (Pochi).

TabbyML | Open source | Since 2023

Tabby is a self-hosted, open-source AI coding assistant that provides code completion, an Answer Engine, inline chat, and the Pochi autonomous agent, all deployable on your own infrastructure with no external DBMS or cloud dependencies. Written in Rust, the project has 33.5k GitHub stars and 249 releases; it runs on consumer-grade GPUs and integrates with VS Code, Neovim, JetBrains IDEs, Eclipse, and more.

+ Pros

Complete self-hosted solution — no data leaves your infrastructure, no external DBMS or cloud services required
Three capabilities in one: code completion, Answer Engine with RAG, and the Pochi autonomous agent for multi-step tasks
Broad IDE support: VS Code, Neovim, JetBrains (IntelliJ, PyCharm, CLion, GoLand, WebStorm, Rider, RubyMine, PhpStorm, AppCode), Eclipse, Android Studio
Consumer GPU support — runs on commodity hardware with Metal, CUDA, and Vulkan inference backends
Mature and actively maintained: 33.5k GitHub stars, 249 releases, 3,694+ commits, and a large community on Slack and GitHub
Flexible licensing: Apache-2.0 core with Community Edition free for up to 5 users
RAG-based Answer Engine indexes your codebase, documentation, and connected data sources for contextual answers
Pochi agent brings autonomous task planning, multi-file editing, workflow automation, and MCP tool integration

− Cons

Self-hosted setup requires more effort than cloud-based alternatives — Docker, model downloads, and GPU configuration needed
Pochi autonomous agent is newer (July 2025) and less mature than dedicated coding agents like Claude Code or Codebuff
Community Edition limited to 5 users — teams larger than that must upgrade to Team ($19/seat/month) or Enterprise
Not purely an agent — Tabby's core identity is code completion and inline assistance, with agent features layered on top via Pochi
Model selection and performance depend on local hardware — larger models may require significant GPU memory

Pricing

Community

Free, open-source, self-hosted. Up to 5 users. Local deployment. Includes Code Completion, Answer Engine, In-line Chat & Context Provider.

Team

$19/mo per seat

Up to 50 users. Flexible deployment. Includes all Community features plus SSO, usage reports, analytics, authentication domain, email support.

Enterprise

Custom

Unlimited users. Customised deployment. Enhanced security and group management. Dedicated Slack channel, roadmap prioritization, bespoke support.

Tabby Cloud

Usage-based

Pay only for token cost of LLMs used. $20/mo free credits. Tab Completion always free.

Introduction

Tabby is a self-hosted, open-source AI coding assistant that gives teams full control over their AI-powered development workflow. Developed by TabbyML and launched in 2023, Tabby has grown into one of the most popular self-hosted coding assistants on GitHub with 33,500 stars and 249 releases.

Unlike cloud-dependent alternatives, Tabby runs entirely on your infrastructure — no data leaves your network, no external database is required, and no third-party cloud services are involved. It connects to your choice of open-weight models (StarCoder, Qwen, CodeGemma, DeepSeek, and dozens more) and runs on consumer-grade GPUs, making it accessible to teams of any size.

Tabby delivers three core capabilities: code completion with repository-aware context, an Answer Engine powered by RAG over your codebase and documentation, and the Pochi autonomous agent (added July 2025) for multi-step task planning, execution, and PR generation. Together, they form a complete AI coding assistant that fits into existing IDE workflows while respecting enterprise data privacy requirements.

The challenge with most AI coding assistants is that they require sending your codebase to external servers for processing — a non-starter for many enterprises with compliance requirements, IP sensitivity, or air-gapped environments. Tabby solves this by being entirely self-contained. With just Docker and optionally a GPU, you get a full AI coding assistant on your own hardware.

This review explores Tabby's capabilities — from its architecture and RAG-powered Answer Engine to the Pochi autonomous agent, deployment options, real-world performance, and how it compares to both cloud-based and self-hosted alternatives.

Quick Verdict

Tabby is the most compelling self-hosted AI coding assistant available today. It's not just a code completion engine: the addition of the RAG-based Answer Engine and the Pochi autonomous agent makes it a genuine three-in-one platform. For teams that need on-premise deployment for compliance or security reasons, Tabby is the clear winner. The Community Edition is generous (free for up to 5 users), the IDE support is broad (VS Code, Neovim, the JetBrains suite, Eclipse, Android Studio), and the development pace (249 releases since the 2023 launch) is impressive.

That said, it's not without trade-offs. The Pochi agent is newer and less mature than dedicated autonomous coding agents. The self-hosted setup requires real infrastructure investment. And there's a learning curve to configuring models, indexing, and deployment. But if you need AI-assisted coding with no cloud dependencies, Tabby is in a league of its own.

The Big Picture

The AI coding assistant space is polarizing into two camps: cloud-hosted (GitHub Copilot, Cursor, Cody) and self-hosted (Tabby, Continue.dev, Ollama-based setups). The cloud camp offers zero-config setup but requires sending code to external servers. The self-hosted camp gives you data sovereignty but demands infrastructure management.

Tabby sits firmly in the self-hosted camp, but its architecture sets it apart from simpler alternatives. Rather than just wrapping an LLM API call in an IDE plugin, Tabby is a full server-side system with:

A model serving layer that handles compilation, inference, and caching across multiple model backends (CUDA, Metal, Vulkan, and CPU)

A repository indexing engine that builds a searchable codebase graph for context-aware completions

A RAG pipeline for the Answer Engine that can ingest code, documentation, issues, and connected data sources

An agent runtime (Pochi) that handles multi-step planning, tool execution, and file editing

A built-in web UI for chat, agent interaction, and system management — accessible from any browser

This server-side architecture is what gives Tabby its power. Because the heavy lifting happens on the server, IDE plugins stay lightweight and the same indexing/caching benefits apply across all editors. The server also caches model outputs, so repeated completion requests don't re-run inference.

The broader market context matters here too. With GitHub Copilot transitioning toward multi-model support, Sourcegraph Cody being acquired, and new players like Cursor raising massive rounds, the landscape is moving fast. Tabby's open-source, self-hosted approach is increasingly attractive for organizations that want to avoid vendor lock-in and data exposure.

Who Is It For

Tabby serves three primary audiences:

Enterprise teams with strict data security requirements — financial services, healthcare, government, and defense organizations where sending code to third-party APIs is prohibited by policy or regulation.

SaaS and product teams that want AI coding assistance without exposing their proprietary codebase or intellectual property to external services.

Privacy-conscious developers and small teams who prefer open-source tools and want to run their own AI assistant on local hardware.

It's also well-suited for organizations operating in air-gapped environments or regions with restrictive data residency laws, since Tabby requires zero external connectivity after the initial model download.

Core Features Deep Dive

Code Completion Engine

Tabby's code completion is its most mature feature. Unlike simple autocomplete that only uses the current file, Tabby uses repository-level context — it indexes your entire codebase to understand imports, types, function signatures, and usage patterns across files. This means completions are aware of your project structure, not just the current buffer.

The completion pipeline works through a three-stage process:

The IDE plugin captures context (cursor position, surrounding code, open tabs) and sends it to the Tabby server

The server enriches the prompt with relevant code snippets from the indexed repository before running inference

The model generates completion candidates, which are post-processed for correctness (syntax validation, deduplication)
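
The stages listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tabby's implementation: `EditorContext`, `enrich_prompt`, `post_process`, and `complete` are hypothetical names, the retriever and model are passed in as plain callables, and only the deduplication half of post-processing is shown (syntax validation is omitted).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditorContext:
    """What a hypothetical IDE plugin sends: code before and after the cursor."""
    prefix: str
    suffix: str


def enrich_prompt(ctx: EditorContext, snippets: List[str]) -> str:
    """Stage 2: prepend retrieved repository snippets as comment lines."""
    header = "\n".join(f"# context: {s}" for s in snippets)
    return f"{header}\n{ctx.prefix}" if header else ctx.prefix


def post_process(candidates: List[str]) -> List[str]:
    """Stage 3: drop empty candidates and duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for candidate in candidates:
        text = candidate.rstrip()
        if text and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept


def complete(
    ctx: EditorContext,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], List[str]],
) -> List[str]:
    """Run the stages in order: retrieve context, build the prompt, generate, clean up."""
    snippets = retrieve(ctx.prefix)          # stand-in for the repository index lookup
    prompt = enrich_prompt(ctx, snippets)    # server-side prompt enrichment
    return post_process(generate(prompt))    # stand-in for model inference
```

Passing the retriever and model as callables keeps the sketch testable with stubs; a real server would put an index and an inference backend behind them.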

This architecture is fundamentally different from cloud-based alternatives where context is limited to what can be sent over the network. Tabby's server-side indexing means the entire repository is available locally, enabling richer completions.

Under the hood, Tabby uses a custom code completion framework built on top of open models. The recommended models are fine-tuned code LLMs like StarCoder2, CodeGemma, and DeepSeek-Coder, which Tabby automatically downloads and optimizes for your hardware. Multi-line completions, docstring generation, and context-aware autocomplete work out of the box.

Performance on the completion side is generally good. On a consumer GPU (RTX 3090/4090), completions appear within 200-500ms for single-line suggestions and 1-2s for multi-line. The caching layer effectively reduces latency for repeated patterns. On CPU-only setups, expect longer latencies (1-3s) but still usable, especially with smaller 1B-3B parameter models.

Answer Engine — RAG-Powered Code Q&A

The Answer Engine transforms Tabby from a simple autocomplete tool into a codebase-aware Q&A system. It uses Retrieval-Augmented Generation (RAG) to answer questions about your code, architecture, and connected documentation.

What makes the Answer Engine different from just chatting with a model:

It retrieves relevant code chunks from your repository based on the question

It can ingest external documentation (via web scraping or direct upload)

Answers include citations back to source files with line numbers

The chat model sees the retrieved context as part of its prompt, grounding responses in your actual codebase
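
The retrieve-then-cite flow described above can be illustrated with a toy example. The sketch below assumes a naive token-overlap score in place of whatever ranking Tabby actually uses, and `Chunk`, `retrieve`, and `build_prompt` are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Chunk:
    """An indexed piece of source code with enough metadata to cite it."""
    path: str
    start_line: int
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase identifier-like tokens; a crude stand-in for real indexing."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question: str, chunks: List[Chunk], top_k: int = 2) -> List[Chunk]:
    """Return up to top_k chunks that share at least one token with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(c.text)), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [n] with file and line."""
    sources = "\n".join(
        f"[{i}] {c.path}:{c.start_line}\n{c.text}" for i, c in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
```

Because each source carries its path and starting line into the prompt, the model's `[n]` markers can be mapped back to file-and-line citations in the answer.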

For practical use, the Answer Engine excels at questions like:

"How does authentication work in this codebase?" — retrieves auth middleware, session handling, and route guards

"What's the data flow for user registration?" — traces the request from controller to database

"Find all places where we handle file uploads" — identifies upload handlers, validation, and storage logic

The RAG indexing is intelligent — it understands code structure, not just text. Functions, classes, imports, and type definitions are indexed with their semantic relationships, enabling queries that go beyond keyword matching.

In testing, the Answer Engine shines for onboarding new team members — being able to ask "how does X work" and get answers grounded in your actual codebase (with citations) dramatically reduces the learning curve. The response quality depends heavily on the chat model you choose, with 7B+ parameter models delivering significantly better results.

Inline Chat and Commands

Beyond full completions, Tabby provides inline chat capabilities that let you select code and ask for modifications, explanations, or refactoring suggestions. This is accessible via keyboard shortcuts in supported IDEs and feels similar to the Copilot chat experience, but powered by your locally deployed model.

Slash commands extend this further with predefined actions:

/explain — explains selected code

/fix — suggests fixes for selected code

/doc — generates documentation for selected functions

/test — generates unit tests for selected code

These work through the chat model and benefit from repository context, making them more accurate than isolated code analysis.
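
One plausible way to think about slash commands is as a lookup from command to instruction, with the selected code appended. The sketch below is an assumption for illustration only: the instruction wording, the `PROMPTS` table, and `build_chat_prompt` are hypothetical and are not taken from Tabby's source.

```python
# Hypothetical instruction text for each command named in the review.
PROMPTS = {
    "/explain": "Explain what the following code does.",
    "/fix": "Suggest a fix for problems in the following code.",
    "/doc": "Write documentation for the following function.",
    "/test": "Write unit tests for the following code.",
}


def build_chat_prompt(command: str, selection: str) -> str:
    """Combine the command's instruction with the code the user selected."""
    if command not in PROMPTS:
        raise ValueError(f"unknown command: {command}")
    return f"{PROMPTS[command]}\n\n{selection}"
```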

Pochi — The Autonomous Agent

Pochi (added in July 2025) is Tabby's autonomous agent layer that extends its capabilities from inline assistance to multi-step task execution. It's available as a CLI tool (npm package @getpochi/cli) that connects to the Tabby server.

Key capabilities:

Multi-file editing — plans and executes changes across multiple files in a single session

Task planning — breaks down complex requests into sequential steps with dependency tracking

Shell command execution — runs commands, installs dependencies, executes tests

Git integration — creates branches, commits, and generates PR descriptions

MCP (Model Context Protocol) tool integration — extensible with external tools via the MCP ecosystem
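
Task planning with dependency tracking amounts to ordering steps so that each one runs after the steps it depends on. The sketch below shows that idea with Python's standard `graphlib`; it is not Pochi's planner, and `order_steps` and the step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def order_steps(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return steps in an order where every step follows its prerequisites.

    `dependencies` maps each step to the set of steps it depends on.
    A circular plan raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical plan for "add input validation to all API routes".
plan = {
    "find routes": set(),
    "write schemas": {"find routes"},
    "apply validation": {"write schemas"},
    "run tests": {"apply validation"},
}
```

Calling `order_steps(plan)` yields the four steps in the order they are listed, since each depends only on the one before it.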

The agent operates by connecting to your self-hosted Tabby server and using its models for reasoning and code generation. This means Pochi inherits Tabby's codebase context and RAG capabilities while adding autonomous execution.

In practical use, Pochi handles scenarios like:

"Add input validation to all API routes" — Pochi identifies routes, creates Zod schemas, imports them, applies validation, and runs tests

"Refactor the authentication module to use JWT instead of session-based auth" — multi-file refactoring with migration plan

"Write a migration script to update the database schema and create the corresponding models" — full-stack code generation

Pochi is still maturing — it launched in July 2025 and is less battle-tested than dedicated agents like Claude Code or Codebuff. However, its tight integration with Tabby's completion and Answer Engine capabilities means it benefits from the same RAG context and model serving infrastructure, making it contextually aware in ways standalone agents aren't.

Architecture and Deployment

Tabby's architecture is refreshingly straightforward. The server is a single binary (written in Rust) that handles model serving, repository indexing, RAG, and the web UI. There's no external database — Tabby uses local file-based storage for its index and configuration.

Deployment options:

Docker (recommended) — single docker run command with GPU passthrough

Native binary — download and run the server directly on Linux/macOS

Kubernetes — Helm chart available for cluster deployments

macOS app (experimental) — native desktop app for local development

The server exposes a REST API and WebSocket endpoint that IDE plugins connect to. This means you can run Tabby on a beefy workstation, GPU server, or cloud instance, and connect to it from any development machine on your network.

Hardware requirements are flexible:

Minimal (CPU-only): 8GB RAM, 10GB disk — runs 1B-3B models for completion only

Recommended (GPU): 16GB+ VRAM — runs 7B+ models for completion + chat + Answer Engine

Optimal (Multi-GPU): 24GB+ VRAM — runs 13B+ models for all features with maximum quality
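
The three tiers above reduce to a simple threshold check. The function below is a convenience sketch using only the numbers quoted in this review; `recommend_tier` is a hypothetical helper, and the 10GB disk requirement is deliberately ignored.

```python
def recommend_tier(vram_gb: float, ram_gb: float) -> str:
    """Map available memory to the tier names used in this review."""
    if vram_gb >= 24:
        return "optimal"        # 13B+ models, all features
    if vram_gb >= 16:
        return "recommended"    # 7B+ models: completion, chat, Answer Engine
    if ram_gb >= 8:
        return "minimal"        # CPU-only, 1B-3B models, completion only
    return "unsupported"        # below the minimal tier quoted above
```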

Tabby supports multiple inference backends out of the box: CUDA (NVIDIA), Metal (Apple Silicon), Vulkan (cross-platform), and CPU (with int4/int8 quantization for speed). Model compilation is handled automatically on first use, so you don't need to compile models manually.

IDE Integration

Tabby offers first-class IDE plugins across the major editors:

VS Code — full-featured extension with completions, inline chat, commands, and Answer Engine

Neovim — Lua plugin with completions via cmp-tabby

JetBrains suite — supports 10+ IDEs including IntelliJ, PyCharm, CLion, GoLand, WebStorm, Rider, RubyMine, PhpStorm, AppCode

Eclipse — plugin available via Eclipse Marketplace

Android Studio — works via the JetBrains plugin

All plugins connect to the Tabby server via a configuration URL and optional token. The connection is straightforward — set the server endpoint, authenticate if needed, and the plugin handles the rest.
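
To make the "endpoint plus optional token" idea concrete, the sketch below builds (but does not send) an HTTP request the way a client might. The `/v1/health` path and the bearer-token header are assumptions not stated in this review, and `build_health_request` is a hypothetical helper; check the server's API reference before relying on either.

```python
import urllib.request
from typing import Optional


def build_health_request(endpoint: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request against an assumed health-check path on the server."""
    url = endpoint.rstrip("/") + "/v1/health"   # assumed path, not from the review
    headers = {"Accept": "application/json"}
    if token:
        # Assumed auth scheme: the token is sent as a bearer credential.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen` would perform the actual call against a running server.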

Licensing and Pricing

Tabby combines an Apache-2.0 core with three tiered editions:

Community Edition — Apache-2.0 license, free for up to 5 users. Includes all core features: code completion, Answer Engine, inline chat, and the Pochi agent.

Team Edition — $19/seat/month, up to 50 users. Adds SSO/SAML, audit logging, team management, priority support, and advanced analytics. No data leaves your infrastructure.

Enterprise Edition — custom pricing, unlimited users. Includes custom branding, dedicated support, SLA guarantees, on-premise deployment assistance, and custom integrations.

The Community Edition is notably generous: most self-hosted tools restrict free tiers to 1-3 users, but Tabby's 5-user limit covers many small teams entirely. The Apache-2.0 core license also means you can fork and modify the software if needed, though deployments with more than 5 users require the Team or Enterprise tier.
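
The tier boundaries translate into straightforward arithmetic. The helper below uses only the figures quoted in this review (5 free users, $19 per seat up to 50 users, custom pricing beyond that); `monthly_cost` is a hypothetical name.

```python
from typing import Optional, Tuple


def monthly_cost(seats: int) -> Tuple[str, Optional[int]]:
    """Return (edition, monthly USD cost); cost is None where pricing is custom."""
    if seats < 1:
        raise ValueError("seats must be at least 1")
    if seats <= 5:
        return ("Community", 0)
    if seats <= 50:
        return ("Team", seats * 19)
    return ("Enterprise", None)
```

For example, a 12-person team lands in the Team tier at 12 × $19 = $228 per month.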

Performance Benchmarks

In real-world usage across various hardware configurations:

RTX 4090 (24GB): Single-line completions in 150-300ms. Multi-line in 500ms-1.5s. Full Answer Engine queries in 2-5s. Pochi task planning in 5-15s.

Apple M2 Max (64GB unified): Single-line completions in 200-400ms via Metal backend. Multi-line in 800ms-2s. Answer Engine in 3-6s.

CPU-only (AMD Ryzen 9, 32GB): Single-line completions in 1-3s with StarCoder-1B quantized. Multi-line 3-6s. Answer Engine and chat slow but functional.

The model caching layer is a standout feature — frequently used completion patterns are cached server-side, meaning the second time you write a similar code pattern, the completion is near-instant. This is particularly impactful for teams working on large, repetitive codebases.
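
The effect of a server-side completion cache can be shown with a small LRU sketch. This is an assumption-laden illustration: the review does not describe Tabby's cache key or eviction policy, so the exact-match SHA-256 key, the LRU eviction, and the `CompletionCache` class are all hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class CompletionCache:
    """Exact-match LRU cache in front of an expensive generate() call."""

    def __init__(self, generate: Callable[[str], str], capacity: int = 1024):
        self._generate = generate
        self._capacity = capacity
        self._entries = OrderedDict()  # prompt hash -> completion, oldest first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._generate(prompt)        # the slow path: run inference
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result
```

An exact-match key only helps with identical prompts; serving "similar" patterns from cache, as described above, would need a looser key such as a normalized prefix.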

One caveat: first-time model loading takes 10-30s depending on the model size and hardware. Tabby handles this by preloading models at server startup, so subsequent completions are fast. Model switching (e.g., swapping from completion to chat model) also incurs a loading delay.

Comparison With Alternatives

Tabby stands apart from alternatives primarily along the self-hosted vs. cloud axis:

vs. GitHub Copilot — Copilot has more polished completions and broader language support, but requires sending code to GitHub servers and costs $10-39/month per user. Tabby gives you data sovereignty with comparable completion quality, especially for popular languages.

vs. Cursor — Cursor offers a superior AI-native IDE experience with deep contextual understanding, but locks you into its editor. Tabby works with your existing IDE while providing comparable RAG-powered code understanding.

vs. Continue.dev — The most direct competitor in the open-source, self-hosted space. Continue is more modular (BYO model provider) but Tabby is more integrated and easier to deploy as a team solution. Tabby's Answer Engine and Pochi agent are capabilities Continue lacks natively.

vs. Sourcegraph Cody — Cody has superior codebase-wide context (being built on Sourcegraph's code intelligence) but is cloud-only after Sourcegraph's acquisition. Tabby matches Cody's RAG capabilities in a self-hosted package.

For self-hosted specifically, Tabby's edge is in its all-in-one architecture: you don't need to cobble together a model server, a RAG pipeline, an agent runtime, and IDE plugins separately. It ships as a single system with all components integrated and tested together.

Under the Hood — Technical Architecture

Tabby is built in Rust, which gives it performance advantages in model serving and file indexing. The server is organized into several subsystems:

HTTP/WebSocket server — built on Axum, handles IDE plugin connections, web UI, and API requests

Model serving layer — model compilation, inference, caching, and multi-backend support (CUDA/Metal/Vulkan/CPU)

Indexing engine — parses source code into a searchable graph (tree-sitter for syntax, custom index for relationships)

RAG pipeline — retrieves relevant context from indexed code, documentation, and connected sources

Answer Engine — chat interface with RAG context and citation generation

Web UI — React-based dashboard for chat, agent interaction, settings, and monitoring

The indexing engine deserves special mention. Instead of simple grep-based search, Tabby builds a structured index using tree-sitter to parse code into AST nodes. This means it understands the difference between a function definition, a function call, a class declaration, and a variable reference. When the RAG pipeline retrieves context, it retrieves semantically relevant code sections, not just text matches.
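
The difference between text matching and structure-aware indexing is easy to demonstrate. The sketch below swaps in Python's built-in `ast` module for tree-sitter, so it only handles Python source and is not how Tabby's indexer is written; `classify_symbols` is a hypothetical name.

```python
import ast
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_symbols(source: str) -> Dict[str, List[Tuple[str, int]]]:
    """Group names by syntactic role, recording (name, line number) for each."""
    index = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index["function_def"].append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            index["class_def"].append((node.name, node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            index["call"].append((node.func.id, node.lineno))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            # A called name also lands here, because the callee is itself a Name node.
            index["reference"].append((node.id, node.lineno))
    return dict(index)
```

A grep for `g` would match every occurrence alike, whereas this index can answer "where is `g` called?" separately from "where is `g` defined?".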

Tabby also supports multi-branch indexing (v0.32.0), which indexes different branches of your repository so that completions and answers are aware of branch-specific code. This is particularly useful when working on feature branches that diverge significantly from main.

Community and Ecosystem

Tabby has built a substantial community:

33,500+ GitHub stars with 3,694+ commits from 107+ contributors

Active Slack community for user support and discussion

249 releases since the 2023 launch (more than one per week on average)

Documentation site at tabby.tabbyml.com with comprehensive setup guides and API references

The development velocity is remarkable — the team ships meaningful features every 1-2 weeks. The roadmap suggests continued investment in the Pochi agent, improved indexing performance, and enterprise features like custom branding and enhanced SSO.

Limitations

While Tabby is impressive, it has limitations worth considering:

Setup complexity — while Docker makes it easy, you still need to manage GPU drivers, model downloads (which can be 2-10GB each), and server configuration. This is not a zero-config solution.

Model dependency — your experience quality depends heavily on the models you choose and your hardware. A poor model or underpowered GPU leads to mediocre completions. Finding the right model/hardware balance requires experimentation.

Pochi maturity — the autonomous agent is the newest component and shows it. Multi-step plans can fail midway, complex file edits sometimes need manual correction, and the agent occasionally needs re-prompting.

Language coverage — while Tabby supports many languages, completion quality varies. Python, TypeScript/JavaScript, Rust, Go, and Java get the best results. Less common languages like Haskell, Julia, or Elixir may see degraded performance.

Resource usage — repository indexing takes significant CPU and RAM, especially for large monorepos. The server can consume 2-8GB of RAM depending on the model and index size.

The Bottom Line

Tabby has evolved from a simple code completion tool into a comprehensive self-hosted AI coding platform. The combination of repository-aware completions, a RAG-powered Answer Engine, inline chat with slash commands, and the Pochi autonomous agent makes it one of the most complete self-hosted solutions on the market.

For teams that prioritize data sovereignty, need on-premise deployment, or want to avoid per-seat cloud subscription costs, Tabby is the strongest option available. The generous Community Edition (free for 5 users) makes it accessible for small teams, while the Team and Enterprise tiers cover organizational needs.

The biggest recommendation: if you're evaluating self-hosted AI coding assistants, start with Tabby. Its integrated architecture means you get a complete system out of the box, not a collection of tools you need to wire together. And if you outgrow the Community Edition, the upgrade path to Team and Enterprise is clear.

For cloud-first teams or solo developers who want zero-infrastructure setup, Copilot or Cursor are arguably better fits. And for those who need a battle-tested autonomous coding agent today (rather than a maturing one), Claude Code or Codebuff would be more reliable choices. But if your priority is running AI code assistance on your own terms, in your own infrastructure, Tabby is the tool to beat.

Frequently Asked Questions

Do I need a GPU to run Tabby?

No. Tabby runs on CPU with quantized models, but a GPU significantly improves performance. For completion-only usage, a CPU setup is workable. For chat, Answer Engine, and Pochi agent, a GPU is strongly recommended.

Does Tabby require an external database?

No. Tabby uses local file-based storage for its index and configuration. No PostgreSQL, MySQL, or any external DBMS is required.

Is Tabby truly free and open source?

The core is Apache-2.0 licensed and free for up to 5 users (Community Edition). Team and Enterprise tiers add management features but the core functionality — completions, Answer Engine, Pochi agent — works in the free tier.

Can Tabby run offline or in an air-gapped environment?

Yes. After the initial model download, Tabby requires no internet connectivity. All processing happens locally, making it suitable for air-gapped and classified environments.

Which IDEs does Tabby support?

VS Code, Neovim, 10+ JetBrains IDEs (IntelliJ, PyCharm, CLion, GoLand, WebStorm, Rider, RubyMine, PhpStorm, AppCode), Eclipse, and Android Studio.

How does Tabby compare to GitHub Copilot for completion quality?

For popular languages (Python, TypeScript, Rust, Go, Java), Tabby's completion quality is comparable to Copilot when using a good model (StarCoder2-15B or DeepSeek-Coder-6.7B). For less common languages, Copilot generally performs better due to its proprietary models.

Version History

v0.32.0 Jan 25, 2026

Generic OAuth support, multi-branch indexing, Mistral embedding API

v0.31.0 Aug 19, 2025

Custom branding with name and logo (Enterprise)

v0.30.0 Jul 2, 2025

GitLab MR indexing, CUDA 12 base image, Answer Engine page quality improvements

Signature Snippet

# Start a Tabby server with Docker
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve \
  --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct

# Install the VS Code extension from marketplace
# Or install Pochi for autonomous agent capabilities
npm install -g @getpochi/cli
pochi --help

# Pochi: autonomously plan and implement a feature
pochi "Add input validation to all API routes in this project, write Zod schemas, update tests, and create a PR summary"
