Free (Builders)
Use freely to write code for any project, including commercial ones. Your output is yours.
Orchestrate an entire AI dev team on 5GB VRAM
Orchestrate an entire AI dev team on 5GB VRAM using ephemeral subagents, exact-match diffs, and a zero-dependency Go binary. Works with any OpenAI-compatible model — local or cloud.
A separate commercial agreement is required if wrapping Late's orchestration engine into a paid service or deploying it as enterprise infrastructure.
Most AI coding agents share a fatal design flaw: they do everything in one context window. Planning, implementation, retries, self-healing, error recovery — it all piles into the same conversation history until the model can't tell signal from noise. You blame the model. The model is fine.
Late takes a different approach. It is a zero-dependency, single-static-binary AI coding agent written in Go that splits the workflow into two layers: a lean orchestrator that handles planning and coordination, and ephemeral subagents that execute individual tasks in isolated contexts. Each subagent gets one job, a fresh context, and is destroyed when done. The orchestrator only ever sees plans and outcomes, never the implementation mess.
The result is that the same model feels sharper in Late because it reasons from a clean signal. Late manages its KV cache ruthlessly, keeps its system prompt around 1,000 tokens, and runs comfortably on 5GB VRAM for local inference. No Node.js, no Python, no runtime dependencies — just a single Go binary you drop in your PATH.
Late's architecture rests on one insight: separation of concerns between planning and execution. When you run `late` in a project directory, the orchestrator reads your codebase, forms a plan, and spawns subagents one at a time. Each subagent receives a self-contained task with all the context it needs, and nothing more.
The orchestrator's system prompt is deliberately lean at roughly 1,000 tokens. It does not accumulate edit history, retry logs, or implementation details. Its context grows only from your instructions and its own planning decisions. Everything the subagent did to produce a result — the failed attempts, the search/replace iterations, the intermediate reasoning — is destroyed with the subagent.
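The split described above can be illustrated with a small sketch. This is not Late's source code: the `Orchestrator`, `Outcome`, and `runSubagent` names, the retry limit of 3, and the injected `attempt` function are all hypothetical, chosen only to show a subagent's working history staying private while the orchestrator records plans and outcomes.

```go
package main

import "fmt"

// Outcome is the only thing a subagent hands back to the orchestrator.
type Outcome struct {
	Task    string
	Success bool
	Summary string
}

// Orchestrator keeps a lean log of plans and outcomes, nothing else.
// (Hypothetical type; Late's real internals are not shown in the source.)
type Orchestrator struct {
	Log []string
}

// runSubagent simulates one ephemeral subagent. Its scratch history lives
// only inside this call and is dropped when the function returns.
func runSubagent(task string, attempt func(try int) bool, maxTries int) Outcome {
	var scratch []string // private working context, never returned
	for try := 1; try <= maxTries; try++ {
		ok := attempt(try)
		scratch = append(scratch, fmt.Sprintf("try %d ok=%v", try, ok))
		if ok {
			return Outcome{Task: task, Success: true,
				Summary: fmt.Sprintf("done after %d attempt(s)", try)}
		}
	}
	return Outcome{Task: task, Success: false,
		Summary: fmt.Sprintf("failed after %d attempt(s)", len(scratch))}
}

// Run executes tasks one at a time and records only plan and outcome lines.
func (o *Orchestrator) Run(tasks []string, attempt func(task string, try int) bool) []Outcome {
	var results []Outcome
	for _, t := range tasks {
		task := t
		o.Log = append(o.Log, "plan: "+task)
		out := runSubagent(task, func(try int) bool { return attempt(task, try) }, 3)
		o.Log = append(o.Log, "outcome: "+out.Summary)
		results = append(results, out)
	}
	return results
}

func main() {
	o := &Orchestrator{}
	// Hypothetical worker: every task succeeds on its second attempt.
	results := o.Run([]string{"add handler", "write test"},
		func(task string, try int) bool { return try == 2 })
	for _, r := range results {
		fmt.Println(r.Task, r.Success, r.Summary)
	}
	fmt.Println(len(o.Log), "orchestrator log lines")
}
```

However many attempts a subagent needs, the orchestrator's log grows by two lines per task in this sketch, which is the property the paragraph above describes.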
Subagents communicate back to the orchestrator through a strict protocol: exact-match diffs using search/replace blocks with autonomous self-healing. If a diff cannot be applied cleanly, the failure is reported back rather than silently patched. The orchestrator can then retry with a fresh subagent, armed with better context from the failure report.
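A minimal sketch of what "exact-match, fail loud" can mean in practice follows. It assumes an edit is accepted only when the search text occurs exactly once in the file. The `applyExact` function and both error values are hypothetical names, and Late's actual matching rules may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical error values used to report why an edit was rejected.
var (
	ErrNoMatch   = errors.New("search block not found")
	ErrAmbiguous = errors.New("search block matches more than once")
)

// applyExact replaces search with replace only when search occurs exactly
// once in content. On any mismatch it returns the content unchanged plus an
// error, so the caller can report the failure instead of guessing.
func applyExact(content, search, replace string) (string, error) {
	if search == "" {
		return content, ErrNoMatch
	}
	switch n := strings.Count(content, search); {
	case n == 0:
		return content, ErrNoMatch
	case n > 1:
		return content, ErrAmbiguous
	}
	return strings.Replace(content, search, replace, 1), nil
}

func main() {
	src := "func add(a, b int) int {\n\treturn a - b\n}\n"

	// The search text occurs once, so the edit applies.
	fixed, err := applyExact(src, "return a - b", "return a + b")
	fmt.Print(fixed)
	fmt.Println("err:", err)

	// The search text is absent, so the edit is rejected and src is untouched.
	_, err = applyExact(src, "return a * b", "return a + b")
	fmt.Println("err:", err)
}
```

Because there is no fuzzy fallback, a stale search block produces an error the orchestrator can act on, for example by handing the failure report to a fresh subagent.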
This architecture enables hybrid model routing. You can architect the plan with a large reasoning model like DeepSeek V4, then spawn subagents that execute using cheaper, faster local models like Gemma 4. The orchestrator handles the expensive reasoning; subagents handle the grunt work.
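One way to picture hybrid routing is a per-role model lookup. The sketch below assumes the orchestrator uses `OPENAI_MODEL` and that subagents use `LATE_SUBAGENT_MODEL` when it is set, falling back to `OPENAI_MODEL` otherwise. Both variable names appear in this page, but the fallback order, the `Role` type, and the `modelFor` function are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// Role distinguishes the planning layer from the execution layer.
type Role int

const (
	RoleOrchestrator Role = iota
	RoleSubagent
)

// modelFor picks a model name for a role. getenv is injected so the lookup
// can be exercised without touching the real environment.
// Assumption: subagents fall back to OPENAI_MODEL when no override is set.
func modelFor(role Role, getenv func(string) string) string {
	if role == RoleSubagent {
		if m := getenv("LATE_SUBAGENT_MODEL"); m != "" {
			return m
		}
	}
	return getenv("OPENAI_MODEL")
}

func main() {
	fmt.Println("orchestrator:", modelFor(RoleOrchestrator, os.Getenv))
	fmt.Println("subagent:", modelFor(RoleSubagent, os.Getenv))
}
```

With `OPENAI_MODEL` pointing at a large reasoning model and `LATE_SUBAGENT_MODEL` at a smaller local one, planning and execution would be served by different models under this scheme.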
LATE_SUBAGENT_MODEL or config.json and move on.
search/replace blocks with autonomous self-healing on mismatch. Edits fail loud: no silent corruption, no fuzzy patching, no surprises.
[y/N]. Session, project, and global trust scopes with TTL decay so you are not spammed every time.
PATH, run.
OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL. You are running.
Local-first development is Late's sweet spot. Got a machine with 5GB+ VRAM and llama.cpp running? Late works out of the box with zero configuration. No cloud API keys, no OAuth flows, no subscription. It suits developers who want AI assistance without shipping code to third-party servers.
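For readers unfamiliar with what "OpenAI-compatible" implies, here is a hedged sketch of building a chat completion request from the three `OPENAI_*` variables named above. It assumes the base URL already ends in a version prefix such as `/v1`, which is the common convention, and that the endpoint is `chat/completions` under it. The `newChatRequest` function is hypothetical, and how Late itself assembles the path is not stated in the source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newChatRequest builds, but does not send, a chat completion request.
// Assumption: baseURL includes the version prefix (for example ".../v1").
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Local servers often need no key, so the header is set only when one exists.
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newChatRequest(
		os.Getenv("OPENAI_BASE_URL"),
		os.Getenv("OPENAI_API_KEY"),
		os.Getenv("OPENAI_MODEL"),
		"Summarize main.go",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Pointing `OPENAI_BASE_URL` at a local llama.cpp server or at a cloud provider changes only the destination. The request shape stays the same, which is what lets one agent work with either.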
Hybrid cloud/local workflows let you use a powerful cloud model like DeepSeek V4 or Claude for architecture planning while executing edits with a local model. You get the reasoning quality of a large model with the latency and privacy of local execution for the implementation work.
Resource-constrained environments benefit from Late's tiny footprint. The binary is a few megabytes with no runtime dependencies, and the orchestrator itself consumes negligible context. This makes it viable on low-spec machines, CI runners, and edge devices where a full Node.js-based agent would be impractical.
Parallel development across branches is supported through Git worktree integration. Run multiple Late instances on different branches simultaneously without context pollution.
Late is free for builders. Use it to write code for any project, including commercial ones. The output you produce with Late is yours.
The license includes a commercial restriction: you may not monetize Late itself by wrapping the orchestration engine into a paid service or deploying it as enterprise infrastructure without a separate commercial agreement.
The license converts to GPLv2 on February 21, 2030. Full terms are in the LICENSE file in the repository.
Upgraded TUI theme, double-click viewport copy, keyboard help overlay (Ctrl+H), terminal background color leak fix
MCP tool namespace fix, raised MCP truncation limit to 32,768 chars, UTF-8 slicing fix
Sqz tool integration, force-enable image support, universal installation script, API path prefix fix
Queued message injection, chat keybinding fixes, YAML frontmatter 1MB limit for skill parsing
AST-based bash command analyzer, scoped lingering approval for tool calls
Windows support, subagent model selection for hybrid routing
Initial release
A user types `late` in a project directory. The orchestrator reads the codebase, builds a plan, spawns an ephemeral subagent to implement each file change, then collects the results, all without polluting the main context window.
Vix is a Go-native, open-source (AGPL-3.0) AI coding agent that slashes token costs by 40-50% using a stem agent architecture and a Tree-sitter virtual filesystem. It rethinks the plan/execute loop, keeping the LLM cache warm across Explore, Plan, and Execute phases, while shipping Programmable Workflows, Whiteboard Mode with voice AI, MCP server support, and a self-evolving agent that writes its own scheduled jobs and watchers.
Paca is a free, open-source, self-hosted Scrum board where AI agents work as equal teammates — assigned to sprints, picking up tasks, and collaborating on BDD specs alongside humans. Built as an alternative to Jira and Linear, it treats AI agents as first-class Scrum members.
Nanobot is an ultra-lightweight, open-source (MIT) personal AI agent that ships with WebUI, multi-channel chat (Telegram, Discord, WeChat, Slack, Feishu, email), MCP support, memory, model routing with fallbacks, cron automation, and a plugin skill system — all pip-installable in seconds. Built on a deliberately small and readable Python core, it lets you truly own your AI agent stack.