Free (OSS)
Self-hosted open-source agent. MIT license. Requires own API keys.
The original ACI-based autonomous coding agent — now superseded by mini-SWE-agent.
SWE-agent is an open-source autonomous AI software engineering agent from Princeton NLP and Stanford that introduced the Agent-Computer Interface (ACI) concept. It enables language models to autonomously fix GitHub issues, solve cybersecurity CTF challenges, and perform custom coding tasks through a structured command interface, all within isolated Docker sandboxes. Now superseded by mini-SWE-agent for most practical use cases.
SWE-agent is the research project that defined how autonomous AI agents interact with codebases. Published by Princeton NLP and Stanford researchers in April 2024 and accepted at NeurIPS 2024, it introduced the Agent-Computer Interface (ACI): a purpose-built set of commands (open file, scroll, search, edit, run shell commands, run tests) that gives language models a structured way to navigate and modify a software repository.
Rather than dumping raw file contents into a context window, ACI gives the model a set of verbs to explore and modify code step by step, dramatically improving success rates on software engineering tasks.
SWE-agent was one of the original reference implementations for the SWE-bench benchmark and achieved state-of-the-art results among open-source systems. The project has since spawned an ecosystem: mini-SWE-agent (the official successor), SWE-ReX (parallel cloud execution), SWE-smith (training trajectory generation), and EnIGMA (offensive cybersecurity). As of 2025, the project recommends mini-SWE-agent, which matches SWE-agent's performance in a dramatically simpler package. SWE-agent remains the canonical full-featured reference implementation — the foundation the ecosystem was built on.
Agent-Computer Interface (ACI). A set of structured file and shell commands built for LLM interaction. Instead of raw file dumps, the model gets verbs like open, scroll, edit, search, and submit that mirror how a human developer works. Numerous subsequent autonomous coding agents have adopted and adapted the ACI design.
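The idea behind these verbs can be illustrated with a minimal sketch of a windowed file viewer. This is not SWE-agent's implementation: the class name FileViewer, its method names, and the five-line window are assumptions chosen only to show how a small command set lets a model page through a file instead of receiving it all at once.

```python
from dataclasses import dataclass


@dataclass
class FileViewer:
    """Hypothetical windowed file viewer; not SWE-agent's actual code."""

    lines: list[str]
    window: int = 5  # lines shown per view (assumed value)
    top: int = 0     # zero-based index of the first visible line

    def open(self, line: int = 1) -> str:
        """Show the window starting at the given 1-based line number."""
        last_start = max(0, len(self.lines) - self.window)
        self.top = min(max(line - 1, 0), last_start)
        return self.render()

    def scroll_down(self) -> str:
        """Move the window one full page towards the end of the file."""
        return self.open(self.top + 1 + self.window)

    def scroll_up(self) -> str:
        """Move the window one full page towards the start of the file."""
        return self.open(self.top + 1 - self.window)

    def search(self, term: str) -> list[int]:
        """Return the 1-based numbers of all lines containing term."""
        return [i + 1 for i, text in enumerate(self.lines) if term in text]

    def edit(self, start: int, end: int, replacement: list[str]) -> str:
        """Replace 1-based inclusive lines start..end, then show the window."""
        self.lines[start - 1:end] = replacement
        return self.open(self.top + 1)

    def render(self) -> str:
        """Format the visible lines with their line numbers."""
        visible = self.lines[self.top:self.top + self.window]
        return "\n".join(
            f"{self.top + i + 1}: {text}" for i, text in enumerate(visible)
        )


source = [f"line {n}" for n in range(1, 21)]
viewer = FileViewer(source)
first_view = viewer.open()          # lines 1 to 5
second_view = viewer.scroll_down()  # lines 6 to 10
hits = viewer.search("line 1")      # line 1 plus lines 10 to 19
edited_view = viewer.edit(6, 6, ["patched"])
```

Each call returns a short, numbered excerpt, so the model's context holds only the window it asked for plus the feedback from its last action.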
SWE-bench SOTA Performance. SWE-agent has consistently been the top-performing open-source system on SWE-bench. Version 1.0 (February 2025) achieved state-of-the-art results on SWE-bench Verified, Lite, and Full using Claude 3.7 Sonnet. The SWE-agent-LM-32b model, trained via the SWE-smith project, holds open-weights SOTA on SWE-bench Verified.
Docker Sandboxing. Every task runs in an isolated Docker container so each run stays safe and reproducible. The sandbox prevents the agent from affecting the host system and gives a clean, repeatable environment for every run.
Configurable LLM Backend. Supports any model via litellm, including Claude, GPT-4o, Gemini, DeepSeek, and local open-weight models. Configuration is governed by a single YAML file, with the ability to override any setting from the command line.
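The "YAML file plus command-line override" pattern can be sketched as follows. This is an illustration of the general technique, not SWE-agent's own config loader: the function apply_overrides is hypothetical, the base dictionary stands in for an already-parsed YAML file, and the temperature key is an assumed setting included only to show that untouched values survive.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with 'a.b.c=value' overrides applied.

    Values are kept as strings; a real loader would also validate
    and convert types.
    """
    merged = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            # Create intermediate sections that the file did not define.
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# Stand-in for the parsed YAML file (contents are assumptions).
base = {"agent": {"model": {"name": "gpt-4o", "temperature": 0.0}}}

merged = apply_overrides(base, ["agent.model.name=claude-sonnet-4-20250514"])
```

The dotted keys mirror the shape of options such as --agent.model.name, which is why a nested file and a flat command line can address the same setting.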
Tool Bundles. Flexible, configurable tool definitions for composing custom sets of commands for different tasks. Bundles can be mixed, matched, and overridden without modifying the core agent code.
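A rough sketch of how bundles might be mixed and overridden is shown below. The bundle names, tool names, and the compose_bundles function are all hypothetical; the real bundle format is defined by the SWE-agent documentation, not by this example.

```python
def compose_bundles(*bundles: dict[str, str]) -> dict[str, str]:
    """Merge bundles left to right; later bundles override earlier ones."""
    tools: dict[str, str] = {}
    for bundle in bundles:
        tools.update(bundle)
    return tools


# Hypothetical bundles mapping a tool name to a short description.
file_tools = {"open": "show a file window", "edit": "replace a line range"}
search_tools = {"search": "find text in the repository"}
custom_tools = {"edit": "replace a line range, then run a linter"}

toolset = compose_bundles(file_tools, search_tools, custom_tools)
```

Because the override happens at composition time, the task-specific edit tool replaces the default one without any change to the code that consumes the final tool set.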
Interactive Agent Tools. Tools like gdb for interactive debugging sessions during agent runs, so the model can step through code execution and inspect program state.
Summarizer. Handles long model outputs by summarizing intermediate results before feeding them back into the context window, which prevents context overflow on lengthy debugging sessions.
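The goal of keeping long outputs from flooding the context can be illustrated with a much simpler stand-in technique: head-and-tail truncation. This is not how SWE-agent's summarizer works internally; the function condense_output and its default limit are assumptions used only to show the shape of the problem.

```python
def condense_output(text: str, max_lines: int = 20) -> str:
    """Keep the first and last lines of a long output and mark the gap."""
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = max_lines // 2
    tail = max_lines - head
    omitted = len(lines) - max_lines
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(lines[:head] + [marker] + lines[-tail:])


long_log = "\n".join(f"row {i}" for i in range(100))
condensed = condense_output(long_log, max_lines=10)
```

Truncation is cheap but lossy in the middle of the output; a model-based summary costs an extra call in exchange for preserving details that a fixed cut would drop.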
Trajectory Inspector. A command-line tool for browsing, filtering, and analyzing hundreds of agent trajectories with ease. Essential for research: inspect every model action, command output, and decision point across multiple runs.
The SWE-agent project now encompasses several projects that extend the original vision:
SWE-agent has consistently demonstrated state-of-the-art performance on the SWE-bench family of benchmarks:
The original SWE-agent paper and its results were peer-reviewed and published at NeurIPS 2024.
SWE-agent is for researchers, academics, and developers who want full visibility into how an autonomous coding agent operates. It works well for studying agent-computer interaction, benchmarking new models against established baselines, and building custom agent systems on top of well-documented principles.
The project is less suited for day-to-day production use where polished UX, IDE integration, or minimal setup overhead is required. For those use cases, the official recommendation is mini-SWE-agent, or commercial alternatives like Devin, Cursor, or Claude Code.
Specific use cases include:
SWE-agent requires Python 3.11+, Docker, and an LLM API key. Installation is via pip:
pip install sweagent
The recommended workflow for fixing a GitHub issue:
sweagent run \
--agent.model.name=claude-sonnet-4-20250514 \
--env.repo.github_url=https://github.com/example/repo \
--problem_statement.github_url=https://github.com/example/repo/issues/42
SWE-agent will clone the repository into a Docker container, analyze the issue, explore the codebase using its ACI tools, develop and test a fix, and output a patch file. A full trajectory log is saved for inspection.
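When the same run needs to be launched for several issues, the command can be assembled from a script. The sketch below only builds the argument list from the three options shown in the command for fixing a GitHub issue; the helper build_run_command is hypothetical, and the example URLs are the placeholders from that command.

```python
import shlex


def build_run_command(model: str, repo_url: str, issue_url: str) -> list[str]:
    """Assemble the argument list for a single-issue sweagent run."""
    return [
        "sweagent",
        "run",
        f"--agent.model.name={model}",
        f"--env.repo.github_url={repo_url}",
        f"--problem_statement.github_url={issue_url}",
    ]


argv = build_run_command(
    model="claude-sonnet-4-20250514",
    repo_url="https://github.com/example/repo",
    issue_url="https://github.com/example/repo/issues/42",
)
command_line = shlex.join(argv)
print(command_line)

# To launch the run (requires sweagent, Docker, and an API key):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues, while shlex.join gives a copy-pasteable form for logs.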
For batch evaluation on SWE-bench:
sweagent run-batch \
--agent.model.name claude-sonnet-4-20250514 \
--instances.type swe_bench \
--instances.subset verified
For cloud-based parallel execution, SWE-ReX can be configured to run tasks on Modal or AWS, scaling from a single issue to thousands of evaluations.
Detailed documentation, configuration guides, and migration notes from earlier versions are available at the SWE-agent documentation.
Tens of thousands of training trajectories via SWE-smith; multilingual and multimodal SWE-bench support
SOTA on SWE-bench Full with Claude 3.7 Sonnet
SWE-agent 1.0: new CLI, SWE-ReX, tool bundles, trajectory inspector
AI code review platform for the AI era. Automated code reviews, security scanning, and team analytics across GitHub, GitLab, VS Code, and JetBrains. Used by 300,000+ developers.
AI-powered PR description generator and code review assistant. Automatically writes pull request descriptions, sends stakeholder notifications, creates changelogs, and provides inline code refactoring.
Multi-agent AI coding platform with 12+ agents and 24+ models, featuring Chairman LLM for parallel multi-agent evaluation and end-to-end encrypted inference. Ships across six surfaces: CLI, IDE, Cloud, API, Mobile, and Builder.