# SWE-agent | Artificialus

> For the complete content index, see [llms.txt](https://artificialus.com/llms.txt). Markdown versions of all pages are available by appending `.md` to any URL.

- Home
- /
- Agents
- /
- SWE-agent

SW

# SWE-agent

The original ACI-based autonomous coding agent — now superseded by mini-SWE-agent.

Princeton NLP / Stanford / open-source community

Open source

Since 2024

Visit Website

Repository

Docs

Download

Share

X

Facebook

Reddit

Telegram

Bluesky

Email

SWE-agent is an open-source autonomous AI software engineering agent from Princeton NLP and Stanford that introduced the Agent-Computer Interface (ACI) concept. It enables language models to autonomously fix GitHub issues, solve cybersecurity CTF challenges, and perform custom coding tasks through a structured command interface, all within isolated Docker sandboxes. Now superseded by mini-SWE-agent for most practical use cases.

##
−

Cons
- Research-oriented — less polished UX than commercial tools like Devin, Cursor, or Claude Code
- No GUI or IDE integration — CLI only, Docker required for sandboxed execution
- Requires manual setup of LLM API keys, Docker, and Python environment
- Superseded by mini-SWE-agent for most practical use cases — the official recommendation is to use the smaller, simpler successor

##

Pricing

### Free (OSS)

Free

Self-hosted open-source agent. MIT license. Requires own API keys.

## Introduction

SWE-agent is the research project that defined how autonomous AI agents interact with codebases. Published by Princeton NLP and Stanford researchers in April 2024 and accepted at NeurIPS 2024, it introduced the Agent-Computer Interface (ACI) : a purpose-built set of commands (open file, scroll, search, edit, run shell commands, run tests) that gives language models a structured way to navigate and modify a software repository.

>
Rather than dumping raw file contents into a context window, ACI gives the model a set of verbs to explore and modify code step by step, dramatically improving success rates on software engineering tasks.

SWE-agent was one of the original reference implementations for the SWE-bench
benchmark and achieved state-of-the-art results among open-source systems. The project has since spawned an ecosystem: mini-SWE-agent (the official successor), SWE-ReX (parallel cloud execution), SWE-smith (training trajectory generation), and EnIGMA (offensive cybersecurity). As of 2025, the project recommends mini-SWE-agent, which matches SWE-agent's performance in a dramatically simpler package. SWE-agent remains the canonical full-featured reference implementation — the foundation the ecosystem was built on.

##
Key Features

Agent-Computer Interface (ACI). A set of structured file and shell commands built for LLM interaction. Instead of raw file dumps, the model gets verbs like `open`, `scroll`, `edit`, `search`, and `submit` that mirror how a human developer works. Numerous subsequent autonomous coding agents have adopted and adapted the ACI design.

SWE-bench SOTA Performance. SWE-agent has consistently been the top-performing open-source system on SWE-bench. Version 1.0 (February 2025) achieved state-of-the-art results on SWE-bench verified, lite, and full using Claude 3.7 Sonnet. The SWE-agent-LM-32b model, trained via the SWE-smith project, holds open-weights SOTA on SWE-bench verified.

Docker Sandboxing. Every task runs in an isolated Docker container so each run stays safe and reproducible. The sandbox prevents the agent from affecting the host system and gives a clean, repeatable environment for every run.

Configurable LLM Backend. Supports any model via litellm, including Claude, GPT-4o, Gemini, DeepSeek, and local open-weight models. Configuration is governed by a single YAML file, with the ability to override any setting from the command line.

SWE-ReX
Parallel Cloud Execution. Massively parallel code execution using modal, AWS, or any other cloud provider. SWE-ReX decouples agent reasoning from code execution for running hundreds or thousands of SWE-agent tasks concurrently in the cloud.

Tool Bundles. Flexible, configurable tool definitions for composing custom sets of commands for different tasks. Bundles can be mixed, matched, and overridden without modifying the core agent code.

Interactive Agent Tools. Tools like `gdb` for interactive debugging sessions during agent runs, so the model can step through code execution and inspect program state.

Summarizer. Handles long model outputs by summarizing intermediate results before feeding them back into the context window, which prevents context overflow on lengthy debugging sessions.

Trajectory Inspector. A command-line tool for browsing, filtering, and analyzing hundreds of agent trajectories with ease. Essential for research: inspect every model action, command output, and decision point across multiple runs.

EnIGMA
Cybersecurity Mode. A mode for offensive cybersecurity capture-the-flag (CTF) challenges. Achieves a 3.3x improvement over previous agents on the NYU CTF benchmark dataset, with dedicated tools for exploit development and vulnerability analysis.

##
Ecosystem

The SWE-agent project now encompasses several projects that extend the original vision:

mini-SWE-agent
. The official successor project. Achieves 65% on SWE-bench verified in approximately 100 lines of Python. Matches SWE-agent's performance while being dramatically simpler. The SWE-agent README now directs users to mini-SWE-agent for most practical use cases.

SWE-smith
. A training trajectory generator that produces tens of thousands of high-quality trajectories for fine-tuning language models on software engineering tasks. The resulting SWE-agent-LM-32b model achieves open-weights SOTA on SWE-bench verified.

SWE-ReX
. A massively parallel code execution framework that runs SWE-agent tasks in the cloud. Supports modal, AWS, and custom execution backends for large-scale benchmarking and evaluation.

EnIGMA
(Enhanced Interactive Generative Model Agent). An offensive cybersecurity variant that achieves state-of-the-art results on multiple CTF benchmarks. Introduced interactive agent tools and the summarizer, which were subsequently merged into the main SWE-agent codebase.

SWE-bench
. While not part of SWE-agent itself, the SWE-bench benchmark is deeply intertwined with the project. SWE-agent served as the original reference implementation and continues to be a primary evaluation target.

##
Benchmark Performance

SWE-agent has consistently demonstrated state-of-the-art performance on the SWE-bench
family of benchmarks:
- SWE-bench Verified. SOTA among open-source agents (v1.0 with Claude 3.7 Sonnet).
- SWE-bench Full. SOTA on the complete set of 2,000+ GitHub issues (v1.0.1, February 2025).
- SWE-bench Lite. SOTA among open-source solutions on the lighter evaluation subset.
- SWE-agent-LM-32b. Open-weights SOTA on SWE-bench verified, trained using SWE-smith-generated trajectories.
- NYU CTF Benchmark. EnIGMA mode achieves a 3.3x improvement over previous state-of-the-art agents on offensive cybersecurity challenges.

All results are academically validated and published in peer-reviewed venues including NeurIPS 2024.

##
Who Is It For

SWE-agent is for researchers, academics, and developers who want full visibility into how an autonomous coding agent operates. It works well for studying agent-computer interaction, benchmarking new models against established baselines, and building custom agent systems on top of well-documented principles.

The project is less suited for day-to-day production use where polished UX, IDE integration, or minimal setup overhead is required. For those use cases, the official recommendation is mini-SWE-agent, or commercial alternatives like Devin, Cursor, or Claude Code.

Specific use cases include:
- Academic research into autonomous software engineering and agent-computer interaction.
- Benchmarking new language models on SWE-bench using an established, well-documented evaluation pipeline.
- Building custom agent systems by extending or modifying the ACI tool set.
- Offensive cybersecurity research using the EnIGMA
mode.
- Teaching and education: the trajectory inspector and full action logging make SWE-agent an excellent teaching tool for understanding how LLM-based agents work.

##
Further Reading
- Official Documentation
- SWE-agent Repository on GitHub
- SWE-agent Paper (NeurIPS 2024)
- mini-SWE-agent
- SWE-bench Benchmark
- SWE-ReX Parallel Execution Framework

##
Getting Started

SWE-agent requires Python 3.11+, Docker, and an LLM API key. Installation is via pip:

```
`pip install sweagent`
```

The recommended workflow for fixing a GitHub issue:

```
`sweagent run --agent.model.name=claude-sonnet-4-20250514 --env.repo.github_url=https://github.com/example/repo --problem_statement.github_url=https://github.com/example/repo/issues/42`
```

SWE-agent will clone the repository into a Docker container, analyze the issue, explore the codebase using its ACI tools, develop and test a fix, and output a patch file. A full trajectory log is saved for inspection.

For batch evaluation on SWE-bench:

```
`sweagent run-batch \
--agent.model.name claude-sonnet-4-20250514 \
--instances.type swe_bench \
--instances.subset verified`
```

For cloud-based parallel execution, SWE-ReX
can be configured to run tasks on modal or AWS, scaling from a single issue to thousands of evaluations.

Detailed documentation, configuration guides, and migration notes from earlier versions are available at the SWE-agent documentation
.

## Version History

v1.1.0
May 22, 2025
10s of thousands of training trajectories via SWE-smith, multilingual/multimodal SWE-bench support

v1.0.1
Feb 28, 2025
SOTA on SWE-bench Full with Claude 3.7 Sonnet

v1.0.0
Feb 13, 2025
SWE-agent 1.0: new CLI, SWE-ReX, tool bundles, trajectory inspector

Best for Researchers studying agent-computer interaction, academics benchmarking on SWE-bench, and developers building custom autonomous coding agents on top of proven ACI principles.

Capability Agent-Computer Interface (ACI), SWE-bench SOTA open-source, Docker sandboxing, configurable LLM backend via litellm, SWE-ReX parallel cloud execution, EnIGMA cybersecurity mode, SWE-smith training trajectories, mini-SWE-agent successor, trajectory inspector, interactive agent tools

Runs on CLI · Docker · macOS · Linux · Windows (WSL)

Signature Snippet

Copy

```
`sweagent run --agent.model.name=claude-sonnet-4-20250514 --env.repo.github_url=https://github.com/example/repo --problem_statement.github_url=https://github.com/example/repo/issues/42`
```

## More in this Space

SO

### Sourcery

Closed source

AI code review platform for the AI era. Automated code reviews, security scanning, and team analytics across GitHub, GitLab, VS Code, and JetBrains. Used by 300,000+ developers.

View profile

WT

### What The Diff

Closed source

AI-powered PR description generator and code review assistant. Automatically writes pull request descriptions, sends stakeholder notifications, creates changelogs, and provides inline code refactoring.

View profile

BA

### Blackbox AI

Closed source

Multi-agent AI coding platform with 12+ agents and 24+ models, featuring Chairman LLM for parallel multi-agent evaluation and end-to-end encrypted inference. Ships across six surfaces: CLI, IDE, Cloud, API, Mobile, and Builder.

View profile