News

These new IBM agents will give developers new ways to solve problems — and clear their backlog

IBM SWE-Agent 1.0 from IBM Research is the first set of software engineering (SWE) agents of its kind, powered only by open LLMs, that can autonomously resolve GitHub issues efficiently.

For most software developers, every day starts where the last one left off. Trawling through the backlog of GitHub issues you didn’t get to the day before, you triage which ones you can fix quickly, which will take more time, and which you really don’t know what to do with yet. You might have 30 issues in your backlog and know you only have time to tackle 10. It can feel like a Sisyphean task, and it’s easy to burn out if the backlog isn’t managed properly.

But what if there were tools that could make finding bugs, suggesting fixes, and testing those ideas as easy as submitting an issue on GitHub?

At TechXchange today, IBM showed off a new set of AI agents designed specifically to make developers’ lives easier. The goal is for these agents to cut down the time developers spend hunting for answers to bugs in their backlogs, freeing up more time to work on new projects. These are the first software engineering (SWE) agents of their kind, powered only by open LLMs, that can autonomously resolve GitHub issues efficiently.

Figure: The IBM SWE-Agent 1.0 system architecture. Given a GitHub issue, the agent first “localizes” where the bugs are and then edits those lines of code to resolve them.

Localization is the first big task tackled with an agent. If you can’t find where a bug is in your code, you can’t fix it. Localization is the task of finding the files and lines of code in an organization’s codebase that are causing a given error.
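To make the idea of localization concrete, here is a deliberately naive sketch that ranks lines of code by how many words they share with the issue text. It is not how IBM's agent works (the article does not describe its method); the function names and the dictionary-of-files input are assumptions made purely for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercased words longer than three characters, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if len(t) > 3]

def localize(issue_text, codebase, top_k=3):
    """Rank (file, line_number) pairs by how many issue words each line shares.

    codebase: dict mapping a file path to its source text (hypothetical input shape).
    """
    issue_words = Counter(tokens(issue_text))
    scored = []
    for path, source in codebase.items():
        for number, line in enumerate(source.splitlines(), start=1):
            score = sum(issue_words[w] for w in set(tokens(line)))
            if score > 0:
                scored.append((score, path, number))
    # Highest score first; ties broken by path and line number for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(path, number) for _, path, number in scored[:top_k]]

# Toy example: two tiny "files" and a bug report that mentions check_password.
codebase = {
    "cart.py": "def total(items):\n    return sum(i.price for i in items)\n",
    "auth.py": "def login(user):\n    return check_password(user)\n",
}
issue = "Login fails: check_password raises for valid user"
candidates = localize(issue, codebase)
```

A lexical baseline like this breaks down quickly on real codebases, which is part of why an agent that can read, search, and reason over the code is attractive.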

When a quality assurance (QA) engineer spots an error, they file a bug report, which goes into the developer’s backlog and adds to the pile of bugs to sift through. Finding the offending line, and ensuring that altering it won’t affect anything else in the codebase, is a time-consuming process.

But with the SWE localization agent, a developer could open a bug report they’ve received on GitHub, tag it with “ibm-swe-agent-1.0” and the agent will quickly work in the background to find the troublesome code. Once it’s found the location, it’ll suggest a fix that the developer could implement to resolve the issue. That developer could then review the proposed fix and decide if it’s the best way to solve the problem, potentially even using other agents to figure it out.
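The tag-to-trigger flow described here can be sketched in a few lines. This is a minimal illustration, assuming the tag is applied as a GitHub issue label and that the issue arrives as a dictionary shaped like a GitHub REST API issue object; `run_agent` is a hypothetical callable standing in for whatever actually invokes the agent.

```python
# Tag named in the article; treating it as an issue label is an assumption.
TRIGGER_LABEL = "ibm-swe-agent-1.0"

def should_run_agent(issue):
    """Return True when the issue carries the trigger label.

    `issue` is a dict shaped like a GitHub REST API issue object,
    where "labels" is a list of objects with a "name" field.
    """
    names = {label.get("name", "") for label in issue.get("labels", [])}
    return TRIGGER_LABEL in names

def handle_issue(issue, run_agent):
    """Dispatch to `run_agent` (a hypothetical callable) only for tagged issues."""
    if not should_run_agent(issue):
        return None
    # The body can be null on GitHub, so fall back to an empty string.
    return run_agent(issue["title"], issue.get("body") or "")

tagged = {
    "title": "Crash on login",
    "body": None,
    "labels": [{"name": "bug"}, {"name": "ibm-swe-agent-1.0"}],
}
untagged = {"title": "Typo in docs", "body": "See README", "labels": [{"name": "docs"}]}
```

In this sketch the developer stays in control: untagged issues are never touched, and whatever the agent returns is a proposal to review rather than a change that is applied automatically.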

The IBM SWE Agent in action

The localization tool is just one of a few new AI agents IBM Research has developed that aim to take a chunk out of the workload developers have on their plate. There’s also one for editing lines of code based on developer requests, which relies on IBM's Granite LLM on watsonx, via PDL. Another agent can be used for developing and executing tests to ensure that code will run as intended. In each case, they can be invoked right where developers would want them to be, such as in GitHub.

On average, the SWE agents can localize and fix problems within five minutes, and in testing they achieved a 23.7% success rate on the SWE-bench tests, which measure how effectively AI agents can solve real-world problems found on GitHub. That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.

Figure: IBM SWE-Bench Lite results. IBM’s SWE-Agent 1.0 (dark blue, right), which uses only open-source LLMs, is on par with some of the top performers.
Figure: Localization performance. The IBM agent with open-source LLMs is almost tied in performance with an agent using frontier LLMs.

In each case, these agents observe, think, and act. They differ from standalone LLMs or foundation models in that they can call on different models and stores of information to answer questions in the most efficient way. They can also plan the steps required to carry out a task, something an LLM cannot do on its own. An agent can complete complex tasks without additional input from a user.
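The observe, think, act cycle can be illustrated with a generic loop. This is a sketch of the general agent pattern, not IBM's implementation; the `think` callable, the `tools` dictionary, and the `"finish"` convention are all assumptions chosen for the example, and in a real agent `think` would call an LLM.

```python
def run_agent(task, think, tools, max_steps=5):
    """Minimal observe-think-act loop.

    think: callable taking (task, history) and returning (tool_name, argument),
           or ("finish", answer) to stop.
    tools: dict mapping tool names to callables that take one argument.
    """
    history = []  # (tool_name, argument, observation) triples seen so far
    for _ in range(max_steps):
        tool_name, argument = think(task, history)  # think: choose the next action
        if tool_name == "finish":
            return argument, history
        observation = tools[tool_name](argument)  # act: run the chosen tool
        history.append((tool_name, argument, observation))  # observe: record the result
    return None, history  # step budget exhausted without an answer

def scripted_think(task, history):
    """Stand-in for an LLM: search once, then report what the search found."""
    if not history:
        return "search", "login"
    return "finish", f"bug is in {history[-1][2]}"

answer, steps = run_agent("fix login", scripted_think, {"search": lambda query: "auth.py"})
```

The step budget is the one design choice worth noting: because the loop runs without user input, it needs its own stopping condition in case the policy never decides to finish.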

It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist. Other SWE agent tools aim to help developers at work, but they rely primarily on massive, proprietary frontier models that cost a great deal at inference time. “Our goal was to build IBM SWE-Agent for enterprises who want a cost efficient SWE agent to run wherever their code resides, even behind your firewall, while still being performant,” Puri said.
