Utilizing APIs by Autonomous AI with LLMs

Utilizing APIs and Planning Autonomous AI powered by Large Language Models

Overview

Interest has grown recently in using LLMs for tasks that require reasoning and external tools. A classic example is executing a plan that sequences API calls to reach a goal stated in natural language. One advantage of LLMs is that they work directly with natural language, with no need to formalize the input, which opens up possibilities for diverse users and use cases. Although prompting techniques such as Chain of Thought, ReAct, and ReWOO have brought significant progress, LLMs still often struggle with classic planning tasks.

There is a big gap between the current stage, in which the LLM is given a very detailed set of instructions and told how to use the different APIs, and the “holy grail” stage, in which we would give it only a description of the end goal and expect it to produce the whole plan by itself. This gap highlights the need for additional research and approaches to enhance LLMs’ planning capabilities. We are working on two activities that advance us in that direction:

Planning-Aware Techniques for LLMs (collaborating labs: Israel, Yorktown Heights)

In a recent paper, we demonstrated through experimentation that LLMs lack skills that planning requires. For example, LLMs struggle with long action-observation trajectories, where their limited compositional skills hinder their ability to discern relevant information. Their inability to deal effectively with distractors compounds the problem, making them less adept at intricate tasks that involve nuanced decision-making. One way to mitigate these weaknesses is a modular approach, i.e., standalone components that function independently and interact with the LLM sequencer before, during, or after its operation.

In our paper, we advocate a hybrid approach that combines LLMs with classical planning methodology, and we introduce a novel hybrid method, which we call SimPlan, that outperforms existing LLM-based planners. Another promising component is API Search, a shortlisting module that narrows the search space of the sequencer so that it can focus on a smaller set of more relevant candidates. We recently obtained promising results using a trained retriever model for the Previous Best API problem: given a sequence of APIs and a large catalogue, the objective is to find in the catalogue all the Previous Best APIs (PBA) suited to each location in the sequence.
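
The shortlisting idea can be illustrated with a minimal sketch. It is not the trained retriever mentioned above: it assumes a plain bag-of-words cosine similarity as a stand-in scorer, and the catalogue entries, the function names (tokenize, cosine, shortlist), and the parameter k are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(goal, catalogue, k=2):
    """Return the names of the k catalogue entries most similar to the goal."""
    goal_vec = Counter(tokenize(goal))
    scored = [
        (cosine(goal_vec, Counter(tokenize(description))), name)
        for name, description in catalogue.items()
    ]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical catalogue: API name -> natural-language description.
CATALOGUE = {
    "get_weather": "get the current weather forecast for a city",
    "book_flight": "book a flight ticket between two cities",
    "send_email": "send an email message to a recipient",
    "convert_currency": "convert an amount from one currency to another",
}

candidates = shortlist("book a flight to Paris", CATALOGUE, k=2)
print(candidates)
```

In this sketch the sequencer would receive only the k shortlisted candidates instead of the full catalogue. A word-overlap scorer like this one is easily misled by common words, which is one reason a trained retriever is a more plausible choice for a real catalogue.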

OpenAPI Spec Builder (collaborating labs: Israel, Yorktown Heights, India)

At the heart of working with external tools lies the need for OpenAPI specifications. These specs describe the structure and functionality of APIs in a standard way and are typically written in JSON or YAML. To execute a plan, the LLM first and foremost requires the API specifications of the tools participating in the plan. However, these specs are not always available. Often, the Web service publishes only natural-language documentation, which a human developer must read in order to write the spec. This manual task is labor-intensive and can take days to complete. To address that issue, we recently released the OpenAPI Spec Builder (OAS Builder) as a beta feature in IBM watsonx Orchestrate. OAS Builder uses the IBM Granite model to automate the generation and enrichment of OpenAPI specifications from documentation. This functionality was recently estimated to yield productivity gains of nearly 90%, an important step towards full automation of the planning process.
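
For readers unfamiliar with the format, the sketch below builds a minimal OpenAPI 3.0 document as a Python dictionary and lists the operations a planner could call. The weather API, its path, and the helper list_operations are hypothetical and are not output of OAS Builder; the top-level fields (openapi, info, paths) follow the OpenAPI 3.0 specification.

```python
import json

# A minimal, hypothetical OpenAPI 3.0 document for an imaginary weather service.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Weather API", "version": "1.0.0"},
    "paths": {
        "/weather": {
            "get": {
                "operationId": "getWeather",
                "summary": "Get the current weather for a city",
                "parameters": [
                    {
                        "name": "city",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {"description": "Current weather conditions"}
                },
            }
        }
    },
}

# HTTP methods that may appear as keys of an OpenAPI 3.0 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(document):
    """Return (operationId, METHOD, path) for every operation in the document."""
    operations = []
    for path, path_item in document["paths"].items():
        for method, operation in path_item.items():
            if method in HTTP_METHODS:
                operations.append(
                    (operation.get("operationId"), method.upper(), path)
                )
    return operations


operations = list_operations(spec)
print(json.dumps(spec, indent=2))  # the same document serialized as JSON
print(operations)
```

Each operation's summary and parameter descriptions are the natural-language hooks a sequencer relies on when deciding which call to make, which is why enriching them matters as much as generating the structure.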