Meta AI Toolformer uses applications independently

Language models such as ChatGPT are currently the state of the art in natural language processing. As pre-trained AI models based on neural networks, they are particularly good at processing and generating text. Despite their versatility, however, the text bots quickly reach their limits when it comes to arithmetic and fact-checking. Toolformer is designed to address this problem by allowing the AI to independently use external tools via application programming interfaces (APIs), which is a promising approach.

This article is intended to make current research and resulting development opportunities easier to understand.


  • Toolformer is a ChatGPT-like language model from Meta.
  • It was trained through a self-supervised learning process to sample, execute, and filter API calls.
  • This enables the independent use of tools such as a calendar or calculator.
  • Thus Toolformer overcomes typical limitations of conventional models.

What are the limits of language models?

While the general public is still marveling at the new possibilities of technological progress, developers like to look directly at everything that doesn't work yet. In the case of language models, this contrast is particularly stark. Only recently, language models in the form of tools like ChatGPT have eclipsed most previous AI models, and thanks to their extensive pre-training they are currently the most helpful tools for automated language processing. Their versatility can easily look like omnipotence, but it is not. The following areas remain particularly problematic and motivate innovations such as Toolformer:

Correctness of content

The main source of the content that language models generate is the Internet. Data quality there varies greatly, but common models hardly reflect this. How current the information is also depends largely on when pre-training took place. In addition, there are algorithmic limitations that make fact-checking, for example according to the two-source principle, more difficult. Misinformation is therefore one of the most common hallucinations of AI.

Figure (GPT fact-checking): ChatGPT now points out that its knowledge may be out of date, but it still does not give the desired answer.


Typical language models are also still relatively bad at math, so even basic arithmetic and the corresponding logic can be difficult. One of the underlying problems is the translation of math into code, for which there are far fewer training examples than for language translation.

Figure (calculation example): A false result presented with conviction

What can Toolformer do?

To achieve better results in these areas, the model calls external tools via their APIs. Previous approaches either depended on a large number of human annotations or could only be used for very specific tasks. In contrast, the independent execution of API calls enables the intelligent integration of a whole range of applications that target the known weaknesses of language models:

  • Question-answering system: For this purpose, the language model "Atlas" is used, which has been fine-tuned on the Natural Questions dataset.
  • Wikipedia search: This search engine responds to search terms with a quick Wikipedia search and independently extracts small fragments of text. Generated results can be more specific than with the question-answer system. For example, finding the current chancellor of Germany should not be a problem.
  • Calendar: A quick glance at a calendar app helps to place facts in their correct temporal context and draw conclusions about their timeliness.
  • Calculator: With a simple tool for the four basic arithmetic operations, with results rounded to two decimal places, Toolformer can correctly solve the above calculation example.
  • Translation tool: To generate an English translation from any language, another external language model is integrated.
Figure (tools used): The range of usable tools could grow significantly in the coming years.
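
To make the calculator entry more concrete, here is a minimal sketch of a tool that supports only the four basic operations and rounds to two decimal places. The function name `calculator` and the parsing approach are assumptions for illustration; this is not Toolformer's actual implementation.

```python
import ast
import operator

# Hypothetical helper for illustration, not Toolformer's own calculator.
# Only the four basic arithmetic operations are allowed.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / on plain numbers and return the result as text,
    rounded to two decimal places."""

    def evaluate(node: ast.AST) -> float:
        # Plain numeric literal (booleans are rejected on purpose).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary operation with one of the four allowed operators.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        # Unary minus, e.g. "-3".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    tree = ast.parse(expression.strip(), mode="eval")
    return f"{evaluate(tree.body):.2f}"


print(calculator("400 / 1400"))  # prints 0.29
```

Returning the result as a short string matters here because the tool's answer has to be inserted back into the text that the language model reads.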

Overcoming limits via API

The aspect of this idea that has been particularly difficult to implement so far is AI-based API management, which can become complicated even for human developers. It is not much easier to format API calls as text sequences that fit naturally into the text processed by the underlying GPT-J model. The model's complexity, with more than 6 billion parameters, further complicates the task. Toolformer overcomes these hurdles, and thus typical limitations of most conventional language models, in three basic steps:

  1. Sample API Calls

    For each possible API, the developers wrote a prompt with a few example calls that lets Toolformer annotate text with relevant API calls. The model then estimates, for each position in the text, the probability that an API call starts there and keeps the positions whose probability is above a certain threshold. A few examples are enough for Toolformer to learn this process.

  2. Execute API Calls

    Subsequently, all sampled API calls are executed to obtain the corresponding results or responses from the tools. These must be available in short sequences of text for the further procedure. The details of the process depend on the individual API, which can address a neural network or a Python script, for example.

  3. Filter API Calls

    To filter the calls, Toolformer calculates a weighted cross-entropy loss over the tokens that follow each generated API call. By comparing the loss with the API call and its result against the loss without them, it identifies the calls that actually help to predict further tokens. Again, an appropriate threshold is set to decide which calls are retained.
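
The two threshold decisions in steps 1 and 3 can be sketched as follows. This is a simplified illustration that works on ready-made probabilities instead of a real model. All function and parameter names are hypothetical, and the exact baseline used in the filter (the better of "no call" and "call without its result") is an assumption about the details.

```python
import math


def sample_positions(start_probs: list[float], threshold: float, top_k: int) -> list[int]:
    """Step 1: keep at most top_k positions whose probability of starting
    an API call is above the sampling threshold."""
    candidates = [(p, i) for i, p in enumerate(start_probs) if p > threshold]
    candidates.sort(reverse=True)  # most likely positions first
    return sorted(i for _, i in candidates[:top_k])


def weighted_loss(token_probs: list[float], weights: list[float]) -> float:
    """Weighted cross-entropy over the tokens that follow the call position:
    the negative sum of weight * log(probability of the true token)."""
    return -sum(w * math.log(p) for w, p in zip(weights, token_probs))


def keep_call(
    probs_with_result: list[float],     # model sees the call and its result
    probs_with_call_only: list[float],  # model sees the call, but no result
    probs_without_call: list[float],    # model sees no call at all
    weights: list[float],
    threshold: float,
) -> bool:
    """Step 3: retain a call only if it lowers the loss by at least threshold."""
    loss_with = weighted_loss(probs_with_result, weights)
    loss_baseline = min(
        weighted_loss(probs_without_call, weights),
        weighted_loss(probs_with_call_only, weights),
    )
    return loss_baseline - loss_with >= threshold
```

In a real setup, the probabilities would come from the language model itself: once for the start-of-call token at every position, and once for the true continuation tokens under each of the three contexts.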

Fine-tuning of the model

The filtered API calls are merged with the original input into a new dataset. This lays the foundation for subsequent fine-tuning, which uses a standard language modeling objective. In terms of content, the new dataset thus matches the original dataset, apart from the added API calls.
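
A minimal sketch of this merging step is shown below. The textual format `[Tool(input) -> result]` and the helper name `insert_api_calls` are assumptions for illustration; the source does not specify the exact markers.

```python
def insert_api_calls(text: str, calls: list[tuple[int, str, str, str]]) -> str:
    """Insert "[Tool(input) -> result] " markers into text.

    Each call is (character offset, tool name, tool input, tool result).
    The marker format is a hypothetical choice for this sketch.
    """
    # Work from right to left so that earlier offsets stay valid.
    for position, tool, tool_input, result in sorted(calls, reverse=True):
        marker = f"[{tool}({tool_input}) -> {result}] "
        text = text[:position] + marker + text[position:]
    return text


original = "Out of 1400 participants, 400 (or 29%) passed the test."
augmented = insert_api_calls(
    original,
    [(original.index("29%"), "Calculator", "400 / 1400", "0.29")],
)
print(augmented)
# prints Out of 1400 participants, 400 (or [Calculator(400 / 1400) -> 0.29] 29%) passed the test.
```

Removing the marker again yields the original sentence, which is why the augmented dataset covers the same content as the original one.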

Ideally, the user is not aware of this process in the backend and receives a more or less natural response from Toolformer. The more inputs the language model processes over time, the more precisely it can use the extended dataset to predict future API tokens and the correct positions of the corresponding calls. Put simply, Toolformer learns independently when to use which application and how best to provide the optimal response.
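
How a backend might hand a call over to a tool while the model is generating text can be sketched as follows. This is an assumption about one possible setup, not a description of Meta's code: the marker format `[Tool(input) ->`, the function name `complete_pending_call`, and the tool registry are all hypothetical.

```python
import re
from typing import Callable

# Matches an unfinished call at the very end of the generated text,
# e.g. "... [Calculator(400 / 1400) ->" (hypothetical marker format).
PENDING_CALL = re.compile(r"\[(\w+)\(([^\[\]]*)\) ->$")


def complete_pending_call(generated: str, tools: dict[str, Callable[[str], str]]) -> str:
    """If the text ends with an unfinished API call, run the tool and
    append its result so the model can continue generating from there."""
    match = PENDING_CALL.search(generated)
    if match is None or match.group(1) not in tools:
        return generated  # nothing to do, or unknown tool
    result = tools[match.group(1)](match.group(2))
    return f"{generated} {result}]"
```

For example, with a registry such as `{"Echo": str.upper}`, the text `"Result: [Echo(abc) ->"` becomes `"Result: [Echo(abc) -> ABC]"`, while text without a pending call is returned unchanged.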

Conclusion and prospects

By independently driving external applications via API calls, Toolformer can overcome typical limitations in arithmetic and content correctness. In concrete terms, tools such as a search engine, a calculator or a calendar help here. Their answers are integrated naturally into the conversation text. To make this possible, Toolformer uses a self-supervised learning procedure consisting of sampling, executing and filtering API calls. The resulting datasets form the basis for fine-tuning the model.

Toolformer thus provides an experimental outlook on what a versatile language model like ChatGPT could do in the future. Previous attempts have tended to focus on implementing tools directly in a chatbot, but according to Meta, this has not yet led to the desired results. External use via API is therefore a promising alternative. Here, too, there is still work ahead for the developers. For example, Toolformer cannot yet chain tool calls or use applications interactively. Whether the existing capabilities are already enough for a fundamental breakthrough therefore remains to be seen.

Tim Filzinger