Backpropagation: The key to training neural networks

Backpropagation is one of the most important supervised learning techniques for improving the accuracy of artificial neural networks. It compares desired output values with the actual output and feeds the detected error from the output layer back into the network. The subsequent optimization of the neural weights is the essence of machine learning. Here you can learn everything you need to know about this process.

What is backpropagation?

Backpropagation is a mathematically based learning mechanism for training multilayer neural networks. It goes back to the delta rule, which describes the comparison of an observed output with a desired output (Delta = ai(desired) - ai(observed)). In the sense of a gradient method, the mean squared error is usually calculated and, fed back into the network, used to optimize the weights and biases. In addition to input and output layers, backpropagation also involves hidden layers. The basic requirement is that the desired target values are known at all times. The method is an important part of machine learning and contributes decisively to the fine-tuning of AI models.
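The delta rule can be illustrated with a minimal sketch for a single linear neuron. The inputs, target, and learning rate below are made-up demonstration values; real networks apply the same idea layer by layer:

```python
# Minimal sketch of the delta rule for one linear neuron.
# Inputs, target and learning rate are illustrative values.
def delta_rule_step(w, x, target, lr=0.1):
    observed = sum(wi * xi for wi, xi in zip(w, x))   # actual output
    delta = target - observed                         # desired - observed
    # each weight moves in proportion to its input and the error
    w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
    return w, delta

w = [0.0, 0.0]
x = [1.0, 2.0]
for _ in range(50):
    w, delta = delta_rule_step(w, x, target=3.0)
# after repeated updates the observed output approaches the target 3.0
```

Repeating the update shrinks the error geometrically here; backpropagation extends this feedback idea to multiple layers.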

The basic principle of the approach was developed as early as the 1960s. At that time, however, it was still incomplete, inefficient, and hardly applicable in practice. A more modern variant appeared in the 1970s, but it too found little practical use and was forgotten for some time. In 1986, Rumelhart, Hinton, and Williams demonstrated its applicability to neural networks, which was a breakthrough in cognitive psychology. Backpropagation does not model the learning mechanism of biological neurons, but it leads to equally precise mathematical results. Biochemically, it is still not clear exactly how information about the target values gets back to the synaptic cleft of the previous neuron layer. However, it is considered certain that this feedback is necessary for learning, and it finds a technical analogy in backpropagation, through which the accuracy of artificial networks can be increased.

Placement in the training process

Backpropagation describes only one, albeit very important, of the processes necessary for training artificial neural networks. Without the totality of these processes, it is not possible to develop a reliable AI model. The procedure is predominantly mathematical in nature, but a verbal explanation comes first: in order to train a neural network, it must be structured appropriately for the planned procedure. Basically, one can imagine a group of interconnected input and output nodes (neurons), which together can be described as a nested, nonlinear function.

The goal is to weight the individual neurons in such a way that the network provides the most accurate results possible. This requires an activation function, a hypothesis function, and an error function. An optimization function then determines the final changes to the weights. Roughly, the network can be divided into an input layer, hidden layers, and an output layer. The training process typically proceeds in the following steps:

  • Initialization at the input layer
  • Forward Propagation
  • Backpropagation
  • Iteration

Calculating the output values is the task of forward propagation, which works more or less in the opposite direction to backpropagation. The outputs of individual neurons build on one another and form the input values for the following neurons. Finally, the output values of the network can be read at the output layer and used for the error calculation. With that, all requirements for backpropagation are met.
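A forward pass through a small network with one hidden layer can be sketched as follows; the sigmoid activation and the toy weights are assumptions for illustration, not taken from any particular library:

```python
import math

def sigmoid(z):
    # common activation function squashing values into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    # each hidden neuron combines the inputs and applies the activation
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # the hidden outputs become the inputs of the output neuron
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return output, hidden

x = [1.0, 0.5]                          # illustrative input values
w_hidden = [[0.2, -0.4], [0.7, 0.1]]    # weights of two hidden neurons
w_out = [0.5, -0.3]                     # weights of the output neuron
y, hidden = forward(x, w_hidden, w_out)
```

The output y can then be compared with a target value for the error calculation.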

How does backpropagation work?

Despite what the word suggests, backpropagation usually includes not only the backward propagation of errors but also their calculation at the output layer. Precisely defined target values, which are compared with the results of forward propagation, are decisive. The error function is typically the mean squared error, but cross entropy or the mean absolute percentage error can also be used.
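The three error functions mentioned can be written down directly; the elementwise forms below follow common conventions, which vary slightly between libraries:

```python
import math

def mse(targets, outputs):
    # mean squared error
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def cross_entropy(targets, outputs):
    # binary cross entropy; outputs must lie strictly in (0, 1)
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs)) / len(targets)

def mape(targets, outputs):
    # mean absolute percentage error; targets must be non-zero
    return 100.0 * sum(abs((t - o) / t) for t, o in zip(targets, outputs)) / len(targets)
```

For example, mse([1.0, 0.0], [0.8, 0.1]) yields 0.025.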

The resulting values correspond to the inaccuracy of the entire network, since the output was calculated on the basis of all the neurons it contains. The detected error must now be minimized, fed back into the network, and used by the optimization function to adjust the weights and threshold values (biases). In this way, the network already delivers more accurate results in the following iteration. The process can be repeated until the desired accuracy is achieved. The main steps of backpropagation are thus:

  • Error calculation
  • Error minimization
  • Weight adjustment
  • Modeling of the prediction accuracy
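The steps above can be combined into one complete training update for a deliberately tiny network: a single sigmoid neuron trained with the squared error and plain gradient descent. All values are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, x, target, lr=0.5):
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # forward pass
    error = (target - out) ** 2                          # error calculation
    # gradient of the error fed back through the sigmoid (chain rule)
    grad = -2.0 * (target - out) * out * (1.0 - out)
    w = [wi - lr * grad * xi for wi, xi in zip(w, x)]    # weight adjustment
    return w, error

w = [0.1, -0.2]            # illustrative starting weights
x = [1.0, 2.0]
errors = []
for _ in range(200):       # iteration until the error is small enough
    w, e = train_step(w, x, target=0.9)
    errors.append(e)
```

Over the iterations the recorded error shrinks toward zero, which is the repeated feedback the text describes.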


Two types of backpropagation

The details of the learning procedure may vary depending on the nature of the network and the tasks it is intended to perform. A typical categorization is:

1. Static backpropagation

This variant is used when the model provides a static output for a static input. A common area of application is AI-based Optical Character Recognition (OCR). When training such a network, the input would consist of optical (e.g. handwritten) characters, and the target values would be linked to the matching text-based characters. Through error feedback, the network learns and thus continuously increases the accuracy of text recognition.

2. Recurrent backpropagation

Here, the activations are transmitted through the network until they reach a fixed value. They are therefore not static from the start as in the previously described method. Another difference is that no direct assignment is possible during initialization at the input layer.

Practical application examples

ChatGPT

This AI model, based on the GPT architecture, should be familiar to everyone by now. It was developed to respond to input with answers that are as human-like as possible and was pre-trained on a large amount of text. For special tasks, ChatGPT can be fine-tuned, with backpropagation playing a crucial role. Following the procedure described above, the error function is minimized and used to optimize the weights of the neural network. Thus, the tool delivers increasingly accurate results.

Image Recognition

This is a subfield of Computer Vision, which is used not only to recognize but also to interpret image information for further decision-making. For this purpose, mainly classical neural networks are used, which can be trained with the help of backpropagation. The approach offers particular added value here, since a very large number of iterations can easily be performed, which is essential for fine-tuning accurate image interpretation.


This pre-trained language model is used to analyze complexly structured documents. It combines both text and layout information and is thus a very helpful tool for document understanding of invoices, forms, and receipts. Backpropagation is used to fine-tune the model for a specific document type. In this way, it can ultimately be used very specifically according to individual needs, which sums up the main goal of fine-tuning using backpropagation.

Tim Filzinger
