AI Models - From Expert Systems to Neural Allrounders

In the discourse around artificial intelligence (AI) models, closely related terms such as machine learning and neural networks come up. Even though they are often used synonymously, there are fundamental differences that are rooted in the techniques used in each case.

The field is becoming increasingly differentiated. This is reflected in the classification of particularly popular AI models and their functionalities into the various subareas.

What are AI Models?

AI models are computer programs and algorithms that are capable of independent decision-making with the help of artificial intelligence. They thus model capabilities of the human mind and are intended to automate actions that depend on them. Central to this is the use of a base of experience to solve previously unknown problems, which corresponds to one of the most popular definitions of intelligence. This principle can be replicated by machines with the help of data analysis, and there are countless implementation possibilities as well as many other approaches to equipping programs and algorithms with intelligent behavior.

Artificial intelligence is merely an umbrella term for all these techniques and opens up a complex system of different subsets into which each AI model must be individually classified. The term most frequently encountered is Machine Learning, which as the largest subset overlaps with artificial intelligence to a large extent, but not completely. Within it lies Deep Learning, which is mainly based on neural networks as an implementation technique. Major application areas of AI models include Natural Language Processing (NLP), computer vision, and robotics, which in turn fall into different subfields of AI depending on the technique used.

Thus, two aspects in particular are decisive for the classification:

  • The technique/methodology
  • The field of application

The difference between artificial intelligence and machine learning

Simple AI models already existed in the 1960s, but they remained without any real practical use for a long time. The problem was that they had to be laboriously programmed with rules and thus prepared for each individual situation. This changed in the 1980s with the emergence of machine learning. The big difference: it enables independent learning from data and thus the recognition and application of principles of action.

This development has been greatly accelerated by the improvement of computing power and the emergence of ever larger amounts of usable data. Today, most models are based on machine learning, which explains the frequent synonymous use of the terms.

Nevertheless, it is still true: All Machine Learning Models are AI Models, but not all AI Models are Machine Learning Models.

Deep Learning - more potential through neural networks

Deep Learning is a very popular subfield of Machine Learning and the main reason why AI models are currently attracting so much attention. It extends the application of data-based probabilistic principles to complex neural networks inspired by the way the human brain works. These consist of multiple layers of artificial neurons, each of which mathematically transforms input values into an output.

The entire network thus forms output values based on all neurons and their weights optimized during training. Due to the depth of these networks and countless neuronal connections, significantly more complex relationships can be analyzed than with machine learning models without deep learning.
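To make this principle tangible, here is a minimal Python sketch of a single artificial neuron: a weighted sum of the input values plus a bias, passed through a non-linear activation function. The concrete numbers are purely illustrative; in a real network, these weights would be optimized during training.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid activation function."""
    z = np.dot(weights, inputs) + bias   # linear combination
    return 1.0 / (1.0 + np.exp(-z))      # non-linear activation (sigmoid)

# Arbitrary example values; during training these weights would be optimized.
x = np.array([0.5, -1.2, 3.0])   # input values
w = np.array([0.8,  0.1, -0.4])  # neuron weights
b = 0.2                          # bias

print(neuron(x, w, b))  # output of this single neuron, between 0 and 1
```

A deep network simply stacks many such neurons in successive layers, so that the output of one layer becomes the input of the next.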

It applies: All Deep Learning Models are Machine Learning Models, but not all Machine Learning Models are Deep Learning Models.

Simplified representation of a neural network

Simple models and their classification

Classifying AI models along this category system, which has only become differentiated over time, shows how the benefits and applicability of the technologies grow with each step.

AI Models without Machine Learning

Due to their high effort relative to limited added value, these models have all but died out, although they are still used in specific cases. The most common representatives are:

  • Expert systems: These algorithms use a large set of rules and principles defined by experts. Their concatenation by an inference engine ultimately leads to decisions that make no use of probability. This also keeps the likelihood of errors low, making the approach suitable where fatal errors must be avoided. Use cases: Medical diagnosis, IT troubleshooting, earthquake prediction.
  • Genetic Algorithms: As optimization techniques, they are not necessarily based on machine learning, but on principles of evolution such as selection, recombination and mutation. In this way, candidate solutions to optimization problems are systematically modified to approach an optimum; a minimal sketch follows after the figure below. Use cases: Route planning, vehicle design, portfolio optimization.
Genetic algorithms are another modeling approach inspired by nature.
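The principle can be illustrated in a few lines of Python. The following sketch is purely illustrative: the fitness function and all parameters are invented, and a real application would use a domain-specific objective such as route length or portfolio risk.

```python
import random

def fitness(x):
    """Toy objective: maximize f(x) = -(x - 3)^2, i.e. find x close to 3."""
    return -(x - 3.0) ** 2

def evolve(pop_size=20, generations=50, mutation_rate=0.3):
    # Start with a random population of candidate solutions.
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Recombination: children are averages of two random parents.
        children = [
            (random.choice(parents) + random.choice(parents)) / 2.0
            for _ in range(pop_size - len(parents))
        ]
        # Mutation: occasionally perturb a child slightly.
        children = [
            c + random.gauss(0, 0.5) if random.random() < mutation_rate else c
            for c in children
        ]
        population = parents + children
    return max(population, key=fitness)

print(evolve())  # converges towards the optimum at 3.0
```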

ML Models without Deep Learning

Simple but effective models that make use of classical statistical principles fall into this area. Not infrequently, they serve as a quick baseline and provide initial insights into a dataset before neural networks are used. A distinction is made between unsupervised learning (using unlabeled training data) and supervised learning (using labeled training data). The three most common representatives are illustrated by a short code sketch after the following list.

  • Naive Bayes: Here, the probabilistic Bayes theorem is used for classification problems. Based on features that are assumed to be independent of each other, such algorithms can calculate the most probable class membership of an object. The data required for machine learning consists of correct assignments and the corresponding probability distribution. Use cases: Spam filters, document classification, recommendation systems.
  • Decision tree: The hierarchical analysis of data in a tree structure makes it possible to make well-founded predictions. In this way, both classification and regression problems can be solved. Based on data properties, such algorithms use so-called decision nodes to apply the most appropriate rules. Use cases: Risk assessment in banking, development of marketing strategies, fraud detection.
  • Logistic regression: This classic AI model uses the logit function to examine possible relationships between independent variables and a binary dependent variable. It is thus well suited to calculating the probability of an event occurring. Again, the utility for classification is apparent, but compared to Naive Bayes it offers better interpretability due to the estimation of error probabilities. Use cases: Rainfall probability, social science studies, risk assessment.
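For illustration, the following sketch trains all three classical models on a tiny, invented spam dataset using scikit-learn. The texts and labels are made up; in practice, considerably more data and a proper evaluation would be required.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Tiny invented training set: 1 = spam, 0 = not spam.
texts = [
    "win money now", "cheap offer click here", "limited prize win",
    "meeting at noon", "project report attached", "lunch tomorrow?",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn the texts into word-count feature vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# A new, unseen text to classify.
new_text = vectorizer.transform(["win a cheap prize now"])

# Train the three classical models mentioned above on the same data.
for model in (MultinomialNB(), LogisticRegression(), DecisionTreeClassifier()):
    model.fit(X, labels)
    prediction = "spam" if model.predict(new_text)[0] == 1 else "not spam"
    print(type(model).__name__, "->", prediction)
```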

Timeline of significant neural networks

The most powerful models to date are almost without exception based on Deep Learning. Due to their high performance and versatile fields of application, especially for speech and image processing, they are now the first choice for many applications.

Multi-layer perceptron

As one of the earliest neural networks, the perceptron, introduced in the 1950s, consisted of only an input layer, one hidden layer and an output layer. Due to the lack of multiple layers and thus of corresponding depth in information processing, this was not yet referred to as Deep Learning.

This changed in the 1980s, when it was extended into a feedforward network linking neurons across multiple layers. With the multi-layer perceptron, complex model training using input data patterns to optimize the neuronal weights took place for the first time. A significant learning algorithm for this purpose is backpropagation. To this day, the perceptron remains a popular AI model for linearly separable pattern recognition, with implementations available in open source frameworks such as PyTorch and TensorFlow.

Use cases: Handwriting recognition, stock analysis, image analysis.
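As an illustration, the following sketch builds a small multi-layer perceptron in PyTorch and performs a single training step with backpropagation. The layer sizes and the random mini-batch are placeholders for real handwriting data.

```python
import torch
import torch.nn as nn

# A small multi-layer perceptron: input layer -> hidden layer -> output layer.
model = nn.Sequential(
    nn.Linear(784, 128),   # e.g. 28x28 pixel handwriting images, flattened
    nn.ReLU(),
    nn.Linear(128, 10),    # 10 output classes (digits 0-9)
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step on a random mini-batch (placeholder for real data).
x = torch.randn(32, 784)         # 32 example inputs
y = torch.randint(0, 10, (32,))  # 32 example labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                  # backpropagation: compute gradients
optimizer.step()                 # update the neuronal weights
print(loss.item())
```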

Convolutional Neural Network (CNN)

The greatest innovation compared to the multi-layer perceptron is the multidimensional arrangement of the neurons. The activities of the neurons are calculated by cross-correlation in the specially developed convolutional layer. Another special feature is that the neuronal weights are shared within this layer. In addition, a so-called pooling layer is used for data reduction. This process is modeled on lateral inhibition in the visual cortex of the brain and is intended to ensure that the most relevant information is taken into account and to prevent overfitting. This makes CNNs particularly well suited for accurate image recognition with a low error rate.

Use cases: Image recognition and classification, Optical Character Recognition (OCR).
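A minimal CNN with the layers described above could be sketched in PyTorch roughly as follows; the input size and number of classes are chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

# Minimal CNN: convolutional layer with shared weights, pooling layer, classifier.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer (cross-correlation)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling layer: reduces data, keeps salient information
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                  # classification into 10 classes
)

x = torch.randn(8, 1, 28, 28)  # batch of 8 grayscale images, 28x28 pixels
print(cnn(x).shape)            # -> torch.Size([8, 10])
```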

Recurrent Neural Network (RNN)

The RNN specializes in sequences and ordered time data. In doing so, it breaks with the independence assumption regarding input and output that was previously the standard. Instead, the network takes the sequential order of previous elements into account in its calculations. In this sense, it models a kind of memory, which makes it particularly suitable for language processing. A further development of this AI model is the Long Short-Term Memory (LSTM), which enables the capture of longer contexts. This set new performance standards in the 2000s, to which combinations with CNNs also contributed significantly.

Use cases: speech processing, handwriting recognition, translation.
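The memory-like behavior can be sketched with PyTorch's built-in LSTM module; the sequence lengths, feature sizes and the binary classification head are invented for illustration.

```python
import torch
import torch.nn as nn

# An LSTM reads a sequence step by step and carries a hidden state ("memory") along.
lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
classifier = nn.Linear(64, 2)  # e.g. binary sentiment of a sentence

x = torch.randn(4, 20, 50)     # 4 sequences, 20 time steps, 50 features each (e.g. word embeddings)
outputs, (h_n, c_n) = lstm(x)  # h_n: final hidden state summarizing each sequence

print(classifier(h_n[-1]).shape)  # -> torch.Size([4, 2])
```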

Transformer

In the field of automated language processing, special neural networks consisting of encoders, decoders and so-called attention modules are used today. This enables the most powerful and flexible analysis of extensive contexts to date. Popular applications are ChatGPT and Bard, powered by the models GPT-4 and PaLM 2 - more details in the report on NLP Models. A newer representative is Llama 2, which was trained exclusively on public data sets comprising about two trillion tokens. It allows a further increase in the context length that can be captured.
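This is of course not the architecture behind GPT-4, PaLM 2 or Llama 2, but the basic building block, a stack of encoder layers with self-attention, can be sketched with PyTorch's built-in modules; all dimensions are chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

# A single transformer encoder layer: self-attention plus feed-forward sublayer.
encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# A batch of 2 "sentences", each 10 tokens long, already embedded into 128 dimensions.
tokens = torch.randn(2, 10, 128)
contextualized = encoder(tokens)  # each token representation now attends to the whole context
print(contextualized.shape)       # -> torch.Size([2, 10, 128])
```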

Harnessing the power of neural networks for document management

Many of the models and use cases presented deal with the processing of linguistic and visual elements. A particularly exciting use case is therefore the analysis of documents, which often combine precisely such content in the form of layouts, text or handwriting. The Konfuzio document software relies on equally versatile combinations of Deep Learning technologies and neural networks. This enables, for example:

Document classification

Assigning a document to a category in the first place, for example as an invoice or delivery note, is a typical classification problem that can be solved by neural networks. For high accuracy, Konfuzio needs only a small amount of training data.

Text recognition

With techniques such as Natural Language Processing and Optical Character Recognition, automated recognition, analysis and extraction of text is a breeze for Konfuzio. This is also due to the learning ability of the Deep Learning Models behind it.

If you would like to learn more about the potential of AI for document management in your company, please feel free to contact us. Our experts look forward to the exchange.

Conclusion

Over time, artificial intelligence has differentiated into an increasingly complex system of various technologies. This can be seen just by looking at the countless AI models with individual functionalities. However, one major common denominator is the use of machine learning, which describes the automated extraction of knowledge from data. Such algorithms no longer rely on the complex programming of rule systems, but use statistical models such as logistic regression or Naive Bayes.

Currently, the greatest progress is being made in the field of deep learning, which is defined by the use of artificial neural networks. Their multidimensional architecture and the applicability of various optimizations, modifications, and intensive learning processes allow them to achieve unique performance for many tasks. This is evident, for example, in the processing of documents, which now requires hardly any human attention.


"
"
Tim Filzinger Avatar

Latest articles