
NLP Models - Rapid development of artificial language geniuses

Tim Filzinger

The fact that artificial intelligence has been shaking up both IT and the media for some years now is mainly due to developments in one particular subarea of the technology: Natural Language Processing (NLP) is revolutionizing the way humans and machines communicate. AI-based generation of coherent texts is just one of the unimagined possibilities that even experts are just beginning to grasp. A timeline of the most significant NLP models provides information about the past, present and future of automated language processing.

What are NLP Models?

NLP stands for Natural Language Processing and describes AI techniques for processing natural language. Common applications include the analysis, interpretation, summarization, translation, and generation of text. Algorithmic processing of spoken input is also becoming an increasingly important discipline within this subfield of artificial intelligence. Since human speech is riddled with irregularities, ambiguities, and humorous and emotional influences, NLP requires complex AI models - so-called NLP Models - which nevertheless still run up against their limits quickly.

NLP models form the core of the corresponding AI systems and applications. They are usually based on artificial neural networks: groups of connected input and output nodes (neurons) that can be described mathematically as nested functions. By passing input signals through the network, modern NLP models can be pre-trained on large amounts of text. The subsequent readjustment of the neurons for specific tasks is called fine-tuning. Here, smaller datasets are used that reflect the subtleties of the new task type, and a whole range of such tasks can now be solved this way.
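The basic idea of signals flowing through weighted neurons can be illustrated with a minimal two-layer forward pass in plain Python. The weights here are made up for illustration; in practice they are learned during pre-training and then only slightly adjusted during fine-tuning:

```python
import math

def forward(x, w1, w2):
    """Forward pass of a tiny two-layer network: input -> hidden -> output."""
    # Hidden layer: weighted sums passed through a tanh non-linearity.
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    # Output layer: plain weighted sum of the hidden activations.
    return sum(wo * h for wo, h in zip(w2, hidden))

# Hypothetical weights (2 hidden neurons, 2 inputs); real models learn these.
w1 = [[0.5, -0.2], [0.1, 0.8]]  # hidden layer weights
w2 = [0.7, -0.3]                # output layer weights

y = forward([1.0, 2.0], w1, w2)
```

Training adjusts `w1` and `w2` so that outputs like `y` match the training targets; fine-tuning repeats this on a smaller, task-specific dataset.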

Simplified representation of a neural network

How does Natural Language Processing work?

As diverse as human language is, so are the approaches to making it understandable for algorithms. Here are a few particularly relevant strategies:

  1. Sentence segmentation

    By splitting sentences into smaller segments, it is easier to categorize parts of sentences. This makes them mathematically representable and algorithms can better capture the respective context.

  2. Syntax analysis

    A look at the syntactic functions and relationships of the words (e.g. subject, object, predicate) allows conclusions about their meaning. The basis is formed by correctly assigned relations in the training data. A common NLP model in this area is Word2vec.

  3. Semantic analysis

    Similarly, entities and semantic relations such as synonymy or antonymy can give clues to the exact word meaning.

  4. Sentiment analysis

    Categorizations such as "negative", "positive" or "neutral" enable meaningful decisions or actions to be taken. This is useful, for example, when analyzing customer feedback.

Concrete techniques used include vectorization, parsing, classifiers, and Word Sense Disambiguation. Often, a single approach alone does not allow full understanding of a text. In many cases the approaches are therefore combined - either by several models or by one particularly versatile NLP model.
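Two of the strategies above, sentence segmentation and sentiment analysis, can be sketched in a few lines of Python. This is a deliberately naive illustration: the splitting rule and the sentiment lexicon are made up, and production systems learn such decisions from data:

```python
import re

def segment_sentences(text):
    """Naive sentence segmentation: split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

# Toy sentiment lexicon; real systems learn these weights from training data.
LEXICON = {"great": 1, "good": 1, "love": 1, "bad": -1, "poor": -1, "slow": -1}

def sentiment(sentence):
    """Classify a sentence as positive / negative / neutral via word scores."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

feedback = "The product is great. Shipping was slow! Overall I love it."
for s in segment_sentences(feedback):
    print(s, "->", sentiment(s))
```

Even this toy pipeline shows why combining strategies helps: segmentation isolates the units, and only then can each unit be scored on its own.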

Timeline of the most important NLP Models

Natural Language Processing actually has its origins back in the 1950s: in an experiment at Georgetown University in collaboration with IBM, researchers succeeded in machine-translating over 60 Russian sentences into English. After that, the new technology fell short of expectations for a long time. In the 1990s, the combination with machine learning algorithms such as Decision Trees led to increased applicability - but only for individual tasks. NLP models have only been making a name for themselves for a few years. Why is that?

The birth of Transformers

Transformers are a particular form of NLP Models whose special architecture was first introduced by Google in 2017. It is based on an embedding layer for vectorizing input sequences, followed by stacked encoders and decoders. So-called attention modules play a particularly important role (cf. the title of the paper "Attention Is All You Need"). These compute correlations between the input units, which facilitates the determination of word relationships and contexts.

The attention mechanism is modeled on unconscious processes of human speech perception, in which words are given special weight for sentence meaning regardless of their order. This makes Transformers superior to purely sequential models such as LSTM or Seq2seq. The situation is similar in comparison to previous embedding models such as Word2vec. The special feature of Transformers is that their versatile architecture lets them take over the tasks of various individual NLP models and even surpass each of them.
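The attention computation at the heart of this architecture can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. This is a single head with no learned projection matrices, and the embeddings are toy numbers chosen for illustration:

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors (one head, no batch)."""
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Softmax turns scores into attention weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output: attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three token positions with toy 2-dimensional embeddings.
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(Q, K, V)
```

Each output row mixes information from all positions at once, which is exactly what frees Transformers from the strictly sequential processing of LSTM-style models.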

Transformer architecture. Source: Attention Is All You Need


ELMo

The next breakthrough in Natural Language Processing in 2018 was not yet a Transformer, but it influenced the development of one. Embeddings from Language Models (ELMo) uses word representations that take into account complex factors such as syntax and semantics as well as varying contexts. The Language Model determines these from the surrounding words and is thus particularly sensitive to linguistic subtleties. Thanks in part to its extensive pre-training on over one billion words, ELMo quickly became the state of the art for many NLP tasks.


  • Convolutional Neural Network (CNN)
  • Bidirectional Language Model consisting of two layers
  • LSTM modules connected in series
  • 93.6 million parameters

Skills: Translations, text summarization, answering questions, sentiment analysis.


GPT-1

In 2018, OpenAI's first Generative Pre-trained Transformer attracted a great deal of attention in expert circles. As a direct reaction to the Transformer architecture presented by Google, the concept was extended to include generative pre-training. In contrast to previous practice, the NLP Model underwent an unsupervised learning process; the training set consisted of the text of over 11,000 books. This was intended to enable GPT to understand longer contexts and to produce text passages of its own. This capability is by far the greatest innovation compared to earlier Language Models. GPT-1 was the starting gun for an unprecedented race for ever better NLP performance by Transformers.


  • Transformer Decoder Model
  • 117 million parameters
  • 12 layers
  • Omission of the encoder part proposed by Google

Skills: Generating and completing coherent text, translations, answering questions.
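The generative principle described above, predicting one token at a time and feeding the output back in, can be sketched with a toy bigram model. The corpus and the greedy decoding rule are made up for illustration and stand in for the book corpus and learned probabilities of a real GPT:

```python
from collections import Counter, defaultdict

# Tiny toy corpus standing in for the book collection used in pre-training.
corpus = "the model reads text . the model predicts the next word .".split()

# Count bigram statistics: how often each word follows each other word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, steps):
    """Greedy autoregressive generation: repeatedly append the most
    frequent successor of the last word, feeding the output back in."""
    out = [start]
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

print(generate("the", 4))
```

A real GPT replaces the bigram table with a Transformer decoder that conditions on the entire preceding context, which is what makes its generated passages coherent over long stretches.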


BERT

Of course, Google also had to present its own Transformer Model in the same year: BERT (Bidirectional Encoder Representations from Transformers) applies a bidirectional training approach, enabling a deeper understanding of context than a one-directional pass through the sequence. Here, the influence of ELMo on Transformer development is noticeable. For this method to be applicable at all, the researchers also introduced a new technique called Masked Language Modeling (MLM). BERT thus founded a whole family of particularly powerful language models, some of which have even been implemented in Google Search.


  • Transformer Encoder Model
  • additional classification layer (for MLM)
  • BERTbase: 12 layers, 110 million parameters
  • BERTlarge: 24 layers, 340 million parameters

Skills: Capture long contexts, summarize and generate text, word predictions.
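The Masked Language Modeling technique used to train BERT can be sketched as follows. This is a simplified illustration in plain Python: real BERT also sometimes substitutes random tokens or leaves the chosen ones unchanged, details omitted here:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking: hide a fraction of tokens; during training
    the model must predict the hidden originals from both directions."""
    rng = random.Random(seed)  # fixed seed for a reproducible example
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[i] = tok  # the training target the model must recover
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
```

Because the model sees the words on both sides of each `[MASK]`, the objective forces exactly the bidirectional context understanding described above.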


GPT-4

By 2023, OpenAI could look back on a whole series of GPT models, which were also made available to the public with ChatGPT. Each surpassed its predecessors in complexity, in the amount of text used for pre-training, and ultimately in performance on various NLP tasks. GPT-3 was already pre-trained on 570 GB of text and could draw on 175 billion parameters. GPT-4 represents another significant increase, but differs above all in its ability to also process images. What began as a text-only bot now ushers in an era of Transformers with enhanced capabilities in the form of a Large Multimodal Model. The generation of extensive, coherent, and thoroughly convincing text, meanwhile, has long since become a matter of course.

Architecture: OpenAI has so far kept the exact structure of the language model secret - probably so that it cannot easily be replicated, as analysts Dylan Patel and Gerald Wong suggest. They expect similarly powerful NLP models from competitors such as Meta soon. GPT-4 is estimated to have around 1.6 trillion parameters across 120 layers - roughly 10 times the scale of GPT-3.

Skills: Analyze text, summarize and translate text, generate coherent text in seconds, human-like answers, generate code, create website based on sketches, analyze graphics, answer questions about images.

Comparison of the performance of GPT-4 and GPT-3.5 in different NLP tasks. Source: OpenAI

PaLM 2

Since May 2023, Google's text bot Bard has no longer been based on the Large Language Model LaMDA, but on PaLM 2. The state-of-the-art model is equipped with extensive new functionality for coding, Google Workspace support, and logical reasoning. The training set includes large amounts of text from scientific papers and websites, and improved multilingual support now covers over 100 languages. PaLM 2 is available in four different sizes, depending on the end device. Overall, it is clear that Google is focusing on implementations that are as suitable for everyday use as possible in order to ensure broad adoption. NLP models have thus reached the center of general usability.

Architecture: Transformer; otherwise little is publicly known. A significant increase over the 540 billion parameters of the predecessor is considered likely.

Skills: Further improvement in most NLP tasks, programming, multilingualism, logical reasoning, Google product integrations.

Conclusion and prospects

For several years now, Natural Language Processing has been one of the most significant AI technologies thanks to new possibilities for automated language processing, particularly the generation and translation of text. A major breakthrough in this direction was the presentation of the Transformer architecture by Google in 2017, which soon found successful technical implementation in Language Models such as GPT-1 and BERT, the latter drawing on the bidirectional approach of ELMo. Moreover, the use of unsupervised learning contributed to the exponential increase in training scope, complexity, and performance of subsequent models. Given their growing added value, training costs often play only a minor role. A continuation of these trends is therefore very likely.


Further developments also focus on eliminating the errors and weaknesses of Language Models, for example in the areas of timeliness, logic, and arithmetic. A possible solution to some of these problems is the automated use of external tools, as with Meta's Toolformer. In addition to applications that are increasingly suitable for everyday use, more specialized areas of application are also foreseeable: Med-PaLM 2, for example, is already being tested for medical purposes. Transformers will remain the technical basis for the foreseeable future. However, it cannot be ruled out that they too will one day have to give way to a new type of NLP model.

If you would like to learn more about Natural Language Processing and the added value of the technology for companies, feel free to use the Contact form. Our experts look forward to the exchange.
