
Avoid overfitting & underfitting: AI Debugging Guide

Mohamed Dhiab

Many people are enthusiastic about machine learning (ML), but not everyone understands the complex challenges that make it difficult to use ML in practice. Even if machine learning algorithms are excellent for certain tasks, the models built with them can still contain errors that become a major problem. Typical sources of error include overfitting or underfitting the data set used, or a loss function that does not decrease during training.

The key to success with machine learning is to find out where such errors can occur in the models we use, and to eliminate them before they cause problems.

What is overfitting?

Overfitting describes the phenomenon in which a machine learning model fits the training data too closely, including the noise and specific details it contains. This impairs its performance on new, unseen data. Although an overfitted model shows excellent results on the training data because it has virtually memorized it, it fails on new inputs because it has not learned the general patterns that would allow it to generalize.

When does overfitting occur?

Overfitting occurs when a machine learning model is trained on training data to such an extent that it learns not only the general patterns, but also the irrelevant noise and the specific exceptions of this data. This often happens if the training period is too long, if the model is overly complex compared to the amount of data, or if the training data is not representative of real-world use cases.

This is reminiscent of preparing for an exam by only memorizing previous exam questions, without developing a deeper understanding of the subject matter: you may do well on familiar questions but struggle with new ones.

Why is overfitting problematic?

Overfitting limits the applicability of a machine learning model as it loses its ability to react correctly to new data. An overfitted model that is too fixated on training data often fails on real-world tasks - because it cannot recognize general patterns. This lack of flexibility leads to unreliable predictions outside of the training environment, which significantly limits the model's usefulness in real-world applications.

Overfitting vs. underfitting

The following table provides a clear comparison of overfitting and underfitting. It summarizes the main characteristics, causes and effects of these two challenges in the field of machine learning:

| Aspect | Overfitting | Underfitting |
|---|---|---|
| Definition | Too precise adaptation to the training data, including noise and exceptions | Insufficient adaptation to the structure of the training data |
| Cause | Model too complex, training too long | Model too simple, insufficient training |
| Problem | Loses the ability to generalize; unreliable predictions | Poor performance on both training data and new data |
| Recognition | High accuracy on training data, but poor performance on new data | Consistently poor performance |
| Behavior on training data | High performance | Poor performance |
| Behavior on new data | Poor performance | Poor performance |
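As a rough illustration of the "Recognition" row, the comparison of training and validation loss can be written as a small rule of thumb. `diagnose_fit` is a hypothetical helper, and the thresholds `acceptable_loss` and `max_gap` are assumptions that depend entirely on your task and metric:

```python
def diagnose_fit(train_loss: float, val_loss: float,
                 acceptable_loss: float, max_gap: float) -> str:
    """Rough heuristic label for a trained model (hypothetical helper).

    acceptable_loss: loss level you consider good enough for the task (assumption).
    max_gap: largest tolerated gap between validation and training loss (assumption).
    """
    if train_loss > acceptable_loss:
        # Poor performance even on the training data: the model has not
        # captured the structure of the data.
        return "underfitting"
    if val_loss - train_loss > max_gap:
        # Good on training data, much worse on unseen data: poor generalization.
        return "overfitting"
    return "ok"


# Illustrative numbers only, not measured results
print(diagnose_fit(train_loss=0.02, val_loss=0.45, acceptable_loss=0.1, max_gap=0.1))  # overfitting
print(diagnose_fit(train_loss=0.60, val_loss=0.62, acceptable_loss=0.1, max_gap=0.1))  # underfitting
print(diagnose_fit(train_loss=0.05, val_loss=0.08, acceptable_loss=0.1, max_gap=0.1))  # ok
```

In practice you would look at whole loss curves rather than single numbers, but the two questions stay the same: is the training loss good enough, and how large is the gap to the validation loss?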

Debugging AI models

If we ignore problems such as overfitting or underfitting and leave models that perform poorly or unpredictably uncorrected, this can significantly reduce an organization's ability to use machine learning effectively in the long run.

This shows how important it is to debug models correctly.

But what exactly is the debugging of AI models and how does it differ from code debugging?

AI model debugging vs. code debugging

Debugging AI models is about detecting and fixing errors or inconsistencies in machine learning models. It requires a thorough examination of data set quality, feature engineering, model architecture and the training process in order to improve overall performance.

In contrast to traditional code debugging, which focuses on finding and fixing errors in software code, debugging AI models deals with the complex interactions and behavior patterns of algorithms trained on data. In code debugging, errors are usually reproducible and can often be traced back directly to a specific line or block of code.

Debugging AI models, by contrast, is about understanding how data inputs and algorithmic decisions interact to produce results, which can make tracing and diagnosis difficult. It often requires domain-specific knowledge and a deep understanding of the underlying data and algorithms in order to identify sources of error and improve model performance.

Code debugging

  • Finds and fixes errors in the software code.
  • Errors are usually reproducible and can be traced back to specific lines of code.
  • Uses IDEs and debugging tools such as breakpoints and variable checking.

AI model debugging

  • Identifies and fixes errors in machine learning models.
  • Deals with complex interactions between data and algorithms.
  • Requires an understanding of the quality of data sets, model architecture and training processes.
  • Uses tools such as data visualization and performance metrics analysis.

Machine learning debugging - "how to"

How do you debug a machine learning model?

Search for calculation errors

Always start with a small sample of your data set. This is an invaluable debugging method, comparable to testing a scaled-down prototype of your project before building the real thing. Because each run is fast, you can carefully check every component of your machine learning pipeline for errors and inconsistencies.
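One way to draw such a sample is a fixed-seed random subset, so that every debugging run sees the same examples. A NumPy sketch (`take_debug_sample` and the default sample size of 64 are illustrative assumptions):

```python
import numpy as np


def take_debug_sample(X, y, n_samples=64, seed=0):
    """Return a small, reproducible random subset of (X, y) for debugging runs."""
    rng = np.random.default_rng(seed)  # fixed seed: every debug run sees the same examples
    n_samples = min(n_samples, len(X))
    idx = rng.choice(len(X), size=n_samples, replace=False)
    return X[idx], y[idx]  # the same indices keep features and labels aligned


# Toy data set: row i is [3i, 3i+1, 3i+2] with label i % 2
X = np.arange(1000 * 3, dtype=float).reshape(1000, 3)
y = np.arange(1000) % 2

X_small, y_small = take_debug_sample(X, y, n_samples=32)
print(X_small.shape, y_small.shape)  # (32, 3) (32,)
```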

Recognizing errors in the model implementation
By training on a small data set, you can quickly identify implementation errors that show up as irregular behavior or incorrect predictions. Whether it's a syntax error in your code or a logical error in your algorithm, starting small lets you find such problems early, before they surface as apparent underfitting or overfitting.

Possible errors during model implementation

  • Mismatched input and output dimensions
  • Incorrect layer configurations
  • Missing layers
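Dimension mismatches like these can be caught before any training by checking that each layer's input size matches the previous layer's output size. A NumPy sketch for a stack of dense layers, where each weight matrix is assumed to have shape (inputs, outputs); `check_layer_shapes` is a hypothetical helper:

```python
import numpy as np


def check_layer_shapes(weights, n_features, n_outputs):
    """Verify that a stack of dense layers chains correctly (hypothetical helper).

    weights: list of weight matrices, each with shape (fan_in, fan_out).
    Raises ValueError describing the first mismatch it finds.
    """
    expected_in = n_features
    for i, W in enumerate(weights):
        fan_in, fan_out = W.shape
        if fan_in != expected_in:
            raise ValueError(f"layer {i}: expects {fan_in} inputs, receives {expected_in}")
        expected_in = fan_out
    if expected_in != n_outputs:
        raise ValueError(f"last layer produces {expected_in} outputs, task needs {n_outputs}")


rng = np.random.default_rng(0)
good = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
bad = [rng.normal(size=(4, 8)), rng.normal(size=(16, 3))]  # wrong fan-in in layer 1

check_layer_shapes(good, n_features=4, n_outputs=3)  # passes silently
try:
    check_layer_shapes(bad, n_features=4, n_outputs=3)
except ValueError as err:
    print(err)  # layer 1: expects 16 inputs, receives 8

# A dummy forward pass confirms the output shape the task expects
x_batch = rng.normal(size=(2, 4))
out = np.tanh(x_batch @ good[0]) @ good[1]
print(out.shape)  # (2, 3)
```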


Validate the pre-processing and creation of data loaders
Pre-processing and the creation of data loaders are crucial steps in the machine learning pipeline that form the basis for training the model. Training on a reduced data set allows you to review these steps in detail, ensure that the data transformation pipelines work as intended, and confirm that the data loaders deliver batches in the expected format.
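A quick way to validate a data loader is to iterate over one full pass and assert the properties you expect of every batch. In this NumPy sketch, `iterate_batches` is a minimal hypothetical loader, and the checks in `check_loader` (float32 features, integer labels in [0, n_classes)) are assumptions for a classification task that you would adapt to your own data:

```python
import numpy as np


def iterate_batches(X, y, batch_size, shuffle=True, seed=0):
    """Minimal data loader: yields (X_batch, y_batch) pairs."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)  # one permutation for both X and y
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]


def check_loader(X, y, batch_size, n_classes):
    """Fail loudly if the loader delivers batches in an unexpected format."""
    seen = 0
    for xb, yb in iterate_batches(X, y, batch_size):
        assert xb.ndim == 2 and xb.shape[1] == X.shape[1], "unexpected feature dimension"
        assert len(xb) == len(yb), "features and labels out of sync"
        assert xb.dtype == np.float32, "features should be float32"
        assert np.isfinite(xb).all(), "NaN or inf in features"
        assert yb.min() >= 0 and yb.max() < n_classes, "label outside valid range"
        seen += len(xb)
    assert seen == len(X), "loader dropped or duplicated samples"
    return seen


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)).astype(np.float32)
y = rng.integers(0, 3, size=50)
print(check_loader(X, y, batch_size=16, n_classes=3))  # 50
print([len(xb) for xb, _ in iterate_batches(X, y, 16)])  # [16, 16, 16, 2]
```

The same pattern works with a framework's own loader: iterate over it once on the small sample and check every batch before starting a long training run.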

Evaluate loss and metric calculation
The calculation of loss and evaluation metrics forms the basis of model training, guides the optimization process and evaluates model performance. By training on a small sample of training data, you can verify the correctness of your loss function implementation and metric calculations and ensure that they accurately reflect the performance of the model for the task at hand.
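A cheap way to verify loss and metric code is to feed it inputs whose correct values you can work out by hand. The NumPy sketch below uses a hand-written softmax cross-entropy (`cross_entropy` and `accuracy` are illustrative, not a library API). With equal logits for C classes the loss must be ln(C), so a freshly initialized classifier whose logits are close to zero should also start near that value:

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits: (n, C), labels: (n,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def accuracy(logits, labels):
    return (logits.argmax(axis=1) == labels).mean()


n_classes = 10
labels = np.array([0, 3, 7, 9])

# Check 1: equal logits mean a uniform prediction, so the loss must be ln(C).
uniform = np.zeros((4, n_classes))
assert np.isclose(cross_entropy(uniform, labels), np.log(n_classes))

# Check 2: a very confident correct prediction should give a loss near zero.
confident = np.full((4, n_classes), -20.0)
confident[np.arange(4), labels] = 20.0
assert cross_entropy(confident, labels) < 1e-6

# Check 3: accuracy on a hand-computed example (2 of 4 predictions correct).
logits = np.eye(n_classes)[[0, 3, 1, 2]]
print(accuracy(logits, labels))  # 0.5
```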

Iterative refinement
With the insights gained from training on a small sample, you can iteratively refine and debug your machine learning pipeline. Each iteration brings you closer to a robust and reliable model, because you uncover and fix pitfalls that might otherwise have gone unnoticed.

Essentially, starting with a small training sample serves as a touchstone for the robustness and integrity of your ML pipeline and provides a pragmatic approach to debugging and refinement. Thorough testing and validation on a small scale paves the way for success on a larger scale.

Experiment Tracking Tools

Experiment tracking tools are essential for recording your model development process. These tools help you track hyperparameters, metrics and other relevant information for each experiment. Using these tools, you can easily compare different models, understand the impact of changes and identify the best performing configurations. Some popular experiment tracking tools are:

TensorBoard is a visualization toolkit included with TensorFlow that allows you to track and visualize various aspects of your machine learning experiments, including model graphs, training metrics, and embeddings.

MLflow is an open source platform for managing the entire lifecycle of machine learning. It provides components for tracking experiments, packaging code into reproducible runs, and sharing and deploying models.

By using experiment tracking tools, you can ensure reproducibility, track model performance over time and improve collaboration within your team. These tools play an important role in debugging machine learning models: they provide insights into how different configurations behave and help you identify the causes of overfitting, underfitting or other unexpected behavior.
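To show what such a tool records, here is a deliberately tiny stand-in written with the Python standard library: one JSON line per run, holding its parameters and metric history. `RunLogger` and `load_runs` are hypothetical; in MLflow, for example, `mlflow.log_param` and `mlflow.log_metric` inside an `mlflow.start_run()` block record the same kind of information with a proper UI on top. The loss values below come from a toy formula, not from real training:

```python
import json
import tempfile
import time
from pathlib import Path


class RunLogger:
    """Tiny stand-in for an experiment tracker: appends one JSON line per run."""

    def __init__(self, path):
        self.path = Path(path)
        self.params = {}
        self.metrics = {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step):
        # Keep the full history so curves can be compared across runs later.
        self.metrics.setdefault(key, []).append((step, value))

    def finish(self):
        record = {"time": time.time(), "params": self.params, "metrics": self.metrics}
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")


def load_runs(path):
    with Path(path).open() as f:
        return [json.loads(line) for line in f]


log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
for lr in (0.1, 0.5):
    run = RunLogger(log_path)
    run.log_param("learning_rate", lr)
    for step in range(5):
        # Toy stand-in for a validation loss curve, not a real training result
        run.log_metric("val_loss", (1 - lr) ** step, step)
    run.finish()

runs = load_runs(log_path)
best = min(runs, key=lambda r: r["metrics"]["val_loss"][-1][1])
print(best["params"])  # {'learning_rate': 0.5}
```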


Checking the learning ability

Again, using a small sample from your training set is recommended. Make sure your model can overfit this small sample to confirm that it is able to capture patterns from the data.

Training loss convergence
Monitor the training loss and aim for values close to zero for the small data set, which indicates that the model is able to internalize patterns from the data.

Observation of the training dynamics
Watch how the loss evolves over the epochs. On a small sample it should keep falling as the model adapts to even the smallest details of the data; strong fluctuations often point to problems such as a learning rate that is too high.

If the model cannot overfit the small sample, increase its complexity, check the implementation for bugs, or explore alternative architectures. Confirming that the model can overfit a small sample shows that it is able to learn patterns and lays the foundation for robust machine learning.
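The check can be sketched with a deliberately simple stand-in for a neural network: a linear model trained by full-batch gradient descent in NumPy (`train_linear` is a hypothetical helper, and the data are random). With more features than training examples, the model has enough capacity to memorize random targets, so its training loss should approach zero; with only two features it cannot, which is the situation where more complexity or a different architecture is needed:

```python
import numpy as np


def train_linear(X, y, steps=5000):
    """Full-batch gradient descent on mean squared error; returns the loss history."""
    n = len(X)
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L = 2/n * (largest singular value of X)^2 is the
    # Lipschitz constant of the MSE gradient, so gradient descent cannot diverge.
    lr = n / (2 * np.linalg.norm(X, 2) ** 2)
    history = []
    for _ in range(steps):
        residual = X @ w - y
        history.append(np.mean(residual ** 2))
        w -= lr * (2 / n) * X.T @ residual
    return history


rng = np.random.default_rng(0)
y = rng.normal(size=10)  # 10 training examples with random targets

# Enough capacity: 20 features for 10 examples, so an exact fit exists.
X_big = rng.normal(size=(10, 20))
print(f"{train_linear(X_big, y)[-1]:.1e}")  # final loss is essentially zero

# Too little capacity: 2 features cannot memorize 10 random targets.
X_small = X_big[:, :2]
print(f"{train_linear(X_small, y)[-1]:.2f}")  # final loss stays clearly above zero
```

The returned `history` is also what you would plot to observe the training dynamics described above.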

Avoid overfitting

It is true that, as described in the previous step, you should make sure that your model can capture patterns from the data and overfit a small sample. However, a model that overfits the entire training set and cannot perform well in real-world scenarios is flawed.

Strategies to mitigate overfitting

Overfitting is a common problem in machine learning, where a model memorizes the training data instead of learning patterns that generalize to unseen data. The typical symptom is a low training loss combined with a high test loss. Here are some strategies to mitigate overfitting:

Cross-validation
Split your dataset into training and validation sets so that you can measure performance on data the model has not been trained on. Cross-validation techniques such as k-fold cross-validation provide a more reliable estimate of model performance on unseen data.
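k-fold cross-validation splits the data into k parts and uses each part once as the validation set while training on the remaining parts. A NumPy sketch with a least-squares linear model (the helper names and the synthetic data are illustrative; scikit-learn offers ready-made tools such as `sklearn.model_selection.KFold` and `cross_val_score`):

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)  # k nearly equal parts
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


def cross_val_mse(X, y, k=5):
    """Mean and spread of the validation MSE of a least-squares model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return np.mean(scores), np.std(scores)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
mean_mse, std_mse = cross_val_mse(X, y, k=5)
print(f"5-fold validation MSE: {mean_mse:.4f} +/- {std_mse:.4f}")
```

The spread across folds is useful in its own right: a large standard deviation suggests that a single train/validation split could give a misleading picture.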

Regularization
Regularization techniques such as L1 and L2 regularization add a penalty on large parameter values to the loss function, which discourages overly complex models and thus helps prevent overfitting.
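For L2 regularization, penalized least squares has a closed-form solution, which makes the effect easy to see. In this NumPy sketch (`ridge_fit` is a hypothetical helper, and the data are synthetic), the norm of the learned weight vector shrinks as the penalty strength `lam` grows. L1 regularization (a penalty on the sum of absolute weights) has no closed form and is not shown:

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # Closed-form solution: w = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(1)
X = rng.normal(size=(15, 10))  # few samples relative to features: easy to overfit
y = X[:, 0] + 0.5 * rng.normal(size=15)

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:>6}: ||w|| = {np.linalg.norm(w):.3f}")
```

In practice `lam` is a hyperparameter chosen on validation data, for example with the cross-validation described above.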

Dropout
Dropout is a technique commonly used in neural networks in which randomly selected neurons are temporarily ignored (set to zero) during training. This prevents co-adaptation of neurons and encourages the network to learn more robust features.
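The mechanism fits in a few lines of NumPy. This sketch implements the common "inverted dropout" variant, in which surviving activations are scaled by 1/(1 - p) during training so that nothing needs to change at inference time; frameworks provide ready-made layers for this (for example `torch.nn.Dropout` in PyTorch). The function name and arguments are illustrative:

```python
import numpy as np


def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training.

    Surviving units are scaled by 1 / (1 - p) so the expected activation is
    unchanged, which lets the layer be used as-is at inference time.
    """
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p  # True with probability 1 - p
    return activations * keep_mask / (1.0 - p)


rng = np.random.default_rng(0)
h = np.ones((1000, 100))
out = dropout(h, p=0.3, training=True, rng=rng)
print(round((out == 0).mean(), 2))  # fraction of dropped units, close to 0.3
print(round(out.mean(), 2))         # mean activation stays close to 1.0
print(dropout(h, 0.3, training=False, rng=rng) is h)  # True: no-op at inference
```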

Early stopping
Monitor the performance of your model on a validation set during training. Stop training when the validation performance stops improving or starts to deteriorate, as this indicates that the model is beginning to overfit.
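The stopping rule is usually implemented with a patience parameter: training ends after a fixed number of epochs without improvement, and the weights from the best epoch are kept. A plain-Python sketch (the function name, the `patience` and `min_delta` defaults and the loss curve are illustrative assumptions, not measured results):

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has not improved by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative curve: the loss improves, then rises as the model starts to overfit
curve = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.47, 0.52, 0.60]
print(early_stopping_epoch(curve, patience=3))  # (6, 3)
```

Training would stop after epoch 6, and the weights saved at epoch 3 would be restored. In Keras, the `EarlyStopping` callback offers the same behavior through its `patience`, `min_delta` and `restore_best_weights` arguments.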

Simplification of the model
Sometimes a simpler model generalizes better to unseen data. Consider reducing the complexity of your model by reducing the number of parameters or using a simpler architecture.
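The effect of model complexity shows up clearly when polynomials of different degrees are fitted to the same small, noisy data set. In this NumPy sketch on synthetic data (a noisy sine curve, chosen for illustration), the training error can only fall as the degree grows, while the validation error typically rises again for the most flexible model; the degree-1 fit shows underfitting for comparison:

```python
import numpy as np


def true_fn(x):
    return np.sin(3 * x)


def mse(coefs, x, y):
    return np.mean((np.polyval(coefs, x) - y) ** 2)


rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, 15))
x_val = np.sort(rng.uniform(-1, 1, 200))
y_train = true_fn(x_train) + 0.2 * rng.normal(size=x_train.size)
y_val = true_fn(x_val) + 0.2 * rng.normal(size=x_val.size)

results = {}
for degree in (1, 3, 10):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_val, y_val))
    train_err, val_err = results[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, val MSE {val_err:.3f}")
```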

Data augmentation
Increase the variety of your training data by applying transformations such as rotating, flipping or scaling. This can help expose the model to a wider range of variations in the data.
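For image data, simple geometric transformations can be written with NumPy alone. `augment` is a hypothetical helper; which transformations are appropriate depends on the task, since they must not change the correct label (a digit "6" rotated by 180 degrees looks like a "9", for example):

```python
import numpy as np


def augment(image):
    """Return simple geometric variants of a (height, width[, channels]) image."""
    return [
        image,
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, k=1),  # 90 degree rotation (counter-clockwise)
        np.rot90(image, k=2),  # 180 degree rotation
    ]


img = np.arange(6).reshape(2, 3)
variants = augment(img)
print(len(variants))      # 5
print(variants[1])        # [[2 1 0]
                          #  [5 4 3]]
print(variants[3].shape)  # (3, 2): a 90 degree rotation swaps height and width
```

Image libraries usually apply such transformations randomly on the fly during training rather than storing every variant.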

Ensemble methods
Combine multiple models to make predictions. Ensemble methods such as bagging and boosting combine several models; bagging in particular reduces overfitting by averaging the predictions of models trained on different bootstrap samples of the data.
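Bagging can be sketched in NumPy by training the same model on bootstrap samples (drawn with replacement) and averaging the predictions. `bagged_polyfit_predict` and the polynomial base model are assumptions for illustration; how much averaging helps depends on how much the base model's predictions vary from sample to sample:

```python
import numpy as np


def bagged_polyfit_predict(x_train, y_train, x_new, degree, n_models=50, seed=0):
    """Bagging sketch: fit one polynomial per bootstrap sample, average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    predictions = []
    for _ in range(n_models):
        sample = rng.integers(0, n, size=n)  # bootstrap: n indices drawn with replacement
        coefs = np.polyfit(x_train[sample], y_train[sample], degree)
        predictions.append(np.polyval(coefs, x_new))
    return np.mean(predictions, axis=0)  # averaging reduces the variance of the ensemble


rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 30)
y_train = np.sin(3 * x_train) + 0.3 * rng.normal(size=30)
x_val = np.linspace(-0.9, 0.9, 100)
y_val = np.sin(3 * x_val)

single = np.polyval(np.polyfit(x_train, y_train, 5), x_val)
bagged = bagged_polyfit_predict(x_train, y_train, x_val, degree=5)
print(f"single model MSE {np.mean((single - y_val) ** 2):.3f}, "
      f"bagged MSE {np.mean((bagged - y_val) ** 2):.3f}")
```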

By implementing these techniques, you ensure that your AI models generalize well to new, unseen data, resulting in more reliable and robust predictions.

Interpret and explain your model

Your machine learning model can seem like a black box, and you are probably curious about the reasons behind its decisions. Understanding those reasons can provide insights into the problem, the dataset and potential points of failure. Interpretability sheds light on the model by revealing the logic behind its decisions and providing valuable context. Here are some popular tools for interpreting and explaining AI models:

SHAP (SHapley Additive exPlanations)

SHAP is a method based on cooperative game theory that assigns each feature an importance value for a particular prediction. It explains individual predictions by attributing the difference between the prediction and a baseline (expected) prediction to the individual features.

SHAP values can be visualized using summary plots, force plots or dependence plots, which show how each feature contributes to the model's predictions.
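The Shapley values that SHAP builds on can be computed exactly for a model with only a few features by enumerating every coalition of features. In this plain-Python sketch, a feature missing from a coalition is replaced by a single baseline value, which is a simplifying assumption (SHAP itself typically averages over background data and uses faster approximations, because the exact computation grows exponentially with the number of features). The toy model and function names are hypothetical:

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (brute force, only for a few features).

    A feature that is absent from a coalition is replaced by its baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that do not contain i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model with an interaction term (hypothetical, for illustration)
def model(z):
    return 2 * z[0] + z[1] * z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(v, 6) for v in phi])                      # [2.0, 3.0, 3.0]
print(round(sum(phi), 6), model(x) - model(baseline))  # 8.0 8.0
```

The values add up to the difference between the prediction and the baseline prediction, and the interaction term `z[1] * z[2]` is split evenly between the two features involved.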


LIME (Local Interpretable Model-agnostic Explanations)

LIME generates locally faithful explanations for individual predictions by approximating the complex model with an interpretable surrogate model in the neighborhood of the prediction of interest. The focus is on understanding model behavior for specific instances.

LIME presents its explanations as feature weights or textual explanations that allow the user to understand why a particular prediction was made.

Figure: LIME explanation of an image classifier's prediction (source: GitHub)

In this image, LIME explains the prediction "cat". The areas that contributed most to this prediction are marked in green; areas that spoke against it are marked in red.
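For tabular data, the core of LIME can be sketched in NumPy: sample perturbations around the instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified illustration, not the `lime` package's implementation: the Gaussian perturbations, the exponential kernel and its default width are assumptions, and image explainers like the one in the example above perturb image regions instead of numeric features:

```python
import numpy as np


def lime_explain(predict, x, n_samples=500, kernel_width=None, seed=0):
    """LIME-style local surrogate for one numeric instance x.

    predict: black-box function mapping an (n, d) array to n predictions.
    Returns (local intercept, per-feature weights of the surrogate model).
    """
    rng = np.random.default_rng(seed)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.size)  # assumed default width
    offsets = rng.normal(size=(n_samples, x.size))  # 1. perturb around x
    Z = x + offsets
    dist_sq = np.sum(offsets ** 2, axis=1)
    weights = np.exp(-dist_sq / kernel_width ** 2)  # 2. closer samples count more
    # 3. Weighted least squares: scale rows by sqrt(weight), then ordinary lstsq
    A = np.hstack([np.ones((n_samples, 1)), offsets])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], predict(Z) * sw, rcond=None)
    return coef[0], coef[1:]


# Hypothetical black-box model with a non-linear first feature
def black_box(Z):
    return Z[:, 0] ** 2 + 3 * Z[:, 1] - Z[:, 2]


x = np.array([1.0, 0.0, 2.0])
intercept, feature_weights = lime_explain(black_box, x)
# The weights approximate the local slopes of black_box at x: 2, 3 and -1
print(np.round(feature_weights, 1))
```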


Effective debugging of AI models is essential for their reliability and performance. Unlike code debugging, this process is about identifying and fixing errors that arise from complex interactions between data and algorithms. Starting with a small data set allows for a thorough inspection, which helps with error detection, pre-processing validation and performance evaluation.

Experiment tracking tools help to keep records and compare configurations. Strategies such as cross-validation and regularization reduce overfitting and help ensure robust performance, while checking that a model can overfit a small sample guards against underfitting. Interpreting and explaining models increases confidence by providing insights into their decisions. Overall, systematic debugging, experiment tracking and mitigation strategies are crucial for reliable performance of AI models.

