Overfitting & Underfitting in Machine Learning

Overfitting and underfitting both degrade the performance of a Machine Learning model. Let us understand each of them and then see their differences.

Overfitting

Overfitting occurs when a Machine Learning model performs well on the training data but poorly on unseen data. It typically happens with an overly complex model, such as one with too many parameters relative to the number of observations.

The model shows high accuracy on the training data but low accuracy on the test data, because it has captured the noise and outliers in the training data.

Note: High variance is an indicator of Overfitting.
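As an illustrative sketch (the data and polynomial degrees here are invented for demonstration, not part of the tutorial), fitting a high-degree polynomial to a handful of noisy points shows exactly this train/test gap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is quadratic; the samples carry some noise.
x_train = np.linspace(-3, 3, 15)
y_train = x_train**2 + rng.normal(0, 1.0, size=x_train.size)
x_test = np.linspace(-2.9, 2.9, 50)
y_test = x_test**2 + rng.normal(0, 1.0, size=x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=2)        # matches the true complexity
complex_fit = np.polyfit(x_train, y_train, deg=12)  # too many parameters -> overfits

print(f"deg 2 : train {mse(simple, x_train, y_train):.2f}  test {mse(simple, x_test, y_test):.2f}")
print(f"deg 12: train {mse(complex_fit, x_train, y_train):.2f}  test {mse(complex_fit, x_test, y_test):.2f}")
```

The degree-12 fit achieves a lower training error than the degree-2 fit, yet a far higher test error: low training error with high test error is the overfitting signature.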

How to Prevent Overfitting

  • Simpler model: Use a less complex model or reduce the number of parameters.
  • Regularization: Regularization prevents overfitting by adding a penalty on the model's complexity. Since an overfit model works poorly on unseen data, regularization helps the model generalize better to new, unseen data.
  • Pruning: In decision trees, prune branches that have little importance.
  • Cross-validation: Use techniques like k-fold cross-validation to check that the model generalizes well.
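A minimal sketch of the regularization idea, using a ridge (L2) penalty in closed form; the polynomial-feature data, degree, and penalty strength are illustrative assumptions, not part of the tutorial:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 15)
y_train = x_train**2 + rng.normal(0, 0.3, size=x_train.size)

def design(x, deg):
    # Polynomial feature matrix: columns 1, x, x^2, ..., x^deg
    return np.vander(x, deg + 1, increasing=True)

def ridge_fit(x, y, deg, lam):
    # Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y
    X = design(x, deg)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

deg = 12
w_plain = ridge_fit(x_train, y_train, deg, lam=0.0)  # no penalty: free to overfit
w_ridge = ridge_fit(x_train, y_train, deg, lam=1.0)  # penalty shrinks the weights

print(f"coefficient norm without penalty: {np.linalg.norm(w_plain):.2f}")
print(f"coefficient norm with penalty   : {np.linalg.norm(w_ridge):.2f}")
```

The penalty shrinks the coefficient vector, which restrains the wild oscillations an unpenalized high-degree fit would use to chase noise.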

Underfitting

Underfitting occurs when a Machine Learning model is too simple to capture the underlying patterns in the data, or has not been trained long enough. As a result, the model performs poorly on the training data as well as on new data.

The model shows low accuracy on both the training and the test data.

Note: High bias is an indicator of Underfitting.

How to Prevent Underfitting

  • Higher model complexity: Use a more complex model with more parameters.
  • Feature engineering: Add more relevant features to help the model capture the underlying patterns.
  • Longer training: Train the model for sufficient iterations or epochs.
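An illustrative sketch of underfitting (again with invented data, not from the tutorial): a straight line cannot capture a quadratic pattern, so even the training error stays high until the model is made more complex:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(0, 0.5, size=x.size)  # quadratic signal, mild noise

def train_mse(deg):
    """Training MSE of a degree-`deg` polynomial fit."""
    coeffs = np.polyfit(x, y, deg=deg)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print(f"degree 1 (underfits): train MSE {train_mse(1):.2f}")
print(f"degree 2 (matches)  : train MSE {train_mse(2):.2f}")
```

The linear model's error is high even on the data it was trained on, which is the underfitting signature; adding the quadratic term (more model complexity) removes it.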

Let us see the differences between Overfitting and Underfitting:

| | Overfitting | Underfitting |
| --- | --- | --- |
| What | The model learns the training data too well, including noise and outliers. | The model is too simple to capture the underlying patterns in the data. |
| Accuracy | High accuracy on the training data but low accuracy on the test data. | Low accuracy on both the training and the test data. |
| Model complexity | Too complex; too many parameters. | Too simple; too few parameters. |
| Indicator | High variance. | High bias. |
| Performance | Large gap between training and validation/test accuracy. | Similar (poor) performance on training and validation/test data. |
| Causes | Overly complex model; insufficient training data. | Overly simple model; insufficient training time. |
| How to prevent | Simplify the model; use regularization; use cross-validation. | Increase model complexity; train the model longer. |
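The cross-validation remedy from the table above can be sketched with a hand-rolled k-fold loop; the data and polynomial degrees are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(0, 0.5, size=x.size)  # quadratic signal with noise

def kfold_mse(deg, k=5):
    """Average validation MSE of a degree-`deg` polynomial over k folds."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]  # hold out one fold for validation
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], deg=deg)
        errors.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    return float(np.mean(errors))

for deg in (1, 2, 10):
    print(f"degree {deg:2d}: cross-validated MSE {kfold_mse(deg):.2f}")
```

Because every point is scored while held out of training, the cross-validated error exposes both failure modes: it stays high for the underfit degree-1 model and typically rises again for an overly complex model.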

Studyopedia Editorial Staff
contact@studyopedia.com
