Overfitting & Underfitting in Machine Learning
Overfitting and Underfitting both impact the performance of a Machine Learning model. Let us understand each of them and see how they differ.
Overfitting
Overfitting occurs when the Machine Learning model performs well on the training data but badly on unseen data. This typically happens when the model is too complex, for example, when it has too many parameters relative to the number of observations.
The model shows high accuracy on the training data but low accuracy on the test data. In overfitting, the model captures the noise and outliers in the training data.
Note: High variance is an indicator of Overfitting.
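To see this in action, here is a minimal sketch, assuming scikit-learn is available; the synthetic dataset and the unpruned decision tree are illustrative assumptions, not part of this tutorial's setup:

```python
# A deliberately overfit model: an unpruned decision tree memorizes a small,
# noisy training set, so training accuracy is near-perfect while test
# accuracy lags behind (high variance).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y) to give the tree something
# spurious to memorize.
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.1,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# No depth limit: the tree grows until it fits the training data exactly.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("Train accuracy:", tree.score(X_train, y_train))  # near 1.0
print("Test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```

The large gap between the two scores is the telltale sign of overfitting.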
How to Prevent Overfitting
- Simpler model: Use a less complex model or reduce the number of parameters.
- Regularization: Regularization prevents overfitting by adding a penalty to the model's complexity. Since an overfit model performs poorly on unseen data, the penalty helps the model generalize better to new, unseen data.
- Pruning: In decision trees, prune branches that have little importance.
- Cross-validation: Use techniques like k-fold cross-validation to make sure the model generalizes well (see the sketch after this list).
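Here is a minimal sketch of two of these remedies, regularization and k-fold cross-validation, assuming scikit-learn is available; the Ridge model and the synthetic data are illustrative assumptions:

```python
# Ridge regression adds an L2 penalty on the coefficients, and k-fold
# cross-validation reports how well the model generalizes across folds.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Many features relative to samples: a setting prone to overfitting.
X, y = make_regression(n_samples=100, n_features=50, noise=10.0,
                       random_state=0)

# alpha controls the strength of the L2 penalty on the coefficients.
ridge = Ridge(alpha=1.0)

# 5-fold cross-validation: the mean R^2 over held-out folds estimates
# performance on unseen data.
scores = cross_val_score(ridge, X, y, cv=5)
print("Mean CV R^2:", scores.mean())
```

Larger alpha values penalize complexity more strongly; the value used here is only a starting point.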
Underfitting
Underfitting occurs when the Machine Learning model is too simple to capture the underlying patterns in the data, or has not been trained long enough. That means the model performs poorly on the training data as well as on new data.
The model shows low accuracy on both the training and the test data.
Note: High bias is an indicator of Underfitting.
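Here is a minimal sketch of underfitting, assuming scikit-learn and NumPy are available; fitting a straight line to quadratic data is an illustrative assumption:

```python
# A deliberately underfit model: a straight line fit to clearly nonlinear
# (quadratic) data scores poorly on training and test sets alike (high bias).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # quadratic target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
line = LinearRegression().fit(X_train, y_train)
print("Train R^2:", line.score(X_train, y_train))  # low
print("Test R^2:", line.score(X_test, y_test))     # similarly low
```

Both scores are poor and close to each other, which is the signature of underfitting.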
How to Prevent Underfitting
- Higher model complexity: Use a more complex model with more parameters (see the sketch after this list).
- Feature engineering: Add more relevant features to help the model capture the underlying patterns.
- Train for longer: Train the model for a sufficient number of iterations or epochs.
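Here is a minimal sketch of the first remedy, reusing the quadratic data from the underfitting sketch above; the polynomial-features pipeline is an illustrative assumption:

```python
# Raising model complexity: degree-2 polynomial features let the same
# linear learner capture the quadratic pattern it previously underfit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same quadratic data as in the underfitting sketch above.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)
print("Train R^2:", poly.score(X_train, y_train))  # close to 1
print("Test R^2:", poly.score(X_test, y_test))     # close to 1 as well
```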
Let us see the differences between Overfitting and Underfitting:
| | Overfitting | Underfitting |
|---|---|---|
| What | The model learns the training data too well, including its noise and outliers. | The model is too simple to capture the underlying patterns in the data. |
| Accuracy | High accuracy on the training data but low accuracy on the test data. | Low accuracy on both the training and the test data. |
| Model complexity | Too complex; too many parameters. | Too simple; too few parameters. |
| Indicator | High variance. | High bias. |
| Performance | Large gap between training and validation/test accuracy. | Similarly poor performance on training and validation/test data. |
| Causes | Overly complex model; insufficient training data. | Overly simple model; insufficient training time. |
| How to prevent | Simplify the model; use regularization; use cross-validation. | Increase model complexity; train the model longer. |
If you liked the tutorial, spread the word and share the link to our website Studyopedia with others.