Accuracy in Machine Learning
Let’s dive into Accuracy, one of the most commonly used evaluation metrics in machine learning. Accuracy is a straightforward and intuitive way to measure how well a model is performing, especially in classification tasks.
What is Accuracy?
Accuracy measures the proportion of correct predictions (both true positives and true negatives) out of all predictions made by the model. It answers the question: “What percentage of predictions did the model get right?”
Formula of Accuracy
Accuracy is calculated using the following formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
- TP (True Positives): Correctly predicted positive cases.
- TN (True Negatives): Correctly predicted negative cases.
- FP (False Positives): Incorrectly predicted positive cases.
- FN (False Negatives): Incorrectly predicted negative cases.
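To make the formula concrete, here is a minimal Python sketch that plugs hypothetical confusion-matrix counts into it (the counts are made up purely for illustration):

# Hypothetical confusion-matrix counts (illustrative values only)
tp, tn, fp, fn = 40, 45, 5, 10

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy: {accuracy:.0%}")  # 85%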
When to Use Accuracy
Accuracy is a good metric to use when:
- Classes are Balanced: The dataset has roughly the same number of samples for each class (see the balance check sketched after this list).
- Example: A dataset with 50% spam emails and 50% legitimate emails.
- Simple Baseline: You want a quick and easy way to evaluate model performance.
- Intuitive Interpretation: You need a metric that is easy to explain to non-technical stakeholders.
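Since class balance is the main precondition for relying on accuracy, it is worth checking the class distribution first. A minimal sketch, assuming the labels live in a plain Python list named y (the data here is purely illustrative):

from collections import Counter

# Hypothetical labels; in a balanced dataset each class has a similar share
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

counts = Counter(y)
total = sum(counts.values())
for label, count in sorted(counts.items()):
    print(f"Class {label}: {count / total:.0%}")  # e.g. Class 0: 50%, Class 1: 50%

If one class accounts for the vast majority of samples, accuracy alone will paint an overly rosy picture of the model.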
Example of Accuracy
Let’s say you have a binary classification problem where you’re predicting whether an email is spam (Positive) or not spam (Negative). After evaluating your model, you get the following results:
- True Positives (TP): 90 (spam emails correctly identified as spam).
- True Negatives (TN): 850 (legitimate emails correctly identified as not spam).
- False Positives (FP): 10 (legitimate emails incorrectly flagged as spam).
- False Negatives (FN): 50 (spam emails incorrectly identified as legitimate).
Using the formula:

Accuracy = (90 + 850) / (90 + 850 + 10 + 50) = 940 / 1000 = 0.94

This means the model is correct 94% of the time.
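In practice, you rarely tally TP, TN, FP, and FN by hand; they come from a confusion matrix. A minimal sketch using scikit-learn's confusion_matrix with made-up labels (the values below are illustrative, not the counts from the example above):

from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}, Accuracy={accuracy:.0%}")  # TP=3, TN=3, FP=1, FN=1, Accuracy=75%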
Advantages of Accuracy
- Simple and Intuitive: Easy to understand and explain.
- Works Well for Balanced Datasets: When classes are roughly equal in size, accuracy provides a good measure of performance.
- Quick Evaluation: Provides a single number to summarize model performance.
Limitations of Accuracy
While accuracy is useful, it has some significant limitations, especially in certain scenarios:
- Misleading for Imbalanced Datasets:
- If one class dominates the dataset, accuracy can be high even if the model performs poorly on the minority class.
- Example: In a dataset with 95% non-spam emails and 5% spam emails, a model that always predicts “not spam” will have 95% accuracy, but it’s useless for detecting spam (see the sketch after this list).
- Ignores Type I and Type II Errors:
- Accuracy doesn’t distinguish between false positives (FP) and false negatives (FN).
- In some applications (e.g., medical diagnosis), false negatives (missing a disease) are much more costly than false positives (false alarms).
- Not Suitable for Probabilistic Predictions:
- Accuracy treats all predictions as binary (correct or incorrect), ignoring the confidence or probability of predictions.
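To make the imbalanced-dataset limitation concrete, here is a minimal sketch of a model that always predicts “not spam” on a synthetic 95%/5% dataset; its accuracy looks impressive while its recall on the spam class is zero (the data is purely illustrative):

from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: 95 legitimate emails (0) and 5 spam emails (1)
y_true = [0] * 95 + [1] * 5

# A useless model that always predicts "not spam"
y_pred = [0] * 100

print(f"Accuracy: {accuracy_score(y_true, y_pred):.0%}")            # 95%
print(f"Recall (spam class): {recall_score(y_true, y_pred):.0%}")   # 0%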
When Not to Use Accuracy
Avoid using accuracy when:
- Classes are Imbalanced: Use metrics like precision, recall, or F1-score instead (see the sketch after this list).
- Cost of Errors is Unequal: If false positives and false negatives have different costs, accuracy won’t reflect this.
- Probabilistic Predictions are Important: Use metrics like log loss or AUC-ROC.
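For reference, all of these alternative metrics are available in scikit-learn. A minimal sketch with illustrative labels and predicted probabilities (the numbers are made up for demonstration):

from sklearn.metrics import precision_score, recall_score, f1_score, log_loss, roc_auc_score

# Illustrative true labels, hard predictions, and predicted probabilities of class 1
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.6, 0.7, 0.9, 0.2]

# Class-based metrics for imbalanced data or unequal error costs
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))

# Probability-based metrics
print("Log loss:", log_loss(y_true, y_prob))
print("AUC-ROC:", roc_auc_score(y_true, y_prob))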
Hands-On Example of Accuracy
Let’s calculate accuracy using Python and Scikit-learn:
from sklearn.metrics import accuracy_score

# True labels
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]  # 0 = Not Spam, 1 = Spam

# Predicted labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]  # Model's predictions

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
Output
Accuracy: 80.00%

The model got 8 of the 10 predictions right (the 3rd and 7th predictions disagree with the true labels), so accuracy = 8/10 = 80%.
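As a side note, accuracy_score also accepts a normalize parameter; setting it to False returns the raw number of correct predictions instead of the proportion. A small sketch continuing the example above:

# Number of correct predictions rather than the proportion
correct = accuracy_score(y_true, y_pred, normalize=False)
print(correct)  # 8 of the 10 predictions are correct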
Key Takeaways
- Accuracy measures the proportion of correct predictions out of all predictions.
- It’s a simple and intuitive metric, but it has limitations, especially for imbalanced datasets.
- Use accuracy when classes are balanced and the cost of errors is equal.
- For imbalanced datasets or unequal error costs, consider using precision, recall, F1-score, or AUC-ROC.
If you liked the tutorial, spread the word and share the link and our website, Studyopedia, with others.
For Videos, Join Our YouTube Channel.
Read More:
- NLP Tutorial
- Generative AI Tutorial
- Machine Learning Tutorial
- Deep Learning Tutorial
- Ollama Tutorial
- Retrieval Augmented Generation (RAG) Tutorial
- Copilot Tutorial
- Gemini Tutorial
- ChatGPT Tutorial