K-Nearest Neighbors (KNN) Algorithm

The K-Nearest Neighbors (KNN) algorithm is a simple, non-parametric, and instance-based supervised learning algorithm used for classification and regression tasks in machine learning. It works by finding the K most similar training examples (neighbors) to a new data point and making predictions based on their labels (for classification) or values (for regression).

In this lesson, we will learn:

  • Features of KNN
  • Advantages of KNN
  • Disadvantages of KNN
  • Applications of KNN
  • Example 1: K-Nearest Neighbors (KNN) for Classification in Python
  • Example 2: K-Nearest Neighbors (KNN) for Regression in Python

Features of KNN

The following are the features of KNN:

  1. Lazy Learning Algorithm – KNN does not learn a model during training; instead, it memorizes the entire dataset and computes predictions at runtime.
  2. Distance-Based – Uses distance metrics (Euclidean, Manhattan, Minkowski, etc.) to find the nearest neighbors.
  3. Hyperparameter (K) – The number of neighbors (K) must be chosen carefully; an odd K is commonly used in binary classification to avoid tied votes.
  4. No Training Phase – Unlike most other ML algorithms, KNN does not require an explicit training step.
  5. Works for Both Classification & Regression (see the sketch after this list)
    • Classification: Predicts the majority class among K neighbors.
    • Regression: Predicts the average (or weighted average) of K neighbors.
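
To make the distance-and-vote mechanics above concrete, here is a minimal from-scratch sketch (plain NumPy, Euclidean distance, majority vote; the function name knn_predict and the toy data are just illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority class among those k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9])))  # -> 0
```

For regression, the only change would be to return the mean of the neighbors' target values instead of the majority vote.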

Advantages of KNN

The following are the advantages of KNN:

  • Simple & Easy to Implement – No complex mathematical modeling required.
  • No Training Phase – Works directly on stored data.
  • Adapts to New Data Easily – New data points can be added without retraining.
  • Works Well for Small Datasets – Effective when data is not too large.
  • Versatile – Can be used for both classification and regression.

Disadvantages of KNN

The following are the disadvantages of KNN:

  • Computationally Expensive – Requires calculating distances for all training points for each prediction.
  • Sensitive to Irrelevant Features – Needs feature scaling (normalization/standardization).
  • Struggles with High-Dimensional Data – “Curse of dimensionality” affects performance.
  • Choice of K is Crucial – Small K → Overfitting, Large K → Underfitting.
  • Memory Intensive – Stores the entire dataset.

Applications of KNN

The following are the applications of KNN:

  • Medical Diagnosis (Disease classification based on symptoms)
  • Recommendation Systems (Suggesting products based on similar users)
  • Image Recognition (Handwritten digit classification)
  • Finance (Credit scoring, fraud detection)
  • Geographical Data Analysis (Predicting weather conditions)

Example 1: K-Nearest Neighbors (KNN) for Classification in Python

Here’s a step-by-step implementation of the KNN algorithm for classification using Python and scikit-learn. We’ll use the famous Wine dataset for this example:

  • Objective: Classify wine into 3 categories based on chemical properties.
  • Workflow: Data loading → Preprocessing → Training → Evaluation → Visualization

Steps:

Step 1: Import libraries
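
One possible set of imports covering all the steps below (assumes scikit-learn, pandas, and matplotlib are installed):

```python
# Data handling, modeling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
```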

Step 2: Load Dataset
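
A sketch of loading the Wine dataset bundled with scikit-learn:

```python
# 178 samples, 13 chemical features, 3 wine classes
wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)
y = pd.Series(wine.target, name="class")
```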

Step 3: Display Dataset Info
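
For example:

```python
# Shape, class names, class balance, and a preview of the features
print(X.shape)            # (178, 13)
print(wine.target_names)  # ['class_0' 'class_1' 'class_2']
print(y.value_counts())
print(X.head())
```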

Step 4: Select two features for 2D visualization (alcohol & malic_acid)
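
Continuing from the previous step:

```python
# Keep only alcohol and malic_acid so the decision boundary can be drawn in 2D
X2 = X[["alcohol", "malic_acid"]].values
```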

Step 5: Scale features
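
Because KNN is distance-based, features on larger numeric scales would dominate the distance; a sketch of standardization:

```python
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X2_scaled = scaler.fit_transform(X2)
```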

Step 6: Split data
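
A possible 80-20 split (the random_state value is arbitrary):

```python
# Stratify so both splits keep the original class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X2_scaled, y, test_size=0.2, random_state=42, stratify=y
)
```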

Step 7: Train KNN (K=5)
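
Fitting the classifier might look like this:

```python
# K=5 neighbors, Euclidean distance by default
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
```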

Step 8: Predictions
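
Generating predictions on the held-out test set:

```python
# Each test point is assigned the majority class of its 5 nearest neighbors
y_pred = knn.predict(X_test)
```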

Step 9: Metrics
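
A sketch of the usual classification metrics:

```python
# Overall accuracy plus per-class precision and recall
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=wine.target_names))
```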

Step 10: Plot decision boundaries
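
One common way to draw the boundary is to classify every point on a dense grid and color each region by its predicted class:

```python
# Build a mesh over the scaled feature space and predict on every grid point
x_min, x_max = X2_scaled[:, 0].min() - 1, X2_scaled[:, 0].max() + 1
y_min, y_max = X2_scaled[:, 1].min() - 1, X2_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap="viridis")
plt.scatter(X2_scaled[:, 0], X2_scaled[:, 1], c=y, edgecolor="k", cmap="viridis")
plt.xlabel("alcohol (scaled)")
plt.ylabel("malic_acid (scaled)")
plt.title("KNN Decision Boundary (K=5)")
plt.show()
```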

Step 11: Feature importance plot (using mean distance impact)
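
KNN has no built-in feature importance, so "mean distance impact" is a proxy; one possible sketch is to average, per feature, how far each test point sits from its K nearest neighbors:

```python
# For each feature: mean absolute gap between a test point and its neighbors
_, indices = knn.kneighbors(X_test)
feature_names = ["alcohol", "malic_acid"]
mean_impact = [np.abs(X_test[:, j][:, None] - X_train[indices][:, :, j]).mean()
               for j in range(X_train.shape[1])]

plt.bar(feature_names, mean_impact)
plt.ylabel("Mean neighbor distance (proxy for importance)")
plt.title("Feature Importance (distance-based proxy)")
plt.show()
```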

Output

KNN in Classification Output

The program also generates the following visualizations:

Decision Boundary Plot:

    • Shows how KNN classifies based on alcohol and malic acid content
    • Each colored region represents a wine class

Decision Boundary Plot for KNN Classification

Feature Importance:

    • Proxies importance via average neighbor distance per feature
    • Higher bars = more separation power

Feature Importance for KNN Classification

Example 2: K-Nearest Neighbors (KNN) for Regression in Python

Here’s the complete Python example for KNN Regression using the California Housing dataset, with all necessary steps from data loading to evaluation.

We’re predicting median house values (continuous numbers) based on neighborhood features.

Steps:

Step 1: Import required libraries
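
A possible set of imports for the regression example:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
```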

Step 2: Load dataset
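
fetch_california_housing downloads the data on first use:

```python
# 20,640 districts, 8 numeric features; target is median house value in $100,000s
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.Series(housing.target, name="MedHouseVal")
```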

Step 3: Display dataset info
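
For example:

```python
print(X.shape)       # (20640, 8)
print(X.describe())  # summary statistics per feature
print(y.describe())
```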

Step 4: Feature scaling (critical for KNN)
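
As in Example 1, unscaled features would dominate the distance calculation:

```python
# Standardize all 8 features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```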

Step 5: Split into train-test (80-20)
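
A sketch of the 80-20 split:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```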

Step 6: Initialize and train KNN regressor (K=5)
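
Fitting the regressor might look like this:

```python
# Prediction = mean target value of the 5 nearest neighbors
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)
```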

Step 7: Predictions
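
Predicting on the held-out 20%:

```python
y_pred = knn_reg.predict(X_test)
```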

Step 8: Evaluation metrics
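
The common regression metrics could be computed like this:

```python
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}  MAE: {mae:.4f}  R^2: {r2:.4f}")
```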

Step 9: Calculate RMSE by taking the square root of the Mean Squared Error
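
RMSE brings the error back into the target's own units:

```python
rmse = np.sqrt(mse)  # same units as the target ($100,000s)
print(f"RMSE: {rmse:.4f}")
```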

Step 10: Plot actual vs predicted values
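
One way to visualize fit quality, continuing from the previous steps:

```python
# Points close to the dashed diagonal are accurate predictions
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()], "r--", label="Perfect prediction")
plt.xlabel("Actual median house value")
plt.ylabel("Predicted median house value")
plt.title("KNN Regression: Actual vs Predicted")
plt.legend()
plt.show()
```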

Output

KNN in Regression Output

