13 Oct Machine Learning – Median
The median is the middle value of a dataset once its values are arranged in order. It is a measure of central tendency and is calculated as follows:
- If the dataset has an odd number of values, arrange the values in order and take the value that lies in the middle.
- If the dataset has an even number of values, arrange the values in order and take the average of the two middle values.
Note: Median is the mid-point value.
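The two cases above can be sketched in plain Python. This is only an illustrative sketch: the function name median_of is our own and not part of any library.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers (illustrative helper)."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)          # the values must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of values: the single middle value
        return ordered[mid]
    # even number of values: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 5]))      # 5
print(median_of([7, 1, 5, 3]))   # 4.0
```

Note that the input does not have to be sorted in advance, since the function sorts a copy of it.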
Mean vs Median
The mean is the average of the values, i.e. the sum of the values divided by the number of values. The mean is affected by outliers.
The median is a useful measure of central tendency because it is robust to outliers, meaning that extreme values do not significantly affect its value. Let us understand what this means.
An outlier is a data point that is significantly different from the other data points in a dataset. It can skew the results of statistical measures, like the mean, but has little to no effect on the median.
For example, in the dataset [1, 2, 3, 4, 5, 50], the number 50 is an outlier. The mean is pulled up to about 10.83 by this extreme value, but the median is still 3.5, the average of the two middle values 3 and 4. By focusing on the median, you get a better sense of the central tendency without the distortion caused by outliers.
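To see the effect of the outlier in numbers, here is a small sketch using Python's built-in statistics module. The variable names data and without_outlier are our own, chosen only for this illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 50]        # 50 is the outlier
without_outlier = [1, 2, 3, 4, 5]

print(round(mean(data), 2))       # 10.83
print(median(data))               # 3.5
print(mean(without_outlier))      # 3
print(median(without_outlier))    # 3
```

Adding the single value 50 moves the mean from 3 to about 10.83, while the median only moves from 3 to 3.5.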
Coding Example – Calculate Median with Python
Let us see an example. To calculate the median in Python, we will use the median() function of the NumPy package.
The NumPy package is used for scientific computing with Python. We will also use the Pandas library to create a Pandas DataFrame. The Pandas DataFrame is a two-dimensional tabular data structure, i.e. a table with rows and columns.
Read: Free NumPy Tutorial
Read: Free Pandas Tutorial
The following is the Python code to get the median:
import numpy as np
import pandas as pd

# create a pandas dataframe
marks = pd.DataFrame({
    'student_rollno': ['0001', '0002', '0003', '0004', '0005', '0006', '0007', '0008', '0009', '0010'],
    'student_marks': [96, 83, 90, 87, 67, 60, 75, 80, 77, 70]
})

# get the median
median_marks = np.median(marks['student_marks'])

print('Median marks of students:', median_marks)
The following is the output:
Median marks of students: 78.5
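As a cross-check of this result, the same marks can be sorted by hand to find the two middle values. This is a sketch in plain Python without NumPy or Pandas; the names ordered, lower and upper are our own.

```python
marks = [96, 83, 90, 87, 67, 60, 75, 80, 77, 70]

ordered = sorted(marks)
n = len(ordered)                  # 10 values, an even count

# the two middle values of the sorted list
lower, upper = ordered[n // 2 - 1], ordered[n // 2]

print(ordered)                    # [60, 67, 70, 75, 77, 80, 83, 87, 90, 96]
print(lower, upper)               # 77 80
print((lower + upper) / 2)        # 78.5
```

Since there are 10 marks, the median is the average of the 5th and 6th values in sorted order, which matches the NumPy output.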