Pandas – Statistical Functions

In this lesson, we will perform statistical operations using the statistical functions in Python Pandas. These functions can be applied to a Series or a DataFrame; on a DataFrame, they work column by column by default.

  • sum(): Return the sum of the values.
  • count(): Return the count of non-missing (non-NA) values.
  • max(): Return the maximum of the values.
  • min(): Return the minimum of the values.
  • mean(): Return the mean of the values.
  • median(): Return the median of the values.
  • std(): Return the sample standard deviation of the values.
  • describe(): Return the summary statistics (count, mean, std, min, quartiles, max) for each numeric column.
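The examples in this lesson apply these functions to a DataFrame, where each one works column by column. On a single Series, the same methods return one scalar value each. A minimal sketch, assuming the Maths marks from the examples below as sample data:

```python
import pandas as pd

# A single column of marks as a Series (sample values from this lesson)
maths = pd.Series([90, 85, 98, 80, 55, 78], name="Maths")

# On a Series, each function returns a single scalar instead of a Series
print("Sum    =", maths.sum())
print("Count  =", maths.count())
print("Max    =", maths.max())
print("Min    =", maths.min())
print("Mean   =", maths.mean())
print("Median =", maths.median())
print("Std    =", maths.std())
```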


Pandas sum() method

The sum() method in Python Pandas returns the sum of the values. On a DataFrame, it returns one total per column. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Sum of Marks in each column 
print("\nSum = \n",df.sum())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Sum = 
Maths      486
Science    485
English    480
dtype: int64
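By default, sum() works down each column (axis=0). Passing axis=1 instead adds the values across each row, which here gives one total per student. A short sketch, assuming the same marks DataFrame as above:

```python
import pandas as pd

# Same marks as in the sum() example above
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
df = pd.DataFrame(data)

# axis=1 adds across the columns, giving one total per row (per student)
row_totals = df.sum(axis=1)
print("Total marks per row = \n", row_totals)

# Summing the per-column totals gives the same grand total as the row totals
print("Grand total =", df.sum().sum())
```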

Pandas count() method

The count() method in Python Pandas returns the number of non-missing values. Entries given as None are stored as NaN and are not counted. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, None, 55, 78],
  'Science': [92, 87, 59, None, None, 96],
  'English': [95, None, 84, 75, 67, None]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Count of non-empty values in each column 
print("\nCount of non-empty values = \n",df.count())

Output

DataFrame = 
    Maths  Science  English
0   90.0     92.0     95.0
1   85.0     87.0      NaN
2   98.0     59.0     84.0
3    NaN      NaN     75.0
4   55.0      NaN     67.0
5   78.0     96.0      NaN

Count of non-empty values = 
Maths      5
Science    4
English    4
dtype: int64
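Because count() skips missing values, it can differ from the number of rows. The complementary count of missing values comes from isna().sum(), and for every column the two add up to len(df). A sketch assuming the same data with missing entries:

```python
import pandas as pd

# Same data as in the count() example above
data = {
  'Maths': [90, 85, 98, None, 55, 78],
  'Science': [92, 87, 59, None, None, 96],
  'English': [95, None, 84, 75, 67, None]
}
df = pd.DataFrame(data)

# count() skips missing values; isna().sum() counts only the missing ones
non_missing = df.count()
missing = df.isna().sum()

print("Rows =", len(df))
print("Missing values = \n", missing)

# For every column, non-missing + missing equals the total number of rows
print((non_missing + missing == len(df)).all())
```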

Pandas max() method

The max() method in Python Pandas is used to return the maximum of the values. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Maximum of Marks in each column 
print("\nMaximum Marks = \n",df.max())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Maximum Marks = 
Maths      98
Science    96
English    95
dtype: int64
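max() reports the largest value in each column but not where it occurs. The idxmax() method returns the row label of each column's maximum instead. A short sketch using the same marks DataFrame:

```python
import pandas as pd

# Same marks as in the max() example above
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
df = pd.DataFrame(data)

# idxmax() returns the row label where each column reaches its maximum
best_row = df.idxmax()
print("Row with maximum marks = \n", best_row)
```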

Pandas min() method

The min() method in Python Pandas is used to return the minimum of the values. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Minimum of Marks in each column 
print("\nMinimum Marks = \n",df.min())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Minimum Marks = 
Maths      55
Science    59
English    65
dtype: int64
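Like sum(), min() accepts axis=1 to work across each row instead of down each column, and idxmin(axis=1) reports which column held that minimum. A sketch assuming the same marks, with each row treated as one student:

```python
import pandas as pd

# Same marks as in the min() example above
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
df = pd.DataFrame(data)

# Lowest mark in each row (per student)
lowest = df.min(axis=1)

# Column (subject) in which each row's lowest mark occurs
weakest_subject = df.idxmin(axis=1)

print("Lowest mark per row = \n", lowest)
print("\nSubject with the lowest mark = \n", weakest_subject)
```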

Pandas mean() method

The mean() method in Python Pandas is used to return the mean of the values. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}

# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Mean of Marks in each column
print("\nMean = \n",df.mean())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Mean = 
Maths      81.000000
Science    80.833333
English    80.000000
dtype: float64
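The data above has no missing values. When a column does contain NaN, mean() skips it by default (skipna=True), so each mean uses only the non-missing marks; passing skipna=False makes any column with a missing value return NaN. A sketch reusing the data with missing entries from the count() example:

```python
import pandas as pd

# Data with missing entries (None is stored as NaN)
data = {
  'Maths': [90, 85, 98, None, 55, 78],
  'Science': [92, 87, 59, None, None, 96],
  'English': [95, None, 84, 75, 67, None]
}
df = pd.DataFrame(data)

# Default: NaN values are skipped
means = df.mean()
print("Mean (skipna=True) = \n", means)

# skipna=False: a single NaN makes that column's mean NaN
strict_means = df.mean(skipna=False)
print("\nMean (skipna=False) = \n", strict_means)
```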

Pandas median() method

The median() method in Python Pandas is used to return the median of the values. Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Median of Marks in each column 
print("\nMedian = \n",df.median())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Median = 
Maths      82.5
Science    87.0
English    79.5
dtype: float64
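Each column here has six values, an even count, so the median is the average of the two middle values after sorting. The sketch below checks the Maths result by hand; it assumes an even number of values and is meant only as a cross-check, not a replacement for median():

```python
import pandas as pd

maths = pd.Series([90, 85, 98, 80, 55, 78])

# Sort and renumber so positions 0..5 follow the sorted order
ordered = maths.sort_values().reset_index(drop=True)

# With 6 values, the middle two are at positions 2 and 3
manual_median = (ordered[2] + ordered[3]) / 2

print("Sorted marks:", ordered.tolist())
print("Manual median =", manual_median)
print("median()      =", maths.median())
```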

Pandas std() method

The std() method in Python Pandas returns the standard deviation of the values. By default, it computes the sample standard deviation, dividing by N - 1 (ddof=1). Let us see an example:

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, 80, 55, 78],
  'Science': [92, 87, 59, 64, 87, 96],
  'English': [95, 94, 84, 75, 67, 65]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the Standard Deviation of Marks in each column 
print("\nStandard Deviation = \n",df.std())

Output

DataFrame = 
    Maths  Science  English
0     90       92       95
1     85       87       94
2     98       59       84
3     80       64       75
4     55       87       67
5     78       96       65

Standard Deviation = 
Maths      14.642404
Science    15.432649
English    13.084342
dtype: float64
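The ddof parameter controls the divisor. The sketch below compares the default sample standard deviation (ddof=1) with the population version (ddof=0) for the Maths column, and checks both against a manual formula and NumPy, whose np.std defaults to ddof=0:

```python
import numpy as np
import pandas as pd

maths = pd.Series([90, 85, 98, 80, 55, 78])
n = len(maths)

# pandas default: sample standard deviation, dividing by n - 1
sample_std = maths.std()

# ddof=0 divides by n, giving the population standard deviation
population_std = maths.std(ddof=0)

# NumPy's np.std uses ddof=0 by default, so it matches population_std
numpy_std = np.std(maths.to_numpy())

# Sample standard deviation computed directly from the formula
manual_sample_std = (((maths - maths.mean()) ** 2).sum() / (n - 1)) ** 0.5

print("Sample std (ddof=1)     =", sample_std)
print("Manual sample std       =", manual_sample_std)
print("Population std (ddof=0) =", population_std)
print("np.std                  =", numpy_std)
```

This difference matters when comparing results across tools, since pandas and NumPy use different defaults.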

Pandas describe() method

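The describe() method in Python Pandas returns summary statistics for each numeric column: the count, mean, standard deviation, minimum, 25th, 50th and 75th percentiles, and maximum. Missing values are excluded. Let us see an example: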

import pandas as pd
 
# Dataset
data = {
  'Maths': [90, 85, 98, None, 55, 78],
  'Science': [92, 87, 59, None, None, 96],
  'English': [95, None, 84, 75, 67, None]
}
 
# DataFrame 
df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame = \n",df)

# Display the summary using the describe() method
print("\nSummary of Statistics = \n",df.describe())

Output

DataFrame = 
    Maths  Science  English
0   90.0     92.0     95.0
1   85.0     87.0      NaN
2   98.0     59.0     84.0
3    NaN      NaN     75.0
4   55.0      NaN     67.0
5   78.0     96.0      NaN

Summary of Statistics = 
           Maths    Science    English
count   5.00000   4.000000   4.000000
mean   81.20000  83.500000  80.250000
std    16.36154  16.743158  12.038134
min    55.00000  59.000000  67.000000
25%    78.00000  80.000000  73.000000
50%    85.00000  89.500000  79.500000
75%    90.00000  93.000000  86.750000
max    98.00000  96.000000  95.000000
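describe() always reports its fixed set of statistics. When only a few are needed, DataFrame.agg() accepts a list of function names and returns just those, one row per statistic. A sketch assuming the same data with missing values; the chosen statistics are only an example:

```python
import pandas as pd

# Same data as in the describe() example above
data = {
  'Maths': [90, 85, 98, None, 55, 78],
  'Science': [92, 87, 59, None, None, 96],
  'English': [95, None, 84, 75, 67, None]
}
df = pd.DataFrame(data)

# One row per requested statistic, one column per DataFrame column
summary = df.agg(['count', 'mean', 'median', 'max'])
print(summary)
```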

Studyopedia Editorial Staff
contact@studyopedia.com

We work to create programming tutorials for all.
