Pandas – Group the Data

In this lesson, we will learn how to group the data in a DataFrame and perform operations on the groups. We will first split the data into groups, then iterate through the groups, view them, and finally aggregate them. Here is what we will cover:

  • Split the object and combine the result
  • Iterate the Group
  • View the Group
  • Perform Aggregation Operations on Groups

Pandas – Split the object and combine the result

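The groupby() method in Pandas splits an object into groups. Grouping means collecting the rows that share the same value in one or more key columns. A function called on the grouped object is then applied to each group, and the per-group results are combined into a single result. In the example below, we group by the Player column and take the first row of each group: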

import pandas as pd
 
# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
 
# Our DataFrame
df = pd.DataFrame(data)
 
# Display the records
print("Player Records\n\n", df)
 
# Group the data on Player value
res = df.groupby('Player')

# Print the first row of each group
print("\n", res.first())

Output

Player Records

   Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019

         Rank  Year
Player            
Amit       1  2023
David      5  2022
John       4  2022
Steve      2  2018
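
To make the split and combine steps more concrete, here is a minimal sketch that performs them by hand and compares the result with groupby(). The names pieces, best, manual, and grouped are only illustrative, and taking the minimum Rank per player is an assumed operation chosen for the demonstration:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Split: collect the rows that belong to each distinct Player
pieces = {name: df[df['Player'] == name] for name in sorted(df['Player'].unique())}

# Apply: compute the lowest Rank within each piece
best = {name: piece['Rank'].min() for name, piece in pieces.items()}

# Combine: assemble the per-group results into a single Series
manual = pd.Series(best, name='Rank')
manual.index.name = 'Player'

# The same three steps in one line with groupby()
grouped = df.groupby('Player')['Rank'].min()

print("Manual result\n", manual)
print("\ngroupby() result\n", grouped)
```

Both results hold one value per player, which shows that groupby() is a shortcut for the split, apply, and combine steps.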

Iterate the Group

Use a for-in loop to iterate through the groups returned by groupby(). Each iteration yields the group name and a DataFrame containing the rows of that group. In the example below, we iterate through the Player groups one by one:

import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player
groupRes = df.groupby('Player')

for name, group in groupRes:
    print("\n", name)
    print(group)

Output

Player Records

   Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019

 Amit
  Player  Rank  Year
0   Amit     1  2023
2   Amit     3  2021

 David
  Player  Rank  Year
3  David     5  2022

 John
  Player  Rank  Year
1   John     4  2022
5   John     7  2019

 Steve
  Player  Rank  Year
4  Steve     2  2018
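
Iteration is also handy when we want to build something from each group instead of just printing it. The following is a small sketch under that assumption; the summary dictionary and its keys rows and latest_year are hypothetical names used only for illustration:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Each iteration yields the group name and a DataFrame with that group's rows
summary = {}
for name, group in df.groupby('Player'):
    summary[name] = {
        'rows': len(group),                        # number of records for the player
        'latest_year': int(group['Year'].max()),   # most recent year in the group
    }

for name, info in summary.items():
    print(name, info)
```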

View the Group

Use the groups property of the grouped object to view the groups. It maps each group name to the index labels of the rows in that group. Let us see an example:

import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player and Display
print(df.groupby('Player').groups)

Output

Player Records

   Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019
{'Amit': [0, 2], 'David': [3], 'John': [1, 5], 'Steve': [4]}
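
The groups property only shows the row labels. To see the actual rows of a single group, the get_group() method can be used. Here is a short sketch; the choice of the John group is arbitrary:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

grouped = df.groupby('Player')

# The keys of the groups mapping are the group names
names = list(grouped.groups.keys())
print("Group names:", names)

# Fetch the rows of one group as a DataFrame
john = grouped.get_group('John')
print("\n", john)
```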

Perform Aggregation Operations on Groups

After grouping, we can perform operations on the grouped data using the agg() method. With this method, we can get the mean of each group, the size of each group, and so on. Let’s see some examples:

  • Get the mean of the grouped data
  • Get the size of each group

Get the mean of the grouped data

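To get the mean of the grouped data, first group the data and then pass an aggregation function such as numpy.mean to the agg() method. The string "mean" can be passed instead and gives the same result. Let us see an example: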

import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Year')

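# The agg() method performs the aggregation on the Points column of each group
# Passing the string "mean" instead of np.mean gives the same result;
# recent pandas versions may show a FutureWarning for NumPy callables
print("\n", groupRes['Points'].agg(np.mean))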

Output

Player Records

   Player  Rank  Points  Year
0   Amit     1      95  2023
1   John     4      70  2022
2   Amit     3      65  2021
3  David     5      80  2022
4  Steve     2      90  2023
5   John     7      50  2019

 Year
2019    50.0
2021    65.0
2022    75.0
2023    92.5
Name: Points, dtype: float64
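
The agg() method also accepts a list of aggregations, so several statistics can be computed in one call. Below is a minimal sketch using the string names "mean", "min", and "max"; the variable name stats is only illustrative:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

# A list of aggregation names returns one column per aggregation
stats = df.groupby('Year')['Points'].agg(['mean', 'min', 'max'])
print(stats)
```

The result is a DataFrame indexed by Year with one column for each requested aggregation.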

Get the size of each group

To get the size of each group, pass the NumPy size() function to the agg() method. We have grouped by the Player column using groupby(). Let us see an example:

import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Player')

# The agg() method performs the aggregation on each group
# The numpy.size function returns the number of elements, so every column shows the row count of the group
print("\n", groupRes.agg(np.size))

Output

Player Records

   Player  Rank  Points  Year
0   Amit     1      95  2023
1   John     4      70  2022
2   Amit     3      65  2021
3  David     5      80  2022
4  Steve     2      90  2023
5   John     7      50  2019

         Rank  Points  Year
Player                    
Amit       2       2     2
David      1       1     1
John       2       2     2
Steve      1       1     1
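
Pandas also provides its own size() method on the grouped object, which returns a single count per group instead of repeating it for every column. The sketch below shows it next to count(), which skips missing values. The missing Points value in df_missing is a hypothetical change made only to show the difference:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

# size() returns one row count per group as a Series
sizes = df.groupby('Player').size()
print(sizes)

# Hypothetical variant: the last Points value (John's second record) is missing
df_missing = df.assign(Points=[95, 70, 65, 80, 90, float('nan')])

# count() counts only the non-missing values in each column
counts = df_missing.groupby('Player').count()
print("\n", counts)
```

Here size() still reports two rows for John, while count() reports only one non-missing Points value for him.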

Studyopedia Editorial Staff
contact@studyopedia.com