24 Jan Pandas – Group the Data
In this lesson, we will learn how to group data in a DataFrame and perform operations on the groups. We will split the data into groups, iterate through them, view them, and then aggregate them. Here is what we will cover:
- Split the object and combine the result
- Iterate the Group
- View the Group
- Perform Aggregation Operations on Groups
Pandas – Split the object and combine the result
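The groupby() method is used in Pandas to split the object. It groups the rows that share the same value in one or more columns, and the result of an operation on each group is then combined into a new object. In the example below, we group by the Player column and use first() to get the first entry of each group: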
import pandas as pd
# Our Dataset
data = {
'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
'Rank': [1, 4, 3, 5, 2, 7],
'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
# Our DataFrame
df = pd.DataFrame(data)
# Display the records
print("Player Records\n\n", df)
# Group the data on Player value
res = df.groupby('Player')
# Print the first entry of each group
print("\n", res.first())
Output
Player Records
Player Rank Year
0 Amit 1 2023
1 John 4 2022
2 Amit 3 2021
3 David 5 2022
4 Steve 2 2018
5 John 7 2019
Rank Year
Player
Amit 1 2023
David 5 2022
John 4 2022
Steve 2 2018
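The split and combine steps can also be seen with an aggregation other than first(). The following is a minimal sketch that reuses the same dataset; the variable names grouped and best_rank are ours, chosen only for illustration:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Split: one group per distinct Player value
grouped = df.groupby('Player')

# Apply and combine: take the smallest Rank in each group and
# collect the results into a single Series indexed by Player
best_rank = grouped['Rank'].min()
print(best_rank)
```

Here, the rows are split by Player, min() is applied to the Rank column of each group, and the per-group results are combined into one Series.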
Iterate the Group
Loop through the groups returned by groupby() using the for-in loop. Each iteration yields the group name and a DataFrame holding the rows of that group. In the example below, we iterate through the Player groups one by one:
import pandas as pd
# Our Dataset
data = {
'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
'Rank': [1, 4, 3, 5, 2, 7],
'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
# Our DataFrame
df = pd.DataFrame(data)
# Display the records
print("Player Records\n\n", df)
# Group by Player
groupRes = df.groupby('Player')
for name, group in groupRes:
    print("\n", name)
    print(group)
Output
Player Records
Player Rank Year
0 Amit 1 2023
1 John 4 2022
2 Amit 3 2021
3 David 5 2022
4 Steve 2 2018
5 John 7 2019
Amit
Player Rank Year
0 Amit 1 2023
2 Amit 3 2021
David
Player Rank Year
3 David 5 2022
John
Player Rank Year
1 John 4 2022
5 John 7 2019
Steve
Player Rank Year
4 Steve 2 2018
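Since each iteration gives a name and a DataFrame, the loop can also be used to build a result by hand. The following is a small sketch, assuming the same dataset; the dictionary name rows_per_player is ours, for illustration only:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Each iteration yields the group name and a DataFrame with that group's rows
rows_per_player = {}
for name, group in df.groupby('Player'):
    # len() of a DataFrame is its number of rows
    rows_per_player[name] = len(group)

print(rows_per_player)
```

For simple counts like this, a built-in aggregation is usually shorter; the loop is mainly useful when each group needs custom handling.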
View the Group
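Use the groups property of the grouped object in Python Pandas to view the groups. It returns a dictionary-like mapping from each group name to the index labels of the rows in that group. Let us see an example: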
import pandas as pd
# Our Dataset
data = {
'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
'Rank': [1, 4, 3, 5, 2, 7],
'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
# Our DataFrame
df = pd.DataFrame(data)
# Display the records
print("Player Records\n\n", df)
# Group by Player and Display
print(df.groupby('Player').groups)
Output
Player Records
Player Rank Year
0 Amit 1 2023
1 John 4 2022
2 Amit 3 2021
3 David 5 2022
4 Steve 2 2018
5 John 7 2019
{'Amit': [0, 2], 'David': [3], 'John': [1, 5], 'Steve': [4]}
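To see the rows of a single group rather than only their index labels, the grouped object also provides the get_group() method. The following is a minimal sketch, assuming the same dataset; the variable names grouped and john are ours:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

grouped = df.groupby('Player')

# Index labels of the rows that belong to the John group
print(list(grouped.groups['John']))

# The rows themselves, returned as a DataFrame
john = grouped.get_group('John')
print(john)
```

The index labels of john match the labels listed under 'John' in the groups mapping.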
Aggregation Operation on Groups
After grouping, we can perform operations on the grouped data using the agg() method. With this method, we can get the mean of each group, the size of each group, and so on. Let’s see some examples:
- Get the mean of the grouped data
- Get the size of each group
Get the mean of the grouped data
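To get the mean of the grouped data, first group, and then pass numpy.mean to the agg() method (the function itself is passed, it is not called). Passing the string 'mean' to agg() is an equivalent spelling. Let us see an example: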
import pandas as pd
import numpy as np
# Our Dataset
data = {
'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
'Rank': [1, 4, 3, 5, 2, 7],
'Points': [95, 70, 65, 80, 90, 50],
'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
# Our DataFrame
df = pd.DataFrame(data)
# Display the records
print("Player Records\n\n", df)
# Use the groupby() to group
groupRes = df.groupby('Year')
# The agg() is used to perform aggregation
print("\n",groupRes['Points'].agg(np.mean))
Output
Player Records
Player Rank Points Year
0 Amit 1 95 2023
1 John 4 70 2022
2 Amit 3 65 2021
3 David 5 80 2022
4 Steve 2 90 2023
5 John 7 50 2019
Year
2019 50.0
2021 65.0
2022 75.0
2023 92.5
Name: Points, dtype: float64
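The agg() method also accepts a list of aggregation names, which gives one result column per aggregation. The following is a short sketch, assuming the same dataset; the variable name summary is ours, for illustration:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

# Mean and maximum Points for each Year, computed in one call;
# the strings 'mean' and 'max' name built-in aggregations
summary = df.groupby('Year')['Points'].agg(['mean', 'max'])
print(summary)
```

The result is a DataFrame indexed by Year with a mean column and a max column.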
Get the size of each group
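To get the size of each group, pass the NumPy size function to the agg() method. We have grouped by the Player column using the groupby(), and the result shows the number of rows in each group, repeated for every remaining column. Let us see an example: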
import pandas as pd
import numpy as np
# Our Dataset
data = {
'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
'Rank': [1, 4, 3, 5, 2, 7],
'Points': [95, 70, 65, 80, 90, 50],
'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
# Our DataFrame
df = pd.DataFrame(data)
# Display the records
print("Player Records\n\n", df)
# Use the groupby() to group
groupRes = df.groupby('Player')
# The agg() is used to perform aggregation
# The numpy.size function returns the number of rows in each group
print("\n",groupRes.agg(np.size))
Output
Player Records
Player Rank Points Year
0 Amit 1 95 2023
1 John 4 70 2022
2 Amit 3 65 2021
3 David 5 80 2022
4 Steve 2 90 2023
5 John 7 50 2019
Rank Points Year
Player
Amit 2 2 2
David 1 1 1
John 2 2 2
Steve 1 1 1
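When only one count per group is needed, the grouped object's own size() method is a shorter alternative to agg(). The following is a minimal sketch, assuming the same dataset; the variable name sizes is ours:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

# size() returns a Series with one row count per group,
# instead of repeating the count for every column
sizes = df.groupby('Player').size()
print(sizes)
```

The counts are the same as in the output above, but they appear once per Player.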