24 Jan Pandas – Group the Data
In this lesson, we will learn how to group data in a DataFrame and perform operations on the groups. We will split the data into groups, iterate through the groups, view them, and then aggregate them. Let us see what we will cover:
- Split the object and combine the result
- Iterate the Group
- View the Group
- Perform Aggregation Operations on Groups
Pandas – Split the object and combine the result
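The groupby() method is used in Pandas to split an object into groups. It groups the rows based on the values of one or more columns, so that an operation can be applied to each group and the results combined. In the below example, we group by the Player column and use first() to display the first record of each group: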
```python
import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group the data on Player value
res = df.groupby('Player')

# Print the first entry
print("\n", res.first())
```
Output
```
Player Records

  Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019

        Rank  Year
Player
Amit       1  2023
David      5  2022
John       4  2022
Steve      2  2018
```
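The split and combine steps can also be seen with an aggregate other than first(). The following is a minimal sketch on the same dataset; the variable names grouped and best_rank are only illustrative, and treating the lowest Rank as the "best" one is an assumption made for this example:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Split: one group per distinct Player value
grouped = df.groupby('Player')

# Apply and combine: lowest Rank per player, returned as a Series
best_rank = grouped['Rank'].min()
print(best_rank)

# Number of groups created by the split
print(grouped.ngroups)
```

Here, groupby() does the split, min() is applied to each group, and Pandas combines the per-group results into a single Series indexed by Player.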
Iterate the Group
Loop through the groups returned by groupby() using a for-in loop. Each iteration gives the group name and the rows of that group as a DataFrame. In the below example, we iterate through the Player groups one by one:
```python
import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player
groupRes = df.groupby('Player')

for name, group in groupRes:
    print("\n", name)
    print(group)
```
Output
```
Player Records

  Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019

Amit
  Player  Rank  Year
0   Amit     1  2023
2   Amit     3  2021

David
  Player  Rank  Year
3  David     5  2022

John
  Player  Rank  Year
1   John     4  2022
5   John     7  2019

Steve
  Player  Rank  Year
4  Steve     2  2018
```
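The loop can also collect a value from each group instead of printing it. The following is a small sketch on the same dataset; the dictionary latest_year and the variable john are illustrative names, not part of the lesson's code. It also uses get_group(), which fetches a single group without a loop:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Each iteration yields the group name and the rows of that group
latest_year = {}
for name, group in df.groupby('Player'):
    latest_year[name] = int(group['Year'].max())

print(latest_year)

# get_group() returns the rows of one group as a DataFrame
john = df.groupby('Player').get_group('John')
print(john)
```

When only one group is needed, get_group() is usually simpler than looping and filtering by name.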
View the Group
Use the groups property of the grouped object to view the groups. It maps each group name to the index labels of the rows in that group. Let us see an example:
```python
import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player and Display
print(df.groupby('Player').groups)
```
Output
```
Player Records

  Player  Rank  Year
0   Amit     1  2023
1   John     4  2022
2   Amit     3  2021
3  David     5  2022
4  Steve     2  2018
5   John     7  2019
{'Amit': [0, 2], 'David': [3], 'John': [1, 5], 'Steve': [4]}
```
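The index labels stored in groups can be used to select the rows of a group. The following is a minimal sketch on the same dataset; the variable names groups and amit_rows are illustrative:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Mapping of group name -> index labels of the rows in that group
groups = df.groupby('Player').groups

# The keys are the group names
print(sorted(groups.keys()))

# The labels of one group can be passed to loc to fetch its rows
amit_rows = df.loc[groups['Amit']]
print(amit_rows)
```

This shows that groups only holds row labels, not the data itself. The rows are fetched from the original DataFrame when needed.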
Aggregation Operations on Groups
After grouping, we can perform operations on the grouped data using the agg() method. With this method, we can get the mean of each group, the size of each group, and so on. Let’s see some examples:
- Get the mean of the grouped data
- Get the size of each group
Get the mean of the grouped data
To get the mean of the grouped data, first group the data and then pass NumPy's mean function (np.mean) to the agg() method. In the below example, we group by the Year column and get the mean of Points for each year:
```python
import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Year')

# The agg() is used to perform aggregation
print("\n", groupRes['Points'].agg(np.mean))
```
Output
```
Player Records

  Player  Rank  Points  Year
0   Amit     1      95  2023
1   John     4      70  2022
2   Amit     3      65  2021
3  David     5      80  2022
4  Steve     2      90  2023
5   John     7      50  2019

Year
2019    50.0
2021    65.0
2022    75.0
2023    92.5
Name: Points, dtype: float64
```
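The agg() method also accepts the name of an aggregation as a string, or a list of such names. The following is a sketch of that alternative on the same dataset; the variable names are illustrative. As far as we know, recent Pandas releases show a FutureWarning when np.mean is passed to agg() and suggest the string "mean" instead, so treat this as a version-dependent note:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

by_year = df.groupby('Year')

# Mean of Points per year using the string alias
mean_points = by_year['Points'].agg('mean')
print(mean_points)

# The mean() shortcut gives the same result
mean_points_short = by_year['Points'].mean()

# Several aggregations at once: one column per aggregation
summary = by_year['Points'].agg(['mean', 'min', 'max'])
print(summary)
```

Passing a list returns a DataFrame with one column for each aggregation, which is handy when more than one statistic is needed per group.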
Get the size of each group
To get the size of each group, pass NumPy's size function (np.size) to the agg() method. Here, we have grouped by the Player column using groupby(). Let us see an example:
```python
import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Player')

# The agg() is used to perform aggregation
# The numpy.size function returns the number of rows in each group
print("\n", groupRes.agg(np.size))
```
Output
```
Player Records

  Player  Rank  Points  Year
0   Amit     1      95  2023
1   John     4      70  2022
2   Amit     3      65  2021
3  David     5      80  2022
4  Steve     2      90  2023
5   John     7      50  2019

        Rank  Points  Year
Player
Amit       2       2     2
David      1       1     1
John       2       2     2
Steve      1       1     1
```
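The grouped object also has its own size() and count() methods, which need no NumPy import. The following is a minimal sketch on the same dataset; the variable names group_sizes and counts are illustrative:

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

groupRes = df.groupby('Player')

# size() returns one number per group: the count of rows in it
group_sizes = groupRes.size()
print(group_sizes)

# count() returns the count of non-missing values per column
counts = groupRes.count()
print(counts)
```

Since this dataset has no missing values, count() shows the same numbers in every column as the output above, while size() gives a single value per group.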