Pandas - Group the Data

24 Jan Pandas – Group the Data

Posted at 12:54h in Pandas by Studyopedia Editorial Staff

In this lesson, we will learn how to group data in a DataFrame and perform operations on the groups. We will split the data into groups, iterate through them, view them, and then aggregate them. Here is what we will cover:

Split the object and combine the result
Iterate the Group
View the Group
Perform Aggregation Operations on Groups


Pandas – Split the object and combine the result

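The groupby() method splits a DataFrame into groups based on the values in one or more columns. It returns a GroupBy object; an operation such as first() is then applied to each group, and the results are combined into a new DataFrame. In the example below, we group by the Player column and take the first entry of each group: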

import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group the data on Player value
res = df.groupby('Player')

# Print the first entry of each group
print("\n", res.first())

Output

Player Records

   Player  Rank  Year
0    Amit     1  2023
1    John     4  2022
2    Amit     3  2021
3   David     5  2022
4   Steve     2  2018
5    John     7  2019

        Rank  Year
Player
Amit       1  2023
David      5  2022
John       4  2022
Steve      2  2018
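
The split, apply, and combine steps can also be chained into a single expression. The following is a minimal sketch that reuses the same dataset; the variable name best_rank is only illustrative, and min() stands in for whichever aggregation you need.

```python
import pandas as pd

data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

# Split on Player, apply min() to the Rank column,
# and combine the per-group results into one Series
best_rank = df.groupby('Player')['Rank'].min()
print(best_rank)
```

Selecting the Rank column before aggregating keeps the result to a single Series indexed by Player.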

Iterate the Group

Use a for-in loop to iterate over the groups produced by groupby(). Each iteration yields the group name and a DataFrame containing that group's rows. In the example below, we loop over the Player groups one by one:

import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player
groupRes = df.groupby('Player')

# Each iteration yields the group name and the rows of that group
for name, group in groupRes:
    print("\n", name)
    print(group)

Output

Player Records

   Player  Rank  Year
0    Amit     1  2023
1    John     4  2022
2    Amit     3  2021
3   David     5  2022
4   Steve     2  2018
5    John     7  2019

Amit
   Player  Rank  Year
0    Amit     1  2023
2    Amit     3  2021

David
   Player  Rank  Year
3   David     5  2022

John
   Player  Rank  Year
1    John     4  2022
5    John     7  2019

Steve
   Player  Rank  Year
4   Steve     2  2018
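
Because each group arrives as an ordinary DataFrame, the loop body can compute something per group instead of only printing it. The sketch below collects the number of rows for each player; the dictionary name group_sizes is an assumption made for illustration.

```python
import pandas as pd

data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

group_sizes = {}
for name, group in df.groupby('Player'):
    # name is the group key; group is a DataFrame with that player's rows
    group_sizes[name] = len(group)

print(group_sizes)
```

For simple counts or statistics, a built-in aggregation is usually shorter than an explicit loop; iteration is most useful when each group needs custom handling.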

View the Group

Use the groups property to view the groups. It maps each group name to the index labels of the rows that belong to that group. Let us see an example:

import pandas as pd

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Group by Player and Display
print(df.groupby('Player').groups)

Output

Player Records

   Player  Rank  Year
0    Amit     1  2023
1    John     4  2022
2    Amit     3  2021
3   David     5  2022
4   Steve     2  2018
5    John     7  2019

{'Amit': [0, 2], 'David': [3], 'John': [1, 5], 'Steve': [4]}
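
The groups property only lists index labels. To see the actual rows of one group, the get_group() method can be used. This is a minimal sketch on the same dataset; the names grouped and john_rows are illustrative.

```python
import pandas as pd

data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Year': [2023, 2022, 2021, 2022, 2018, 2019]
}
df = pd.DataFrame(data)

grouped = df.groupby('Player')

# The group names are the keys of the groups mapping
print(list(grouped.groups.keys()))

# Fetch the rows that belong to a single group
john_rows = grouped.get_group('John')
print(john_rows)
```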

Aggregation Operation on Groups

After grouping, we can perform operations on the grouped data using the agg() method. With this method, we can get the mean of each group, the size of each group, and so on. Let’s see some examples:

Get the mean of the grouped data
Get the size of each group

Get the mean of the grouped data

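To get the mean of the grouped data, group first and then pass an aggregation function to the agg() method. The example below passes numpy.mean; the string 'mean' is an equivalent alternative, and some recent pandas versions emit a FutureWarning that recommends it. Let us see an example: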

import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Year')

# The agg() is used to perform aggregation
print("\n", groupRes['Points'].agg(np.mean))

Output

Player Records

   Player  Rank  Points  Year
0    Amit     1      95  2023
1    John     4      70  2022
2    Amit     3      65  2021
3   David     5      80  2022
4   Steve     2      90  2023
5    John     7      50  2019

Year
2019    50.0
2021    65.0
2022    75.0
2023    92.5
Name: Points, dtype: float64
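
The agg() method also accepts aggregation names as strings, and a list of names computes several statistics in one call. The following sketch assumes the same dataset; the variable name summary is illustrative.

```python
import pandas as pd

data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

# Passing a list of names returns one column per aggregation
summary = df.groupby('Year')['Points'].agg(['mean', 'max'])
print(summary)
```

Each row of the result is one Year, with a mean column and a max column for Points.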

Get the size of each group

To get the size of each group, pass the NumPy size() function to the agg() method. We have grouped by the Player column using groupby(), so the result shows the number of rows for each player in every remaining column. Let us see an example:

import pandas as pd
import numpy as np

# Our Dataset
data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}

# Our DataFrame
df = pd.DataFrame(data)

# Display the records
print("Player Records\n\n", df)

# Use the groupby() to group
groupRes = df.groupby('Player')

# The agg() is used to perform aggregation
# The numpy.size function returns the number of elements in each group
print("\n", groupRes.agg(np.size))

Output

Player Records

   Player  Rank  Points  Year
0    Amit     1      95  2023
1    John     4      70  2022
2    Amit     3      65  2021
3   David     5      80  2022
4   Steve     2      90  2023
5    John     7      50  2019

        Rank  Points  Year
Player
Amit       2       2     2
David      1       1     1
John       2       2     2
Steve      1       1     1
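
The table above repeats the same count in every column. As an alternative sketch, the GroupBy object has its own size() method, which returns a single count per group, and a count() method, which returns the number of non-null values per column. The variable names sizes and counts are illustrative.

```python
import pandas as pd

data = {
    'Player': ["Amit", "John", "Amit", "David", "Steve", "John"],
    'Rank': [1, 4, 3, 5, 2, 7],
    'Points': [95, 70, 65, 80, 90, 50],
    'Year': [2023, 2022, 2021, 2022, 2023, 2019]
}
df = pd.DataFrame(data)

grouped = df.groupby('Player')

# One row count per group, returned as a Series
sizes = grouped.size()
print(sizes)

# Non-null value count per column for each group, returned as a DataFrame
counts = grouped.count()
print(counts)
```

With no missing values in this dataset, both methods report the same numbers; they differ only when a column contains nulls.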
