22 Dec Find and Remove Duplicates from rows in Pandas
To find and remove duplicate rows in a Pandas DataFrame (or duplicate values in a Series), use the duplicated() and drop_duplicates() methods, respectively.
Find Duplicates
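To find duplicate rows in a Pandas DataFrame or Series, use the duplicated() method. It returns a Boolean Series with one entry per row: True marks a row that repeats an earlier row, and False marks everything else. By default, the first occurrence of a repeated row is not counted as a duplicate, so it is marked False.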
Let us see an example:
import pandas as pd
# Dataset
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}
df = pd.DataFrame(data)
print("Student Records\n\n", df)
# Find duplicates
res = df.duplicated()
print("\nDescribing Duplicates:\n",res)
Output
Student Records

   student  rank  marks
0     Amit     1     95
1     John     4     70
2     Amit     1     95
3    David     5     60
4    Steve     3     90

Describing Duplicates:
0    False
1    False
2     True
3    False
4    False
dtype: bool
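The example above uses duplicated() with its default settings. The following is a minimal sketch of a few common variations on the same student data: counting duplicates, flagging every copy of a repeated row with keep=False, and comparing only selected columns with subset. The variable names (dup_count, all_copies, by_student) are illustrative and not part of the original tutorial.

```python
import pandas as pd

# Same dataset as in the example above
df = pd.DataFrame({
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
})

# Count duplicate rows: True counts as 1 when summed
dup_count = int(df.duplicated().sum())

# keep=False marks every copy of a repeated row, including the first one,
# so the mask can be used to display all rows involved in duplication
all_copies = df[df.duplicated(keep=False)]

# subset limits the comparison to the listed columns only
by_student = df.duplicated(subset=['student'])

print("Number of duplicate rows:", dup_count)
print("\nAll copies of repeated rows:\n", all_copies)
print("\nDuplicates by student name only:\n", by_student)
```

Using keep=False is handy when you want to inspect the repeated records before deciding which copy to keep.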
Remove Duplicates
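To remove duplicate rows from a Pandas DataFrame or Series, use the drop_duplicates() method. By default, it keeps the first occurrence of each repeated row and returns a new object, leaving the original unchanged. Let us see an example: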
import pandas as pd
# Dataset
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}
df = pd.DataFrame(data)
print("Student Records\n\n", df)
# Delete duplicates using the drop_duplicates()
res = df.drop_duplicates()
print("\nNew DataFrame after deleting duplicates:\n",res)
Output
Student Records

   student  rank  marks
0     Amit     1     95
1     John     4     70
2     Amit     1     95
3    David     5     60
4    Steve     3     90

New DataFrame after deleting duplicates:
   student  rank  marks
0     Amit     1     95
1     John     4     70
3    David     5     60
4    Steve     3     90
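Notice that the result keeps the original index labels (0, 1, 3, 4). The sketch below shows a few optional parameters of drop_duplicates() that change this behaviour. It assumes pandas 1.0 or later, since ignore_index is not available in older versions. The variable names are illustrative only.

```python
import pandas as pd

# Same dataset as in the example above
df = pd.DataFrame({
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
})

# keep='last' retains the last occurrence of each repeated row
last_kept = df.drop_duplicates(keep='last')

# ignore_index=True renumbers the result 0, 1, 2, ... (assumes pandas >= 1.0)
renumbered = df.drop_duplicates(ignore_index=True)

# subset compares only the listed columns when deciding what is a duplicate
unique_students = df.drop_duplicates(subset=['student'])

# drop_duplicates() also works on a Series
names = pd.Series(["Amit", "John", "Amit"]).drop_duplicates()

print("Keeping the last occurrence:\n", last_kept)
print("\nWith a fresh index:\n", renumbered)
print("\nUnique by student name:\n", unique_students)
print("\nUnique values in a Series:\n", names)
```

In each case the original df still has all five rows, because drop_duplicates() returns a new object unless inplace=True is passed.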