22 Dec Find and Remove Duplicates from rows in Pandas
To find and remove duplicate rows in a Pandas DataFrame or Series, use the duplicated() and drop_duplicates() methods, respectively.
Find Duplicates
To find duplicate rows in a Pandas DataFrame or Series, use the duplicated() method. It returns a boolean Series: True marks a row that repeats an earlier row, and False marks every other row, including the first occurrence of a repeated one.
Let us see an example:
import pandas as pd

# Dataset
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}

df = pd.DataFrame(data)

print("Student Records\n\n", df)

# Find duplicates
res = df.duplicated()

print("\nDescribing Duplicates:\n", res)
Output
Student Records

  student  rank  marks
0    Amit     1     95
1    John     4     70
2    Amit     1     95
3   David     5     60
4   Steve     3     90

Describing Duplicates:
0    False
1    False
2     True
3    False
4    False
dtype: bool
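The boolean Series returned by duplicated() can also be used as a mask to select the duplicate rows themselves. The following is a minimal sketch on the same student data, not part of the original example: it assumes the keep and subset parameters of duplicated(), which the example above leaves at their defaults, and the variable names are illustrative.

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}
df = pd.DataFrame(data)

# Default (keep='first'): only the later repeats are flagged
repeats = df[df.duplicated()]

# keep=False flags every member of a duplicate group
all_copies = df[df.duplicated(keep=False)]

# subset restricts the comparison to the listed columns
same_name = df.duplicated(subset=['student'])

# Summing the boolean Series counts the duplicate rows
dup_count = int(df.duplicated().sum())

print("Later repeats:\n", repeats)
print("\nAll copies:\n", all_copies)
print("\nDuplicate names:", same_name.tolist())
print("Duplicate row count:", dup_count)
```

With keep=False, both row 0 and row 2 are selected, which is handy when you want to inspect every copy before deciding which one to keep.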
Remove Duplicates
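To remove duplicate rows from a Pandas DataFrame or Series, use the drop_duplicates() method. By default, it keeps the first occurrence of each row and returns a new object, leaving the original unchanged. Let us see an example: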
import pandas as pd

# Dataset
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}

df = pd.DataFrame(data)

print("Student Records\n\n", df)

# Delete duplicates using drop_duplicates()
res = df.drop_duplicates()

print("\nNew DataFrame after deleting duplicates:\n", res)
Output
Student Records

  student  rank  marks
0    Amit     1     95
1    John     4     70
2    Amit     1     95
3   David     5     60
4   Steve     3     90

New DataFrame after deleting duplicates:
  student  rank  marks
0    Amit     1     95
1    John     4     70
3   David     5     60
4   Steve     3     90
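Notice that the result above keeps the original index labels, so label 2 is missing. The sketch below shows a few variations that may be useful in practice. It assumes the keep, subset, and ignore_index parameters of drop_duplicates() (ignore_index requires pandas 1.0 or later), and the variable names are illustrative rather than part of the original example.

```python
import pandas as pd

# Same dataset as in the example above
data = {
    'student': ["Amit", "John", "Amit", "David", "Steve"],
    'rank': [1, 4, 1, 5, 3],
    'marks': [95, 70, 95, 60, 90]
}
df = pd.DataFrame(data)

# Keep the last occurrence of each duplicate row instead of the first
last_kept = df.drop_duplicates(keep='last')

# Compare on the 'student' column only and renumber the index from 0
by_name = df.drop_duplicates(subset=['student'], ignore_index=True)

# drop_duplicates() also works on a Series
unique_marks = df['marks'].drop_duplicates()

print("Keeping the last occurrence:\n", last_kept)
print("\nUnique students, index reset:\n", by_name)
print("\nUnique marks:", unique_marks.tolist())
```

In each case the original df still holds all five rows, because drop_duplicates() returns a new object unless inplace=True is passed.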