18 Apr Filtering in Pandas for Data Manipulation
Filtering in pandas is a core data manipulation technique that lets you refine datasets by retaining or removing rows based on specific conditions. You can filter rows where a column contains certain strings or integers, and invert the logic using the ~ operator to exclude matches instead of including them. This flexibility makes filtering powerful for cleaning, shaping, and preparing data for analysis, ensuring you work only with the rows relevant to your task.
In this lesson, we will see the following examples to filter data in Pandas:
- Keep rows by condition (filtering)
- Keep rows where a column contains a specific string
- Remove rows where a column contains a specific string. Use the ~ tilde operator to negate the condition
- Remove rows where a column contains a specific integer value. Use the ~ tilde operator to negate the condition
- Keep rows where a column contains a specific integer value
Keep rows by condition (filtering)
The following example creates a sample DataFrame with student IDs, marks, and names:
- It applies a filtering condition to keep only the rows where Marks is greater than 75.
- Finally, it prints the updated DataFrame showing only the qualifying students.
Here is the example:
# Keep rows by condition (filtering)
import pandas as pd

# Sample DataFrame with student IDs, marks, and names
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})
print(df)

# Keep rows where Marks is greater than 75
df = df[df['Marks'] > 75]
print("\nUpdated DataFrame\n", df)
Output
   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
2   3     79    Tim
3   4     98  Shaun
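To see why the filter above works, it can help to look at the condition on its own. The comparison produces a boolean Series (often called a mask), and indexing the DataFrame with that mask keeps the rows marked True. The following is a minimal sketch that reuses the same sample data; the variable names mask and passed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# The comparison alone returns a boolean Series, one value per row
mask = df['Marks'] > 75
print(mask.tolist())             # [False, False, True, True]

# Indexing with the mask keeps only the rows where it is True
passed = df[mask]
print(passed['Name'].tolist())   # ['Tim', 'Shaun']
```

Note that the filtered result keeps the original index labels (2 and 3), which matches the output shown above.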
Keep rows where a column contains a specific string
The following example creates a DataFrame with IDs, marks, and names:
- It uses str.contains() to filter rows where the Name column contains the string “Shaun”.
- The updated DataFrame displays only Shaun’s record.
Here is the example:
# Keep rows where a column contains a specific string
import pandas as pd

# Sample DataFrame with IDs, marks, and names
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})
print(df)

# Keep rows where Name contains the string 'Shaun'
df = df[df['Name'].str.contains('Shaun')]
print("\nUpdated DataFrame\n", df)
Output
   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
3   4     98  Shaun
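Keep in mind that str.contains() matches substrings, is case-sensitive by default, and treats the pattern as a regular expression by default. The sketch below shows how the case, na, and regex parameters can adjust that behavior. The data here is hypothetical (mixed casing and one missing name) and is not part of the lesson's sample DataFrame.

```python
import pandas as pd

# Hypothetical data: mixed casing and one missing name
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Name': ["Shaun", "shaun K", None, "Tim"]})

# regex=False: treat 'shaun' as a literal string, not a regular expression
# case=False:  ignore letter case when matching
# na=False:    count a missing name as "no match"
mask = df['Name'].str.contains('shaun', case=False, na=False, regex=False)

matches = df[mask]
print(matches['ID'].tolist())   # [1, 2]
```

Setting na=False is a design choice for columns that may hold missing values: without it, the mask can contain missing entries, which cannot be used directly for filtering.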
Remove rows where a column contains a specific string
The following example creates a DataFrame with IDs, marks, and names:
- It uses the tilde (~) operator to negate the condition, removing rows where Name contains “Shaun”.
- The updated DataFrame prints all records except Shaun’s.
Here is the example:
# Remove rows where a column contains a specific string
# Use the ~ tilde operator to negate the condition
import pandas as pd

# Sample DataFrame with IDs, marks, and names
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})
print(df)

# Remove rows where Name contains the string 'Shaun'
df = df[~df['Name'].str.contains('Shaun')]
print("\nUpdated DataFrame\n", df)
Output
   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
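The effect of the tilde is easiest to see on the mask itself: it flips every True to False and every False to True. The sketch below reuses the sample data; the names mask, matching, and others are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

mask = df['Name'].str.contains('Shaun')
print(mask.tolist())      # [False, False, False, True]
print((~mask).tolist())   # [True, True, True, False]

matching = df[mask]    # rows where Name contains 'Shaun'
others = df[~mask]     # every other row

# Together the two pieces account for every original row
print(len(matching) + len(others) == len(df))   # True
```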
Remove rows where a column contains a specific integer value
The following example creates a DataFrame with IDs, marks, and names:
- It uses the tilde (~) operator to negate the condition, removing rows where Marks equals 98.
- The updated DataFrame prints all records except the one with marks equal to 98.
Here is the example:
# Remove rows where a column contains a specific integer value
# Use the ~ tilde operator to negate the condition
import pandas as pd

# Sample DataFrame with IDs, marks, and names
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})
print(df)

# Remove rows where Marks equals 98
df2 = df[~(df['Marks'] == 98)]
print("\nUpdated DataFrame\n", df2)
Output
   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
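Two details of the negated comparison are worth checking in code. First, negating an equality test gives the same rows as using the != operator directly. Second, the parentheses around the comparison matter, because ~ binds more tightly than == in Python. The sketch below reuses the sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Negating == selects the same rows as !=
negated = df[~(df['Marks'] == 98)]
not_equal = df[df['Marks'] != 98]
print(negated.equals(not_equal))   # True

# Without parentheses, ~ is applied to the integers first (bitwise NOT),
# so the comparison no longer tests what was intended
wrong_mask = ~df['Marks'] == 98
print(wrong_mask.tolist())         # [False, False, False, False]
```

With the parentheses dropped, no row is removed for the right reason; the filter would return an empty DataFrame here.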
Keep rows where a column contains a specific integer value
The following example initializes a DataFrame with IDs, marks, and names:
- It applies a condition to keep only rows where Marks equals 98.
- The updated DataFrame prints just the record with marks equal to 98.
Here is the example:
# Keep rows where a column contains a specific integer value
import pandas as pd

# Sample DataFrame with IDs, marks, and names
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})
print(df)

# Keep rows where Marks equals 98 (no negation is needed here)
df2 = df[df['Marks'] == 98]
print("\nUpdated DataFrame\n", df2)
Output
   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
3   4     98  Shaun
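If more than one integer value should be kept or removed, one possible extension (not covered by the examples above) is the isin() method, which tests each value against a list. The sketch below reuses the sample data; the list wanted and the other variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Hypothetical list of marks to look for
wanted = [50, 98]

# Keep rows whose Marks value is in the list
kept = df[df['Marks'].isin(wanted)]
print(kept['Name'].tolist())      # ['John', 'Shaun']

# Negate with ~ to remove those rows instead
dropped = df[~df['Marks'].isin(wanted)]
print(dropped['Name'].tolist())   # ['Jacob', 'Tim']
```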