Filtering in Pandas for Data Manipulation

Filtering is a core data manipulation technique in pandas: you keep or remove rows based on a condition. A condition can test whether a column contains a given string or equals a given integer, and the ~ operator inverts it so that matching rows are excluded instead of kept. This makes filtering useful for cleaning, shaping, and preparing data, so that you work only with the rows relevant to your task.
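As a minimal sketch of the idea above: a condition on a column evaluates to a boolean Series (often called a mask), and ~ flips every True and False in it. The sample values and the variable names mask, kept, and removed are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'ID': [1, 2, 3],
                   'Marks': [40, 80, 90]})

# A comparison on a column yields a boolean Series (the mask)
mask = df['Marks'] > 75

kept = df[mask]       # rows where the mask is True
removed = df[~mask]   # rows where the mask is False

print(mask.tolist())
print(kept)
print(removed)
```

Every row lands in exactly one of kept or removed, which is why the same condition can serve both "keep" and "remove" filters.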

In this lesson, we will look at the following examples of filtering data in pandas:

  1. Keep rows by condition (filtering)
  2. Keep rows where a column contains a specific string
  3. Remove rows where a column contains a specific string, using the ~ (tilde) operator to negate the condition
  4. Remove rows where a column contains a specific integer value, using the ~ (tilde) operator to negate the condition
  5. Keep rows where a column contains a specific integer value

Keep rows by condition (filtering)

The example does the following:

  • It creates a sample DataFrame with student IDs, marks, and names.
  • It applies a filtering condition to keep only rows where Marks is greater than 75.
  • Finally, it prints the updated DataFrame showing only the qualifying students.

Here is the example:

# Keep rows by condition (filtering)

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

print(df)

# Keep rows where marks is greater than 75
df = df[df['Marks'] > 75]

print("\nUpdated DataFrame\n",df)

Output

   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
2   3     79    Tim
3   4     98  Shaun
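Note that the filtered rows in the output above keep their original index labels (2 and 3). If consecutive labels are preferred, one option is reset_index(drop=True). The sketch below reuses the lesson's data; the names filtered and renumbered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Filtering keeps the original index labels of the surviving rows
filtered = df[df['Marks'] > 75]
print(filtered.index.tolist())

# drop=True discards the old labels instead of adding them as a column
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```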

Keep rows where a column contains a specific string

The example does the following:

  • It creates a DataFrame with IDs, marks, and names.
  • It keeps only the rows where the Name column contains the string “Shaun”.
  • The updated DataFrame displays only Shaun’s record.

Here is the example:

# Keep rows where a column contains a specific string

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

print(df)

# Keep rows where a column contains a specific string
df = df[df['Name'].str.contains('Shaun')]

print("\nUpdated DataFrame\n",df)

Output

   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
3   4     98  Shaun
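Keep in mind that str.contains() matches substrings, not whole values, and by default treats the pattern as a regular expression. The sketch below, using the lesson's data, contrasts a plain substring match, a case-insensitive match, and an exact comparison with ==. The search strings 'J' and 'shaun' are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Substring match: "J" occurs in both "John" and "Jacob".
# regex=False treats the pattern as literal text.
substring_rows = df[df['Name'].str.contains('J', regex=False)]

# Case-insensitive match: lowercase "shaun" still finds "Shaun"
case_insensitive_rows = df[df['Name'].str.contains('shaun', case=False)]

# Exact match: the whole value must equal "Shaun"
exact_rows = df[df['Name'] == 'Shaun']

print(substring_rows)
print(case_insensitive_rows)
print(exact_rows)
```

When the whole value is known, == is usually the safer choice, since it cannot accidentally match longer names that merely contain the search string.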

Remove rows where a column contains a specific string

The example does the following:

  • It creates a DataFrame with IDs, marks, and names.
  • It uses the tilde (~) operator to negate the condition, removing rows where Name contains “Shaun”.
  • The updated DataFrame prints all records except Shaun’s.

Here is the example:

# Remove rows where a column contains a specific string
# Use the ~ tilde operator to negate the condition

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

print(df)

# Remove rows where a column contains a specific string
df = df[~df['Name'].str.contains('Shaun')]

print("\nUpdated DataFrame\n",df)

Output

   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
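The lesson's data has no missing names. If a column may contain missing values, one way to keep the mask purely boolean before negating it is to pass na=False to str.contains(). The sketch below assumes a hypothetical DataFrame in which one name is missing.

```python
import pandas as pd

# Hypothetical data with one missing name, used only for illustration
df = pd.DataFrame({'ID': [1, 2, 3],
                   'Name': ["John", None, "Shaun"]})

# na=False makes a missing value count as "does not contain"
mask = df['Name'].str.contains('Shaun', na=False)

# Negating the mask removes only the rows that matched
remaining = df[~mask]

print(mask.tolist())
print(remaining)
```

With this choice, the row with the missing name is kept, because it is treated as not containing “Shaun”.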

Remove rows where a column contains a specific integer value

The example does the following:

  • It creates a DataFrame with IDs, marks, and names.
  • It uses the tilde (~) operator to negate the condition, removing rows where Marks equals 98.
  • The updated DataFrame prints all records except the one with marks equal to 98.

Here is the example:

# Remove rows where a column contains a specific integer value
# Use the ~ tilde operator to negate the condition

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

print(df)

# Remove rows where a column contains a specific integer value
df2 = df[~(df['Marks']==98)]

print("\nUpdated DataFrame\n",df2)

Output

   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
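For a single equality test, negating with ~ gives the same rows as the != operator. The sketch below checks this on the lesson's data; the names df_negated and df_not_equal are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Negate an equality test with ~
df_negated = df[~(df['Marks'] == 98)]

# Express the same filter directly with !=
df_not_equal = df[df['Marks'] != 98]

# Both approaches select the same rows
print(df_negated.equals(df_not_equal))
```

The parentheses around the comparison matter in the ~ form, because ~ binds more tightly than ==.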

Keep rows where a column contains a specific integer value

The example does the following:

  • It creates a DataFrame with IDs, marks, and names.
  • It applies a condition to keep only rows where Marks equals 98.
  • The updated DataFrame prints just the record with marks equal to 98.

Here is the example:

# Keep rows where a column contains a specific integer value

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

print(df)

# Keep rows where a column contains a specific integer value
df2 = df[df['Marks']==98]

print("\nUpdated DataFrame\n",df2)

Output

   ID  Marks   Name
0   1     50   John
1   2     69  Jacob
2   3     79    Tim
3   4     98  Shaun

Updated DataFrame
    ID  Marks   Name
3   4     98  Shaun
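When several integer values should be matched at once, one option is Series.isin(), which can also be negated with ~. The sketch below reuses the lesson's data; the list of values [79, 98] is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Marks': [50, 69, 79, 98],
                   'Name': ["John", "Jacob", "Tim", "Shaun"]})

# Illustrative list of values to match
wanted = [79, 98]

# Keep rows whose Marks value is in the list
kept = df[df['Marks'].isin(wanted)]

# Remove those same rows by negating the mask
removed = df[~df['Marks'].isin(wanted)]

print(kept)
print(removed)
```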

Studyopedia Editorial Staff
contact@studyopedia.com