To clean data in Python, Pandas provides several built-in methods. In this lesson, we will go through them one by one with examples. Cleaning data in Pandas means finding incorrect data and fixing it. Incorrect data can be empty cells, null values, duplicate rows, and so on.
Let’s say we have the following CSV file demo.csv. The data consists of some null values:
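The contents of demo.csv can be reconstructed from the DataFrame printed in this lesson's examples. The following is a minimal sketch, assuming the file has exactly the two columns Frequency and Points shown there. It reads the same rows from an in-memory string (the name csv_text is only illustrative), so it runs without any file path:

```python
import io

import pandas as pd

# Rows copied from the DataFrame printed in this lesson's examples.
# The two empty Points fields become NaN when pandas parses the text.
csv_text = """Frequency,Points
2.4,83.5
3.2,21.6
6.1,
1.2,45.9
2.9,19.3
3.8,23.9
4.5,
8.3,66.3
7.9,74.7
5.8,67.5
"""

# io.StringIO stands in for a file on disk, so no path is required.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Saving the same text as demo.csv and passing its path to pd.read_csv() should give an equivalent DataFrame.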
Let us now work through the functions used to clean the data:
isnull(): Return True for each NULL value and False for every other value.
notnull(): Return True for each non-NULL value and False for every NULL value.
df.dropna(): Drop rows with NULL values.
df.fillna(x): Replace NULL values with a specific value x.
Pandas isnull() method
The isnull() method in Pandas is used to find NULL values. It returns a new DataFrame of the same shape that contains True wherever a value is NULL and False everywhere else. The original DataFrame is not modified. Let us see an example:
import pandas as pd

# Input CSV file
df = pd.read_csv(r"C:\Users\hp\Desktop\demo.csv")

# Display the CSV file records
print("Our DataFrame\n", df)

# Mark NULL values as True and all other values as False
resdf = df.isnull()

# Display the resulting Boolean DataFrame
print("\nNew DataFrame \n", resdf.to_string())
Output
Our DataFrame
   Frequency  Points
0        2.4    83.5
1        3.2    21.6
2        6.1     NaN
3        1.2    45.9
4        2.9    19.3
5        3.8    23.9
6        4.5     NaN
7        8.3    66.3
8        7.9    74.7
9        5.8    67.5

New DataFrame
   Frequency  Points
0      False   False
1      False   False
2      False    True
3      False   False
4      False   False
5      False   False
6      False    True
7      False   False
8      False   False
9      False   False
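A Boolean DataFrame like this is often summarized rather than read cell by cell. The following is a minimal sketch, assuming the same data as demo.csv but built inline so that no file path is needed. It counts the NULL values per column and selects the rows that contain at least one NULL:

```python
import numpy as np
import pandas as pd

# Same values as demo.csv, built inline (np.nan marks the NULL cells).
df = pd.DataFrame({
    "Frequency": [2.4, 3.2, 6.1, 1.2, 2.9, 3.8, 4.5, 8.3, 7.9, 5.8],
    "Points": [83.5, 21.6, np.nan, 45.9, 19.3, 23.9, np.nan, 66.3, 74.7, 67.5],
})

mask = df.isnull()

# True counts as 1 when summed, so this gives the NULL count per column.
missing_per_column = mask.sum()

# any(axis=1) is True for a row if at least one of its cells is NULL.
rows_with_missing = df[mask.any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Here the Points column has two NULL values, at index 2 and index 6, which matches the True cells in the output above.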
Pandas notnull() method
The notnull() method in Pandas is the inverse of isnull(). It returns a new DataFrame of the same shape that contains True wherever a value is not NULL and False wherever it is NULL. The original DataFrame is not modified. Let us see an example:
import pandas as pd

# Input CSV file
df = pd.read_csv(r"C:\Users\hp\Desktop\demo.csv")

# Display the CSV file records
print("Our DataFrame\n", df)

# Mark non-NULL values as True and NULL values as False
resdf = df.notnull()

# Display the resulting Boolean DataFrame
print("\nNew DataFrame\n", resdf.to_string())
Output
Our DataFrame
   Frequency  Points
0        2.4    83.5
1        3.2    21.6
2        6.1     NaN
3        1.2    45.9
4        2.9    19.3
5        3.8    23.9
6        4.5     NaN
7        8.3    66.3
8        7.9    74.7
9        5.8    67.5

New DataFrame
   Frequency  Points
0       True    True
1       True    True
2       True   False
3       True    True
4       True    True
5       True    True
6       True   False
7       True    True
8       True    True
9       True    True
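The result of notnull() can also be used as a filter. The following is a minimal sketch, again assuming the demo.csv values built inline. It keeps only the rows whose Points value is present:

```python
import numpy as np
import pandas as pd

# Same values as demo.csv, built inline (np.nan marks the NULL cells).
df = pd.DataFrame({
    "Frequency": [2.4, 3.2, 6.1, 1.2, 2.9, 3.8, 4.5, 8.3, 7.9, 5.8],
    "Points": [83.5, 21.6, np.nan, 45.9, 19.3, 23.9, np.nan, 66.3, 74.7, 67.5],
})

# A Boolean Series: True where Points holds a value, False where it is NULL.
has_points = df["Points"].notnull()

# Boolean indexing keeps only the rows where the mask is True.
valid_rows = df[has_points]

print(valid_rows)
```

Filtering on one column this way leaves df itself unchanged; valid_rows is a separate DataFrame with the eight complete rows.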
Pandas dropna() method
The dropna() method in Pandas is used to remove rows that contain NULL values. By default it returns a new DataFrame and leaves the original unchanged. Let us see an example:
import pandas as pd

# Input CSV file
df = pd.read_csv(r"C:\Users\hp\Desktop\demo.csv")

# Display the CSV file records
print("Our DataFrame\n", df)

# Remove every row that contains a NULL value
resdf = df.dropna()

# Display the new DataFrame
print("\nNew DataFrame (after removing rows with NULL)\n", resdf.to_string())
Output
Our DataFrame
   Frequency  Points
0        2.4    83.5
1        3.2    21.6
2        6.1     NaN
3        1.2    45.9
4        2.9    19.3
5        3.8    23.9
6        4.5     NaN
7        8.3    66.3
8        7.9    74.7
9        5.8    67.5

New DataFrame (after removing rows with NULL)
   Frequency  Points
0        2.4    83.5
1        3.2    21.6
3        1.2    45.9
4        2.9    19.3
5        3.8    23.9
7        8.3    66.3
8        7.9    74.7
9        5.8    67.5
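Rows 2 and 6 are gone because each held a NULL in the Points column. dropna() also accepts parameters that control what gets dropped. The following is a minimal sketch, assuming the demo.csv values built inline, that compares the default call with the subset, how, and axis parameters:

```python
import numpy as np
import pandas as pd

# Same values as demo.csv, built inline (np.nan marks the NULL cells).
df = pd.DataFrame({
    "Frequency": [2.4, 3.2, 6.1, 1.2, 2.9, 3.8, 4.5, 8.3, 7.9, 5.8],
    "Points": [83.5, 21.6, np.nan, 45.9, 19.3, 23.9, np.nan, 66.3, 74.7, 67.5],
})

# Default: drop every row that contains at least one NULL.
any_missing = df.dropna()

# subset: only the listed columns decide whether a row is dropped.
by_frequency = df.dropna(subset=["Frequency"])

# how="all": drop a row only if every value in it is NULL.
all_missing = df.dropna(how="all")

# axis=1: drop columns, instead of rows, that contain a NULL.
complete_columns = df.dropna(axis=1)

print(len(df), len(any_missing), len(by_frequency), len(all_missing))
# prints: 10 8 10 10
print(list(complete_columns.columns))
# prints: ['Frequency']
```

Which variant fits depends on the data: dropping rows loses the Frequency readings in rows 2 and 6, while dropping the column loses every Points value.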
Pandas fillna() method
The fillna() method in Pandas is used to replace NULL values with a specific value. By default it returns a new DataFrame and leaves the original unchanged. Let us see an example:
import pandas as pd

# Input CSV file
df = pd.read_csv(r"C:\Users\hp\Desktop\demo.csv")

# Display the CSV file records
print("Our DataFrame\n", df)

# Replace NULL values with the specific value 111
resdf = df.fillna(111)

# Display the new DataFrame
print("\nNew DataFrame (after replacing NULL with a specific value)\n", resdf.to_string())
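The example above fills every NULL with the same constant. The following is a minimal sketch, assuming the demo.csv values built inline, that repeats the constant fill and also shows a per-column fill using the column mean. The choice of the mean is only an illustration, not a recommendation from this lesson:

```python
import numpy as np
import pandas as pd

# Same values as demo.csv, built inline (np.nan marks the NULL cells).
df = pd.DataFrame({
    "Frequency": [2.4, 3.2, 6.1, 1.2, 2.9, 3.8, 4.5, 8.3, 7.9, 5.8],
    "Points": [83.5, 21.6, np.nan, 45.9, 19.3, 23.9, np.nan, 66.3, 74.7, 67.5],
})

# Constant fill: every NULL in the DataFrame becomes 111.
filled_constant = df.fillna(111)

# Per-column fill: a dict maps each column name to its own fill value.
# mean() skips NULL values by default, so it averages the 8 known scores.
filled_mean = df.fillna({"Points": df["Points"].mean()})

print(filled_constant.loc[[2, 6]])
print(filled_mean.loc[[2, 6]])
```

In both results rows 2 and 6 no longer contain NaN, while df still holds the original NULL values.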