Introduction
Data cleaning is an important step in any data science project. In Python, the Pandas DataFrame is a commonly used data structure for data manipulation and analysis.
In this Byte, we will focus on handling non-NaN (Not a Number) values in DataFrame columns. We will learn how to count non-NaN values per column and in total, and how to treat empty strings as NA values.
Counting Non-NaN Values in DataFrame Columns
Pandas provides the count() method to count the non-NaN values in DataFrame columns. Let's start by importing the pandas library and creating a simple DataFrame.
import pandas as pd
import numpy as np
data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}
df = pd.DataFrame(data)
print(df)
Output:
   Name   Age
0   Tom  20.0
1  Nick  21.0
2  John  19.0
3   NaN   NaN
Now, we can count the non-NaN values in each column using the count() method:
print(df.count())
Output:
Name    3
Age     3
dtype: int64
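count() also works on a single column (a Series) and accepts an axis argument. As a minimal sketch, using a small DataFrame like the one above, the following counts non-NaN values for one column and then per row; the variable names age_count and row_counts are only illustrative.

```python
import numpy as np
import pandas as pd

# A small DataFrame like the one in the example above
df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John', np.nan],
                   'Age': [20, 21, 19, np.nan]})

# Non-NaN count for a single column (a Series)
age_count = df['Age'].count()

# Non-NaN count per row instead of per column
row_counts = df.count(axis=1)

print(age_count)
print(row_counts)
```

The per-row form can be handy for spotting rows that are mostly or entirely empty, such as the last row here.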
Calculating Total Non-NaN Values in DataFrame
If you want to get the total number of non-NaN values in the DataFrame, you can use the count() method combined with sum().
print(df.count().sum())
Output:
6
This means that there are a total of 6 non-NaN values in the DataFrame.
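As a quick cross-check, the same total can be reached through the Boolean masks returned by notna() and isna(). This is only a sketch on a small DataFrame like the one above; total_non_nan and total_nan are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'John', np.nan],
                   'Age': [20, 21, 19, np.nan]})

# First sum() adds up each column, second sum() adds the column totals
total_non_nan = df.notna().sum().sum()
total_nan = df.isna().sum().sum()

# Every cell is either missing or not, so the two totals add up to df.size
print(total_non_nan, total_nan, df.size)
```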
Treating Empty Strings as NA Values
In some cases, you may want to treat empty strings as NA values. You can use the replace() method to replace empty strings with np.nan.
data = {'Name': ['Tom', 'Nick', '', 'John'],
        'Age': [20, 21, '', 19]}
df = pd.DataFrame(data)
print(df)
Output:
   Name Age
0   Tom  20
1  Nick  21
2
3  John  19
Now, replace the empty strings with np.nan:
df.replace('', np.nan, inplace=True)
print(df)
Output:
   Name   Age
0   Tom  20.0
1  Nick  21.0
2   NaN   NaN
3  John  19.0
Note: This operation changes the DataFrame in place. If you want to keep the original DataFrame intact, don't use the inplace=True argument, and assign the result to a new variable instead.
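Real data often contains whitespace-only strings as well as empty ones. The sketch below is one possible approach, not the only one: it uses replace() with a regular expression and without inplace=True, so the original DataFrame is left untouched. Whether replace() converts the Age column to a numeric dtype on its own can depend on the pandas version, so the conversion is done explicitly with pd.to_numeric(). The whitespace-only value and the name cleaned are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with an empty string and a whitespace-only string (illustrative)
df = pd.DataFrame({'Name': ['Tom', 'Nick', '', 'John'],
                   'Age': [20, 21, '   ', 19]})

# Treat empty and whitespace-only strings as missing; returns a new DataFrame
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# Make the numeric dtype explicit rather than relying on replace() to infer it
cleaned['Age'] = pd.to_numeric(cleaned['Age'])

print(cleaned.count())
```

After this step, count() reports 3 non-NaN values for each column, while df still holds the original strings.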
Using notna() to Count Non-Missing Values
A slightly more direct way to filter and count non-NaN values is with the notna() method.
Let's start with a simple DataFrame:
import pandas as pd
data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}
df = pd.DataFrame(data)
print(df)
This will output:
    Name   Age         City
0   John  28.0     New York
1   Anna   NaN  Los Angeles
2   None   NaN         None
3   Mike  32.0      Chicago
4  Sarah  29.0       Boston
You can see that our DataFrame has some missing values (NaN or None).
Now, if you want to count the non-missing values in the 'Name' column, you can use notna():
print(df['Name'].notna().sum())
This will output:
4
The notna() method returns a Boolean Series where True represents a non-missing value and False represents a missing value. The sum() method is then used to count the number of True values, since each True counts as 1, which gives the number of non-missing values.
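The same Boolean mask can be applied to the whole DataFrame or used to filter rows. As a brief sketch on the sample data from this section (per_column and with_age are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
                   'Age': [28, None, None, 32, 29],
                   'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']})

# Per-column non-missing counts; these match the result of df.count()
per_column = df.notna().sum()

# Use the Boolean mask to keep only the rows where 'Age' is present
with_age = df[df['Age'].notna()]

print(per_column)
print(with_age)
```

Filtering with the mask keeps the original index labels, so the remaining rows here are 0, 3, and 4.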
Conclusion
In this Byte, we've learned how to count non-NaN values in DataFrame columns. Handling missing data is an important step in data preprocessing. The notna() method, among other methods in Pandas, provides a straightforward way to count non-missing values in DataFrame columns.