Thursday, April 11, 2024

Counting Non-NaN Values in DataFrame Columns



Introduction

Data cleaning is a crucial step in any data science project. In Python, the Pandas DataFrame is a commonly used data structure for data manipulation and analysis.

In this Byte, we'll focus on handling non-NaN (Not a Number) values in DataFrame columns. We'll learn how to count non-NaN values per column, calculate the total number of non-NaN values, and treat empty strings as NA values.

Counting Non-NaN Values in DataFrame Columns

Pandas provides the count() method to count the non-NaN values in DataFrame columns. Let's start by importing the pandas library and creating a simple DataFrame.

import pandas as pd
import numpy as np

data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}

df = pd.DataFrame(data)

print(df)

Output:

   Name   Age
0   Tom  20.0
1  Nick  21.0
2  John  19.0
3   NaN   NaN

Now we can count the non-NaN values in each column using the count() method:

print(df.count())

Output:

Name    3
Age     3
dtype: int64
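As a quick sanity check, the sketch below reuses the sample data above. It subtracts count() from len(df) to get the number of missing values per column. It then calls count(axis=1) to count non-NaN values per row instead of per column. The names non_nan, total_rows and missing are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above: the last row is entirely missing
data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}
df = pd.DataFrame(data)

non_nan = df.count()            # non-NaN values per column
total_rows = len(df)            # all rows, missing or not
missing = total_rows - non_nan  # NaN values per column

print(non_nan['Name'], total_rows, missing['Age'])  # 3 4 1

# axis=1 counts non-NaN values across each row instead
print(df.count(axis=1).tolist())  # [2, 2, 2, 0]
```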

Calculating Total Non-NaN Values in DataFrame

If you want the total number of non-NaN values in the whole DataFrame, you can combine count() with sum(). count() returns one count per column, and sum() adds those counts together.

print(df.count().sum())

Output:

6

This means there are 6 non-NaN values in total in the DataFrame (3 in Name plus 3 in Age).
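The same total can be reached in a few other ways. The minimal sketch below assumes the four-row sample DataFrame from above. It compares count().sum() with two alternatives: summing the Boolean notna() mask over both axes, and subtracting the missing cells from df.size.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'John', np.nan],
        'Age': [20, 21, 19, np.nan]}
df = pd.DataFrame(data)

# Approach 1: per-column counts, then summed
total_from_count = df.count().sum()

# Approach 2: sum every True in the Boolean notna() mask
total_from_mask = df.notna().to_numpy().sum()

# Approach 3: all cells (rows * columns) minus the missing ones
total_from_size = df.size - df.isna().to_numpy().sum()

print(total_from_count, total_from_mask, total_from_size)  # 6 6 6
```

All three agree here. count().sum() is the most readable; the size-based version is handy when you already need the missing count as well.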

Treating Empty Strings as NA Values

In some cases, you might want to treat empty strings as NA values. Pandas doesn't do this by default: an empty string is a valid value, so count() includes it. You can use the replace() method to replace empty strings with np.nan.

data = {'Name': ['Tom', 'Nick', '', 'John'],
        'Age': [20, 21, '', 19]}

df = pd.DataFrame(data)

print(df)

Output:

   Name Age
0   Tom  20
1  Nick  21
2        
3  John  19

Now, replace the empty strings with np.nan:

df.replace('', np.nan, inplace=True)

print(df)

Output:

   Name   Age
0   Tom  20.0
1  Nick  21.0
2   NaN   NaN
3  John  19.0

Note: This operation modifies the DataFrame in place. If you want to keep the original DataFrame intact, leave out the inplace=True argument and assign the returned copy instead, for example df_clean = df.replace('', np.nan).
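To see why this step matters for counting, the short sketch below counts values before and after the replacement. It uses the returned copy instead of inplace=True. The name df_clean is only illustrative.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Tom', 'Nick', '', 'John'],
        'Age': [20, 21, '', 19]}
df = pd.DataFrame(data)

# Empty strings are real values, so count() does not treat them as missing
print(df.count().tolist())        # [4, 4]

# Without inplace=True, replace() returns a new DataFrame and leaves df unchanged
df_clean = df.replace('', np.nan)
print(df_clean.count().tolist())  # [3, 3]
print(df.count().tolist())        # [4, 4]
```

On recent pandas versions this replace() call may emit a FutureWarning about downcasting object columns. The warning does not affect the counts.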

Using notna() to Count Non-Missing Values

A slightly more direct way to filter and count non-NaN values is the notna() method.

Let's start with a simple DataFrame:

import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}

df = pd.DataFrame(data)

print(df)

This will output:

    Name   Age         City
0   John  28.0     New York
1   Anna   NaN  Los Angeles
2   None   NaN         None
3   Mike  32.0      Chicago
4  Sarah  29.0       Boston

You can see that our DataFrame has some missing values (NaN or None).
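Pandas treats None and NaN the same way when detecting missing values. In the numeric Age column, None is converted to NaN, so the column becomes float64. In the text columns it simply shows up as missing. A minimal sketch using the same DataFrame:

```python
import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}
df = pd.DataFrame(data)

# None in a numeric column is stored as NaN, which makes the column float64
print(df['Age'].dtype)            # float64

# isna()/notna() flag both None and NaN as missing
print(df.isna().sum().tolist())   # [1, 2, 1]
print(df.notna().sum().tolist())  # [4, 3, 4]
```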

Now, if you want to count the non-missing values in the 'Name' column, you can use notna():

print(df['Name'].notna().sum())

This will output:

4

The notna() method returns a Boolean Series in which True marks a non-missing value and False marks a missing one. Calling sum() on that Series then counts the True values, because True is treated as 1 and False as 0. The result is the number of non-missing values.
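The pieces can also be inspected one at a time. The sketch below prints the Boolean mask and sums it. It then uses the same mask to filter rows, and applies notna() to every column at once. The names mask and named are only illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', None, 'Mike', 'Sarah'],
        'Age': [28, None, None, 32, 29],
        'City': ['New York', 'Los Angeles', None, 'Chicago', 'Boston']}
df = pd.DataFrame(data)

mask = df['Name'].notna()   # Boolean Series: True where Name is present
print(mask.tolist())        # [True, True, False, True, True]

# True counts as 1 and False as 0, so sum() counts the non-missing values
print(mask.sum())           # 4

# The same mask filters the DataFrame down to rows that have a Name
named = df[mask]
print(len(named))           # 4

# notna() on the whole DataFrame, summed per column
print(df.notna().sum().tolist())  # [4, 3, 4]
```

Using one mask for both counting and filtering keeps the two results consistent: the number of filtered rows always equals mask.sum().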

Conclusion

In this Byte, we've learned how to count non-NaN values in DataFrame columns. Handling missing data is a crucial step in data preprocessing. Methods like count() and notna() provide a straightforward way to count non-missing values in DataFrame columns.


