Introduction
Python has a rich ecosystem of libraries that make it a great language for data analysis. One of those libraries is pandas, which simplifies reading and writing data between in-memory data structures and different file formats.
However, while working with Excel files using pandas.read_excel, you might run into an error that looks like this:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
In this Byte, we’ll dissect this error message, understand why it occurs, and learn how to fix it.
What Is the “xlrd.biffh.XLRDError” Error?
The xlrd.biffh.XLRDError is a specific error that you might encounter while working with the pandas library in Python. It is raised when you try to read an Excel file with the .xlsx extension using the pandas.read_excel function and the xlrd engine ends up handling the file.
Here's an example of the error:
import pandas as pd
df = pd.read_excel('file.xlsx')
Output:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
Cause of the Error
The xlrd.biffh.XLRDError error is caused by a change in the xlrd library, which pandas uses to read Excel files. The xlrd library now supports only the older .xls file format and no longer supports the newer .xlsx file format.
This change can come as a surprise if you’ve been using pandas.read_excel without specifying an engine. Older versions of pandas use the xlrd library by default to read Excel files, but as of xlrd version 2.0.0, this library no longer supports .xlsx files.
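One way to check whether this change affects your environment is to look at the installed xlrd version. The following is a minimal sketch using only the standard library; the helper names installed_version and major_version are made up for illustration and are not part of pandas or xlrd.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading major number from a version string such as '2.0.1'."""
    return int(version.split(".")[0])


xlrd_version = installed_version("xlrd")
if xlrd_version is None:
    print("xlrd is not installed")
elif major_version(xlrd_version) >= 2:
    # xlrd 2.0.0 and later read only the legacy .xls format
    print(f"xlrd {xlrd_version}: reads .xls only")
else:
    print(f"xlrd {xlrd_version}: can still read .xlsx")
```

If the script reports a 2.x version, any code path that hands an .xlsx file to xlrd will fail with the error shown above.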
How to Fix the Error
The solution to this error is simple. You just need to install openpyxl and set the engine argument of the pandas.read_excel function so that it uses the openpyxl library instead of xlrd. The openpyxl library reads the newer .xlsx file format. It does not read legacy .xls files, which still require xlrd.
Here's how to do it:
First, you need to install the openpyxl library. You can do this using pip:
$ pip install openpyxl
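To confirm that the installation landed in the interpreter you are actually running, a quick check along these lines can help. This is only a sketch; is_installed is a hypothetical helper name, built on the standard importlib.util.find_spec function.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once openpyxl is importable from this environment
print("openpyxl available:", is_installed("openpyxl"))
```

If this prints False right after a successful pip install, pip most likely installed the package into a different Python environment.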
Then, you can specify the engine argument in the pandas.read_excel call like this:
import pandas as pd
df = pd.read_excel('file.xlsx', engine='openpyxl')
This code reads the Excel file using the openpyxl library, so you will no longer encounter the xlrd.biffh.XLRDError error.
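If a project reads both legacy and modern workbooks, the engine can be chosen from the file extension rather than hard-coded. The sketch below assumes that openpyxl handles .xlsx and .xlsm files and that xlrd handles .xls files; the ENGINE_BY_SUFFIX mapping and the pick_engine and read_workbook helpers are illustrative names, not part of pandas.

```python
from pathlib import Path

# Illustrative mapping from file extension to a pandas.read_excel engine name
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format
    ".xlsx": "openpyxl",  # modern XML-based format
    ".xlsm": "openpyxl",  # macro-enabled variant of .xlsx
}


def pick_engine(path):
    """Return the engine name for `path`, based on its extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No engine configured for {suffix!r} files") from None


def read_workbook(path, **kwargs):
    """Read an Excel file with pandas, passing the engine chosen by pick_engine."""
    import pandas as pd  # imported here so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)
```

With this in place, read_workbook('file.xlsx') behaves like the call shown earlier, while read_workbook('old_report.xls') is routed to xlrd, which must be installed separately for that case.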
Conclusion
In this Byte, we’ve learned about the xlrd.biffh.XLRDError error that occurs when using pandas.read_excel to read .xlsx files. We’ve seen why this error occurs and how to fix it by using the openpyxl library.