Pandas' read_excel, ExcelFile, failing to open some .xls files.
I am trying to read .xls files from http://www.eia.gov/coal/data.cfm#production, specifically the Historical detailed coal production data (1983-2013) file coalpublic2012.xls, which is freely available via the dropdown. Pandas cannot read it.
In contrast, the file for the most recent year available, 2013 (coalpublic2013.xls), works without a problem:
import pandas as pd
df1 = pd.read_excel("coalpublic2013.xls")
but the .xls files for the preceding years (2004-2012) do not load. I have opened these files in Excel; they open fine and are not corrupted.
The error that I get from pandas is:
XLRDError Traceback (most recent call last)
<ipython-input-28-0da33766e9d2> in <module>()
----> 1 df = pd.read_excel("coalpublic2012.xlsx")
/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, **kwds)
161
162 if not isinstance(io, ExcelFile):
--> 163 io = ExcelFile(io, engine=engine)
164
165 return io._parse_excel(
/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in __init__(self, io, **kwds)
204 self.book = xlrd.open_workbook(file_contents=data)
205 else:
--> 206 self.book = xlrd.open_workbook(io)
207 elif engine == 'xlrd' and isinstance(io, xlrd.Book):
208 self.book = io
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/__init__.pyc in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
433 formatting_info=formatting_info,
434 on_demand=on_demand,
--> 435 ragged_rows=ragged_rows,
436 )
437 return bk
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
89 t1 = time.clock()
90 bk.load_time_stage_1 = t1 - t0
---> 91 biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
92 if not biff_version:
93 raise XLRDError("Can't determine file's BIFF version")
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in getbof(self, rqd_stream)
1228 bof_error('Expected BOF record; met end of file')
1229 if opcode not in bofcodes:
-> 1230 bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
1231 length = self.get2bytes()
1232 if length == MY_EOF:
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in bof_error(msg)
1222 if DEBUG: print("reqd: 0x%04x" % rqd_stream, file=self.logfile)
1223 def bof_error(msg):
-> 1224 raise XLRDError('Unsupported format, or corrupt file: ' + msg)
1225 savpos = self._position
1226 opcode = self.get2bytes()
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
And I have tried various other things:
```
df = pd.ExcelFile("coalpublic2012.xls", encoding_override='cp1252')
import xlrd
wb = xlrd.open_workbook("coalpublic2012.xls")
```
to no avail. My pandas version: 0.17.0
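The `'<?xml ve'` at the end of the error message is a hint: xlrd met XML text where it expected a binary BOF record, which suggests the file is not a binary .xls despite its extension. A minimal sketch for checking what a file really is by looking at its first bytes follows. `sniff_format` and `sniff_file` are hypothetical helper names, not part of pandas or xlrd, and the labels they return are only illustrative.

```python
# Hypothetical helpers: guess a spreadsheet's real format from its first
# bytes, ignoring the file extension.

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # container used by binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx files are zip archives
UTF8_BOM_AND_SPACE = b"\xef\xbb\xbf \t\r\n"        # bytes to skip before "<?xml"


def sniff_format(head):
    """Return a rough format label for the leading bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.lstrip(UTF8_BOM_AND_SPACE).startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path):
    """Read only the first few bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

If `sniff_file("coalpublic2012.xls")` reports `"xml"` while the 2013 file reports `"xls"`, the older files are XML documents with an .xls extension, and an XML-aware reader is needed rather than xlrd.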
Hi, I faced the same
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record;
error and solved it by writing an XML to XLSX converter.
@jbwhit I have run the following code:
This reads the file successfully without any error. However, it returns all the data in exactly the format mentioned, so you may need some extra processing after reading it.
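For anyone who wants to read such a file directly, here is a rough sketch. It is not the converter mentioned above, and it assumes the file is in the Excel 2003 "XML Spreadsheet" format, where `Worksheet`, `Row`, `Cell` and `Data` elements live in the `urn:schemas-microsoft-com:office:spreadsheet` namespace. The function names `sheets_from_root` and `read_xml_spreadsheet` are made up for illustration. Every value comes back as text, and an `ss:Index` attribute on rows (as opposed to cells) is not handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Excel 2003 XML Spreadsheet format (assumed file format).
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def sheets_from_root(root):
    """Return {sheet_name: [[cell_text, ...], ...]} from a parsed Workbook element."""
    sheets = {}
    for ws in root.iter("{%s}Worksheet" % SS):
        rows = []
        for row in ws.iter("{%s}Row" % SS):
            cells = []
            for cell in row.findall("{%s}Cell" % SS):
                # ss:Index is a 1-based column position; it appears when empty
                # cells were skipped, so pad with None to keep columns aligned.
                index = cell.get("{%s}Index" % SS)
                if index is not None:
                    cells.extend([None] * (int(index) - 1 - len(cells)))
                data = cell.find("{%s}Data" % SS)
                cells.append(data.text if data is not None else None)
            rows.append(cells)
        sheets[ws.get("{%s}Name" % SS)] = rows
    return sheets


def read_xml_spreadsheet(path):
    """Parse the file at `path` and return its sheets as lists of rows."""
    return sheets_from_root(ET.parse(path).getroot())
```

The rows can then be handed to pandas, for example `pd.DataFrame(rows[1:], columns=rows[0])` if the first row holds the column names. Any title rows above the header would need to be skipped first, and numeric columns converted afterwards.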