
BUG: ExcelWriter: file is corrupted on save (and: does it accept a file object?)

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

first.py:

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

writer1 = pd.ExcelWriter('first.xlsx', engine='openpyxl')
df.to_excel(writer1)
writer1.save()
writer1.close()

second.py:

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

output2 = open('second.xlsx', 'wb')
writer2 = pd.ExcelWriter(output2, engine='openpyxl')
df.to_excel(writer2)
writer2.save()
writer2.close()
output2.flush()
output2.close()

UPD 2:

writer2.save()
writer2.close()

Remove one of these two lines and it will start to work!
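A minimal sketch of second.py with only the close() call kept, assuming pandas with the openpyxl engine installed. In pandas 1.x, ExcelWriter.close() calls save() internally, so calling both writes the workbook to the same handle twice. Here an io.BytesIO buffer stands in for the open('second.xlsx', 'wb') handle so nothing is written to disk; that substitution is an assumption for illustration, and it relies on ExcelWriter accepting a binary file-like object, as second.py and third.py already do in practice.

```python
import io

import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

# Stand-in for output2 = open('second.xlsx', 'wb'); a real binary file handle behaves the same way.
output2 = io.BytesIO()
writer2 = pd.ExcelWriter(output2, engine='openpyxl')
df.to_excel(writer2)
# close() writes the workbook exactly once. In pandas 1.x, an extra writer2.save()
# before it would append a second copy of the archive to the same handle.
writer2.close()

data = output2.getvalue()
# A single .xlsx (zip) archive contains exactly one end-of-central-directory record.
print(len(data), data.count(b"PK\x05\x06"))
```

Using `with pd.ExcelWriter(output2, engine='openpyxl') as writer2:` achieves the same thing, since leaving the block closes (and therefore saves) the writer once.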

UPD: third.py (see https://github.com/pandas-dev/pandas/issues/33746#issuecomment-640169769 by @lordgrenville):

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

with open('third.xlsx', 'wb') as output3:
    writer3 = pd.ExcelWriter(output3, engine='openpyxl')
    df.to_excel(writer3)
    writer3.save()

Problem description

$ python first.py
$ python second.py
$ python third.py
$ du -b first.xlsx second.xlsx third.xlsx
4737	first.xlsx
9474	second.xlsx
4737	third.xlsx

~(Note: 9474 = 2 * 4737. But sometimes it’s not true.)~

  1. Why do the files differ?
  2. The documentation says: "path : str. Path to xls or xlsx file." So does pd.ExcelWriter.__init__ accept a file-like object?
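The size doubling can be reproduced without pandas. openpyxl writes an .xlsx file as a zip archive through Python's zipfile module, and a zipfile.ZipFile opened in "w" mode on an existing file object starts writing at the handle's current position rather than rewinding or truncating. In the sketch below, fake_save() is a hypothetical stand-in for one workbook save (it is not a pandas or openpyxl function), and its one-entry archive is much smaller than a real workbook.

```python
import io
import zipfile


def fake_save(fileobj):
    """Hypothetical stand-in for one workbook save: write a complete zip archive
    starting at the handle's current position (mode "w" does not rewind or truncate)."""
    with zipfile.ZipFile(fileobj, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")


once = io.BytesIO()
fake_save(once)    # like third.py: a single save

twice = io.BytesIO()
fake_save(twice)   # like writer2.save()
fake_save(twice)   # like writer2.close(), which saves again in pandas 1.x

size_once = len(once.getvalue())
size_twice = len(twice.getvalue())
# Each archive ends with one end-of-central-directory record (signature PK\x05\x06).
print(size_once, size_twice, twice.getvalue().count(b"PK\x05\x06"))
```

first.py also saves twice, but in pandas 1.0.x the openpyxl writer saves to the path each time, so each save reopens and overwrites the file; that would explain why first.xlsx and third.xlsx have the same size while second.xlsx holds two archives back to back.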

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.32-calculate
machine          : x86_64
processor        : Intel® Core™ i5-7200U CPU @ 2.50GHz
byteorder        : little
LC_ALL           : None
LANG             : ru_RU.utf8
LOCALE           : ru_RU.UTF-8

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 46.1.3.post20200330
Cython           : 0.29.15
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.2.8
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : 1.2.8
numba            : None

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments:5 (1 by maintainers)

Top GitHub Comments

2 reactions
wwwald commented, May 18, 2021

I’m running into the same problem, or at least it seems very similar.

In my case, I’m trying to write a dataframe to an existing worksheet using ExcelWriter’s append mode. When I open the resulting file excel-results.xlsx, Excel (Office 365) warns that it is corrupt and offers to repair it. The repair does work, but of course it shouldn’t be necessary.

Some code to reproduce the problem:

import pandas as pd
from pathlib import Path
import shutil
from openpyxl import load_workbook

xlsx_template = Path("excel-template.xlsx")
xlsx_results = Path("excel-results.xlsx")
shutil.copy2(xlsx_template, xlsx_results)

df = pd.DataFrame(
    {"type": ["ERROR", "NOTFOUND", "ERROR"], 
     "message": ["First error message", "Didn't find a value", "Another error"]}
)

with pd.ExcelWriter(xlsx_results, engine="openpyxl", mode="a") as writer:
    writer.book = load_workbook(xlsx_results)
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    df.to_excel(writer, sheet_name="Error messages", startrow=7, startcol=1, index=False, header=False)
    writer.save()
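For the append-mode case, a sketch under stated assumptions: pandas 1.4 or later (needed for if_sheet_exists="overlay") with openpyxl installed. The template is generated on the fly because the original excel-template.xlsx is not reproduced here; its "Report header" cell is hypothetical. The sketch drops the explicit writer.save() so the workbook is saved once when the with block exits, and replaces the manual writer.book / writer.sheets assignments, which newer pandas releases deprecate.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for excel-template.xlsx: an "Error messages" sheet with existing content.
xlsx_results = Path(tempfile.mkdtemp()) / "excel-results.xlsx"
template = Workbook()
template.active.title = "Error messages"
template.active["B2"] = "Report header"
template.save(xlsx_results)

df = pd.DataFrame(
    {"type": ["ERROR", "NOTFOUND", "ERROR"],
     "message": ["First error message", "Didn't find a value", "Another error"]}
)

# mode="a" loads the existing workbook; if_sheet_exists="overlay" (pandas >= 1.4)
# writes into the existing sheet instead of adding a new one.
# No writer.save(): leaving the with block saves exactly once.
with pd.ExcelWriter(xlsx_results, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    df.to_excel(writer, sheet_name="Error messages",
                startrow=7, startcol=1, index=False, header=False)

# Re-open the result: startrow=7/startcol=1 are zero-based, so data starts at cell B8.
result = load_workbook(xlsx_results)
ws = result["Error messages"]
print(result.sheetnames, ws["B8"].value, ws["C10"].value)
```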

The excel-template.xlsx file used here was linked in the original comment.

Versions used to reproduce this:

INSTALLED VERSIONS  
------------------  
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e  
python           : 3.7.10.final.0  
python-bits      : 64  
OS               : Windows  
OS-release       : 10  
Version          : 10.0.17763  
machine          : AMD64  
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel  
byteorder        : little  
LC_ALL           : None  
LANG             : en_US.UTF-8  
LOCALE           : None.None  
  
pandas           : 1.2.4  
numpy            : 1.20.2  
pytz             : 2021.1  
dateutil         : 2.8.1  
pip              : 21.1.1  
setuptools       : 49.6.0.post20210108  
Cython           : None  
pytest           : None  
hypothesis       : None  
sphinx           : None  
blosc            : None  
feather          : None  
xlsxwriter       : None  
lxml.etree       : None  
html5lib         : None  
pymysql          : None  
psycopg2         : None  
jinja2           : None  
IPython          : None  
pandas_datareader: None  
bs4              : None  
bottleneck       : None  
fsspec           : None  
fastparquet      : None  
gcsfs            : None  
matplotlib       : None  
numexpr          : None  
odfpy            : None  
openpyxl         : 3.0.7  
pandas_gbq       : None  
pyarrow          : None  
pyxlsb           : None  
s3fs             : None  
scipy            : None  
sqlalchemy       : None  
tables           : None  
tabulate         : None  
xarray           : None  
xlrd             : None  
xlwt             : None  
numba            : None  

1 reaction
kuraga commented, Jun 9, 2020
writer2.save()
writer2.close()

Remove one of these two lines and it will start to work!


Top Results From Across the Web

Pandas ExcelWriter Openpyxl is creating a corrupt file that has ...
I was facing the same problem nowadays. For me the solution was to put the writer into "with" and not using .save and...
Excel file getting corrupted when saving from dataframe. - Reddit
Try this, delete the old corrupt excel file if you haven't, then comment out "writer.save()" and run the script again. in the source...
Pandas ExcelWriter Corrupts .xlsm file - Kaggle
Hi all,. Using pandas in python, I want to add sheets on existing .xlsm file with the code below. However, after running process...
Repairing a corrupted workbook - Microsoft Support
Excel cannot always start File Recovery mode automatically. If you cannot open a workbook because it has been corrupted, you can try to...
[Solved] Excel Cannot Open the File Because the Extension Is ...
The error may occur if you're using an unsupported Excel file format or the file is corrupt. If the Excel file is severely...
