
BUG: ExcelWriter: file is corrupted on save (and: does it accept a file object?)


I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

first.py:

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

writer1 = pd.ExcelWriter('first.xlsx', engine='openpyxl')
df.to_excel(writer1)
writer1.save()
writer1.close()

second.py:

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

output2 = open('second.xlsx', 'wb')
writer2 = pd.ExcelWriter(output2, engine='openpyxl')
df.to_excel(writer2)
writer2.save()
writer2.close()
output2.flush()
output2.close()

UPD 2:

writer2.save()
writer2.close()

Remove one of these two lines and it starts to work!
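In pandas 1.0.x, `ExcelWriter.close()` is a synonym for `save()`, so second.py writes the whole workbook into the same open handle twice. With a path (first.py), each save reopens and truncates the file, so the second write simply replaces the first. The sketch below reproduces that difference using only the standard library. It assumes that an .xlsx file is written as a zip container and that `zipfile.ZipFile` in `"w"` mode truncates a path but writes at the current position of an already-open handle. `save_archive` is a hypothetical stand-in for the writer's save step, not pandas code.

```python
import io
import os
import tempfile
import zipfile


def save_archive(target):
    """Hypothetical stand-in for a writer's save(): writes one complete zip
    archive (an .xlsx file is a zip container) to `target`.

    If `target` is a path, zipfile opens it with truncation; if it is an open
    binary handle, zipfile writes at the handle's current position.
    """
    with zipfile.ZipFile(target, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")


# Like first.py: saving twice to a *path* rewrites the file each time.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "first.zip")
    save_archive(path)
    size_path_once = os.path.getsize(path)
    save_archive(path)
    size_path_twice = os.path.getsize(path)

# Like second.py: saving twice to one open *handle* appends a second copy.
handle = io.BytesIO()
save_archive(handle)
size_handle_once = len(handle.getvalue())
save_archive(handle)
size_handle_twice = len(handle.getvalue())

print(size_path_once, size_path_twice)      # same size both times
print(size_handle_once, size_handle_twice)  # second is twice the first
```

The doubled size matches the 9474 = 2 * 4737 figure above. The same reasoning explains why third.py is fine: it calls `save()` once and lets the `with` block close the handle.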

UPD: third.py (see https://github.com/pandas-dev/pandas/issues/33746#issuecomment-640169769 by @lordgrenville):

import pandas as pd
import openpyxl

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

with open('third.xlsx', 'wb') as output3:
    writer3 = pd.ExcelWriter(output3, engine='openpyxl')
    df.to_excel(writer3)
    writer3.save()

Problem description

$ python first.py
$ python second.py
$ du -b first.xlsx second.xlsx third.xlsx
4737	first.xlsx
9474	second.xlsx
4737	third.xlsx

~(Note: 9474 = 2 * 4737. But sometimes it’s not true.)~

Why do the files differ?
The documentation says `path : str`, "Path to xls or xlsx file." So does `pd.ExcelWriter.__init__` accept a file-like object?
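A minimal sketch of writing to a file-like object so that the workbook is saved exactly once, assuming pandas and openpyxl are installed. It uses an in-memory `io.BytesIO` buffer instead of `open(..., 'wb')`, and relies on the `with` block to save on exit rather than calling `save()` and `close()` by hand (`save()` was later deprecated in pandas 1.5 and removed in 2.0). Later pandas releases document `path` as also accepting a binary buffer; whether a given older version does is worth checking against its own docs.

```python
import io
import zipfile

import pandas as pd

df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})

buffer = io.BytesIO()
# Exiting the context manager writes the workbook once; no explicit
# save()/close() call is made inside the block.
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer)

# The buffer should now hold a single zip container (.xlsx).
print(zipfile.is_zipfile(buffer))

# Read it back to confirm the data survived the round trip.
buffer.seek(0)
roundtrip = pd.read_excel(buffer, index_col=0, engine="openpyxl")
print(roundtrip)
```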

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.32-calculate
machine          : x86_64
processor        : Intel® Core™ i5-7200U CPU @ 2.50GHz
byteorder        : little
LC_ALL           : None
LANG             : ru_RU.utf8
LOCALE           : ru_RU.UTF-8

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 46.1.3.post20200330
Cython           : 0.29.15
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.2.8
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : 1.2.8
numba            : None


Top GitHub Comments


wwwald commented, May 18, 2021

I’m running into the same problem, or at least it seems very similar.

In my case, I’m trying to write a dataframe to an existing worksheet using ExcelWriter’s append mode. When I open the resulting file excel-results.xlsx, Excel (Office 365) warns me that it is corrupt and offers to repair it. The repair works, but of course it shouldn’t be necessary.

Some code to reproduce the problem:

import pandas as pd
from pathlib import Path
import shutil
from openpyxl import load_workbook

xlsx_template = Path("excel-template.xlsx")
xlsx_results = Path("excel-results.xlsx")
shutil.copy2(xlsx_template, xlsx_results)

df = pd.DataFrame(
    {"type": ["ERROR", "NOTFOUND", "ERROR"], 
     "message": ["First error message", "Didn't find a value", "Another error"]}
)

with pd.ExcelWriter(xlsx_results, engine="openpyxl", mode="a") as writer:
    writer.book = load_workbook(xlsx_results)
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    df.to_excel(writer, sheet_name="Error messages", startrow=7, startcol=1, index=False, header=False)
    writer.save()

The excel-template.xlsx file used here was linked in the original comment.
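The snippet above reassigns `writer.book` and `writer.sheets` and calls `writer.save()` inside the `with` block, and leaving the block saves again, so the workbook is likely written more than once; that may contribute to the corruption, though it is not confirmed here. Below is a sketch of the same task that writes once, assuming pandas 1.4 or later (where `if_sheet_exists="overlay"` lets append mode write into an existing sheet) and openpyxl. The template workbook is a hypothetical stand-in built in code, since the original excel-template.xlsx is not available; the "Report header" cell is invented for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

df = pd.DataFrame(
    {"type": ["ERROR", "NOTFOUND", "ERROR"],
     "message": ["First error message", "Didn't find a value", "Another error"]}
)

with tempfile.TemporaryDirectory() as tmp:
    xlsx_results = os.path.join(tmp, "excel-results.xlsx")

    # Hypothetical stand-in for excel-template.xlsx: a workbook that already
    # has an "Error messages" sheet with some existing content in B2.
    template = Workbook()
    sheet = template.active
    sheet.title = "Error messages"
    sheet["B2"] = "Report header"
    template.save(xlsx_results)

    # Append mode + overlay: write into the existing sheet without replacing
    # it. No writer.book/writer.sheets assignment and no explicit save();
    # leaving the with-block writes the file once.
    with pd.ExcelWriter(xlsx_results, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name="Error messages",
                    startrow=7, startcol=1, index=False, header=False)

    # Read the result back with openpyxl to see where the data landed.
    ws = load_workbook(xlsx_results)["Error messages"]
    header = ws["B2"].value          # pre-existing cell is kept
    first_type = ws["B8"].value      # startrow=7 -> row 8, startcol=1 -> column B
    last_message = ws["C10"].value   # third data row, second column

print(header, first_type, last_message)
```

Skipping the manual `writer.book` assignment also matters on newer pandas, which deprecates setting that attribute.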

Versions used to reproduce this:

INSTALLED VERSIONS  
------------------  
commit           : 2cb96529396d93b46abab7bbc73a208e708c642e  
python           : 3.7.10.final.0  
python-bits      : 64  
OS               : Windows  
OS-release       : 10  
Version          : 10.0.17763  
machine          : AMD64  
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel  
byteorder        : little  
LC_ALL           : None  
LANG             : en_US.UTF-8  
LOCALE           : None.None  
  
pandas           : 1.2.4  
numpy            : 1.20.2  
pytz             : 2021.1  
dateutil         : 2.8.1  
pip              : 21.1.1  
setuptools       : 49.6.0.post20210108  
Cython           : None  
pytest           : None  
hypothesis       : None  
sphinx           : None  
blosc            : None  
feather          : None  
xlsxwriter       : None  
lxml.etree       : None  
html5lib         : None  
pymysql          : None  
psycopg2         : None  
jinja2           : None  
IPython          : None  
pandas_datareader: None  
bs4              : None  
bottleneck       : None  
fsspec           : None  
fastparquet      : None  
gcsfs            : None  
matplotlib       : None  
numexpr          : None  
odfpy            : None  
openpyxl         : 3.0.7  
pandas_gbq       : None  
pyarrow          : None  
pyxlsb           : None  
s3fs             : None  
scipy            : None  
sqlalchemy       : None  
tables           : None  
tabulate         : None  
xarray           : None  
xlrd             : None  
xlwt             : None  
numba            : None


kuraga commented, Jun 9, 2020