BUG: ExcelWriter: file is corrupted on save (and: does it accept a file object?)
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
first.py:
import pandas as pd
import openpyxl
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
writer1 = pd.ExcelWriter('first.xlsx', engine='openpyxl')
df.to_excel(writer1)
writer1.save()
writer1.close()
second.py:
import pandas as pd
import openpyxl
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
output2 = open('second.xlsx', 'wb')
writer2 = pd.ExcelWriter(output2, engine='openpyxl')
df.to_excel(writer2)
writer2.save()
writer2.close()
output2.flush()
output2.close()
UPD 2:
writer2.save()
writer2.close()
Remove either one of these two lines and it starts to work!
UPD: third.py
(see https://github.com/pandas-dev/pandas/issues/33746#issuecomment-640169769 by @lordgrenville):
import pandas as pd
import openpyxl
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
with open('third.xlsx', 'wb') as output3:
    writer3 = pd.ExcelWriter(output3, engine='openpyxl')
    df.to_excel(writer3)
    writer3.save()
Problem description
$ python first.py
$ python second.py
$ python third.py
$ du -b first.xlsx second.xlsx third.xlsx
4737 first.xlsx
9474 second.xlsx
4737 third.xlsx
~(Note: 9474 = 2 * 4737. But sometimes it’s not true.)~
- Why do the files differ?
- The documentation says:
  path - str - Path to xls or xlsx file.
  So does pd.ExcelWriter.__init__ accept a file-like object?
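One plausible explanation for the doubled size, sketched with the standard library only (no pandas or openpyxl involved): an .xlsx file is a ZIP archive. If both save() and close() write the workbook, a path gets reopened and overwritten on the second write, while an already-open file object simply receives a second archive after the first one. That save() and close() each write the workbook in pandas 1.0.3 is an assumption here, not something confirmed in this report, and write_archive and sizes_after_double_save are hypothetical helper names used only for illustration.

```python
import os
import tempfile
import zipfile


def write_archive(target):
    """Write a tiny one-member ZIP archive to a path or an open binary file."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("sheet1.xml", "<row>1,2,3,4</row>")


def sizes_after_double_save(directory):
    """Return (size when saved twice by path, size when saved twice via a handle)."""
    by_path = os.path.join(directory, "by_path.zip")
    by_handle = os.path.join(directory, "by_handle.zip")

    # Saving twice by path: the second save reopens the file and overwrites it.
    write_archive(by_path)
    write_archive(by_path)

    # Saving twice through one open handle: the second archive is written
    # after the first, because the handle's position has already advanced.
    with open(by_handle, "wb") as handle:
        write_archive(handle)
        write_archive(handle)

    return os.path.getsize(by_path), os.path.getsize(by_handle)


with tempfile.TemporaryDirectory() as tmp:
    path_size, handle_size = sizes_after_double_save(tmp)
    print(path_size, handle_size)
```

If the assumption holds, this would match 9474 = 2 * 4737 for second.xlsx, and it would also be consistent with the observation that dropping either save() or close() makes the file-object variant work.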
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.32-calculate
machine : x86_64
processor : Intel® Core™ i5-7200U CPU @ 2.50GHz
byteorder : little
LC_ALL : None
LANG : ru_RU.utf8
LOCALE : ru_RU.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : 1.2.8
numba : None
Top GitHub Comments
I’m running into the same problem, or at least it seems very similar.
In my case, I’m trying to write a dataframe to an existing worksheet, using ExcelWriter’s append mode. When opening the resulting file excel-results.xlsx, Excel (Office 365) warns me that it is corrupt and offers to repair. The repair does work, but of course, it shouldn’t be necessary.
Some code to reproduce the problem:
The excel-template.xlsx file used here is this one.
Versions used to reproduce this:
Remove one line. It will start to work!