question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BUG: "with pd.ExcelWriter" produces a corrupt Excel file in case of .xlsm extension

See original GitHub issue

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

excel_path = r"D:\my_test.xlsm"
with pd.ExcelWriter(excel_path, mode='a') as writer:
    # df.to_excel(writer, sheet_name='Sheet1')
    pass

df_excel = pd.read_excel(excel_path, 'Sheet1')

Issue Description

The specified Excel file gets corrupted, despite the append mode. Changing its extension to .xlsx can retrieve the raw data from it, but the macro is forever gone.

In my example I used with and had only a pass statement inside it, yet the file became useless.

Excel_macro_corruption.zip

Q1: Can we know for sure that the default engine is used for .xlsm files? Q2: Should I contact OpenPyxl instead of you with this issue?

Expected Behavior

As I’ve read since my first encounter of this bug, ExcelWriter doesn’t support macros, but a warning of some kind would be nice before wiping out a carefully written VBA code from the Excel file.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5 python : 3.7.4.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19041 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : None.None pandas : 1.3.4 numpy : 1.17.3 pytz : 2021.3 dateutil : 2.8.2 pip : 19.3.1 setuptools : 40.8.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : None numexpr : None odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None xlwt : None numba : None

Issue Analytics

  • State:open
  • Created 2 years ago
  • Comments:25 (10 by maintainers)

github_iconTop GitHub Comments

1reaction
AlejoB13Ccommented, Jan 21, 2022

What I do to create a valid .xlsm file directly in python is also with openpyxl I do something like

from openpyxl import Workbook, load_workbook

path = './book.xlsm'
wb = Workbook()
wb.save(path)

wb = load_workbook(path, keep_vba=True)
wb.save(path)
wb.close()

The first part with Workbook() creates the file, but it is corrupted, however, when you open it with load_workbook and the keep_vba=True arg, the file is magically fixed and ready to be manipulated.

1reaction
AlejoB13Ccommented, Jan 15, 2022

The most I could do with this was to modify pandas.io.excel._openpyxl.py.OpenpyxlWriter.__init__.book (line 65) to directly add "keep_vba = True" since this line does not include any motor_kwargs. And from what I tried this doesn’t affect .xlsx files

However to append data to existing sheet, you should do something like:

path = './file.xlsm'
sheet_name = 'sheet_name'
adf = pd.read_excel(path, sheet_name=sheet_name, engine='openpyxl')
wb = load_workbook(path, keep_vba=True)

with pd.ExcelWriter(path, engine = 'openpyxl') as writer:
    writer.book = wb
    writer.sheets = {n: wb[n] for n in wb.sheetnames}
    df.to_excel(writer, sheet_name = sheetname, index=False, header=False, startrow=len(adf)+1)

or modify again _openpyxl.py to add something like an "append" option to def write_cells()

Read more comments on GitHub >

github_iconTop Results From Across the Web

File corruption while writing using Pandas - Stack Overflow
I am reading data from a perfectly valid xlsx file and processing it using Pandas in Python 3.5. At ...
Read more >
Excel file got "corrupted" / "tampered" on macbook
I got an important password-protected excel file that Excel suddenly refuses to open, citing the file being "corrupted" or "tampered".
Read more >
Known Issues and Bugs — XlsxWriter Documentation
This section lists known issues and bugs and gives some information on how to submit bug reports. “Content is Unreadable. Open and Repair”....
Read more >
Reading and writing an xlsm file results in corrupted file
I'm trying to read an existing xlsm file and write it out. ... openpyxl import Workbook from openpyxl import load_workbook # Open excel...
Read more >
Pandas ExcelWriter Corrupts .xlsm file - Kaggle
If you trust the source of this workbook, click yes." warning. How can i fix this problem? Thanks. import pandas as pd df_TRYprod...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found