Issue with damaged XLSX created with cell content not truncated to 32767 chars


Hi, we are using XlsxWriter in ScanCode.io to craft XLSX outputs. See below for links.

Sometimes we collect a long text that contains CRLF line endings and is longer than the maximum length of a cell (32767 characters). In that case the string is not truncated by XlsxWriter, and MS Excel on Windows reports the workbook as damaged. If I replace the CRLF with LF, the XlsxWriter truncation takes place “as usual”.

I am using Python 3.6 and 3.9 on Linux with the latest XlsxWriter version, 1.4.3:

$ python --version
Python 3.9.0
$ python -c 'import xlsxwriter; print(xlsxwriter.__version__)'
1.4.3

Here is some code that demonstrates the problem:


import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

import xlsxwriter

def test_workbook_with_long_text():
    """
    Create a workbook with a worksheet holding a single cell that contains
    ``long_text``, then extract the XLSX archive, read the stored string
    back and print its length.
    """
    test_dir = Path(tempfile.mkdtemp())
    print("temp test_dir:", test_dir)

    long_text = "a\r\n" * 32 * 1024

    output_file = test_dir / "foobar.xlsx"
    with xlsxwriter.Workbook(str(output_file)) as workbook:
        worksheet = workbook.add_worksheet("baz")
        worksheet.write_row(row=0, col=0, data=[long_text])

    extract_dir = test_dir / "extracted"
    shutil.unpack_archive(
        filename=output_file,
        extract_dir=extract_dir,
        format="zip",
    )

    # This XML doc contains the strings stored in cells and has this shape:
    # <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    # <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    #      count="2" uniqueCount="2">
    #   <si><t>foo</t></si>
    #   <si><t>f0123456789</t></si>
    # </sst>

    shared_strings = extract_dir / "xl" / "sharedStrings.xml"
    print("XLSX shared_strings file:", shared_strings)
    sstet = ET.parse(str(shared_strings))
    # Here the text we care about is in the last element of the XML.
    texts = list(e.text for e in sstet.getroot().iter())
    print("length text", len(texts[-1]))

if __name__ == "__main__":
    test_workbook_with_long_text()
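
A possible workaround on the caller's side, based on the observation above that truncation works once CRLF is replaced by LF: normalize the line endings and cut the text to the cell limit before handing it to XlsxWriter. This is only a sketch. `prepare_cell_text` and `EXCEL_MAX_CELL_CHARS` are hypothetical names that are not part of XlsxWriter, and silently dropping the tail of the text is an assumption about what the caller wants.

```python
# Hypothetical helper, not part of XlsxWriter.
EXCEL_MAX_CELL_CHARS = 32767  # cell length limit quoted in this issue


def prepare_cell_text(text, limit=EXCEL_MAX_CELL_CHARS):
    """Replace CRLF with LF, then truncate to at most ``limit`` characters."""
    normalized = text.replace("\r\n", "\n")
    return normalized[:limit]


# Same input as in the test above.
long_text = "a\r\n" * 32 * 1024
safe_text = prepare_cell_text(long_text)
print("original length:", len(long_text))
print("prepared length:", len(safe_text))
```

The prepared string could then be passed to `worksheet.write_row()` in place of `long_text` in the test above.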

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
jmcnamara commented, Jun 19, 2021

Thanks for the detailed report.

0 reactions
dr-ftvkun commented, Nov 28, 2022

@dr-ftvkun The limit is definitely 32767 characters. That is documented by Microsoft and can be verified in Excel. However, I think in practice the limit may be 32767x2 bytes for 2-byte UTF-8 characters and any 3 or 4 byte characters reduce the effective character limit.

In your case did you have emoji in the string or any 4 byte UTF-8 characters?

Yes, you are absolutely right, the string does contain emojis. Thank you for the priceless info!
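
To check whether a given string might run into this, it can help to compare its size under different counts. The sketch below is not from the original thread: `text_sizes` is a hypothetical helper, and which of these measures Excel actually enforces is not established here, so treat the numbers as diagnostics only.

```python
def text_sizes(text):
    """Return the size of ``text`` measured three different ways."""
    return {
        # Number of Unicode code points, which is what len() counts in Python 3.
        "code_points": len(text),
        # Number of bytes when encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Number of 16-bit units when encoded as UTF-16 (no byte order mark).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


# One ASCII letter followed by one emoji (U+1F600).
sample = "a\U0001F600"
print(text_sizes(sample))
# {'code_points': 2, 'utf8_bytes': 5, 'utf16_units': 3}
```

A string of 32767 code points that contains emoji will therefore report larger byte and UTF-16 counts than a plain ASCII string of the same length.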


