Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[BUG] Extremely poor performance with slow readFile and slow writeFile on very small files

See original GitHub issue

🐛 Bug Report

Many files are extremely slow to read and write, despite being very small, e.g. less than 100 KB. For example, a file with 2 sheets, columns A-L, and 200 rows.

This occurs when reading/writing on the API side, i.e. using Node.js.

Lib version: 4.3.0. Node v10 on a Docker image. This was also occurring on exceljs 3.x.

Steps To Reproduce

// run inside an async function (Node 10 has no top-level await)
const path = require('path')
const Excel = require('exceljs')

const filePath = path.join(__dirname, 'myfile.xlsx')
const workbook = new Excel.Workbook()
await workbook.xlsx.readFile(filePath)
// takes 10 seconds

const outFilePath = path.join(__dirname, 'myfile_updated.xlsx')
await workbook.xlsx.writeFile(outFilePath)
// takes 45 seconds to one minute

Log output:

[02:30:31 5170]  START readFile .... 
[02:30:41 1680]  ... DONE readFile
[02:30:41 1700]  START writeFile ....
[02:31:24 9620]   ... DONE writeFile
[02:31:24 9630]  Set headers
[02:31:24 9630]  Send file

Also, it occasionally crashes with an out-of-memory error when writing the file, which suggests the slow performance is due to excessive memory use. This is a 67 KB file; it should not cause an out-of-memory error.

==== JS stack trace =========================================
 0: ExitFrame [pc: 0x22878ba5bf1d]
 Security context: 0x2c572fb9e6c1 <JSObject>
    1: /* anonymous */ [0x3333573c1149] [/var/www/api/node_modules/exceljs/lib/xlsx/xform/sheet/data-validations-xform.js:~36] [pc=0x22878ba61947](this=0x3cc2c678d461 <JSGlobal Object>,dataValidation=0x0cfda8d0ac19 <Object map = 0x369bde2c94d9>,address=0x168ed0a4c5a1 <String[7]: K486175>)
    2: /* anonymous */ [0x3333573c11c1] [/var/www/api/node_modules/exc...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Hopefully the stack trace helps pinpoint where this is occurring.

I have tried the patchDefinedNames fix suggested in other issues, but it does not work because DefinedNames.prototype is undefined:

// `DefinedNames` is imported elsewhere in our code; `noop` is a no-op function.
(function patchDefinedNames() {
  console.log('START patchDefinedNames', DefinedNames, DefinedNames.prototype)
  try {
    const desc = Object.getOwnPropertyDescriptor(DefinedNames.prototype, "model");
    console.log('START defineProperty', desc)
    if (desc) {
      Object.defineProperty(DefinedNames.prototype, "model", {
        get: desc.get,
        set: noop
      });
    }
    console.log('DONE defineProperty')
  } catch (error) {
    console.log('ERROR in patchDefinedNames', error)
  }
  console.log('DONE patchDefinedNames')
})();

Log output from the fix is:

START patchDefinedNames { default: [Function: DefinedNames] } undefined
ERROR in patchDefinedNames TypeError: Cannot convert undefined or null to object
    at Object.getOwnPropertyDescriptor (<anonymous>)
    at getOwnPropertyDescriptor (/var/www/api/node_modules/core-js/library/fn/object/get-own-property-descriptor.js:4:18)
    at Timeout.patchDefinedNames [as _onTimeout] (/var/www/api/dist/controllers/approval/index.js:71:18)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
DONE patchDefinedNames

This error occurs because DefinedNames.prototype is undefined, so the suggested fix does not work as of v4.3.0. (The log shows the import resolving to { default: [Function: DefinedNames] }, which suggests the class actually lives on DefinedNames.default in our build.)
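For reference, since the import resolves to a namespace object, a version of the patch that unwraps .default might look like the sketch below. This is untested; the exceljs/lib/doc/defined-names path is assumed from the library's layout and may differ between versions.

// Assumption: the bundler/transpiler wraps the CommonJS export in a namespace
// object, so the class may sit on `.default` rather than on the module itself.
const imported = require('exceljs/lib/doc/defined-names');
const DefinedNames = imported && imported.default ? imported.default : imported;
const noop = () => {};

const desc = Object.getOwnPropertyDescriptor(DefinedNames.prototype, 'model');
if (desc) {
  Object.defineProperty(DefinedNames.prototype, 'model', {
    get: desc.get,
    set: noop // the suggested fix: make the model setter a no-op
  });
}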

Note that there are many other reported issues regarding read/write performance, and it does not look like they are being fixed. This appears to be a serious, ongoing issue:

https://github.com/exceljs/exceljs/issues/535, https://github.com/exceljs/exceljs/issues/1494, https://github.com/exceljs/exceljs/issues/1609, https://github.com/exceljs/exceljs/issues/1017, https://github.com/exceljs/exceljs/issues/1597, https://github.com/exceljs/exceljs/issues/847, https://github.com/exceljs/exceljs/issues/742, https://github.com/exceljs/exceljs/issues/566, https://github.com/exceljs/exceljs/issues/527, https://github.com/exceljs/exceljs/issues/282, https://github.com/exceljs/exceljs/issues/125

The expected behaviour:

A file this size should be read in under a second and written in one to two seconds. Obviously file operations have some overhead.

If this is not fixable, then sadly we will have to stop using exceljs.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 8
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
papandreou commented, May 27, 2022

I just ran into the same thing with a weird (but non-malicious) spreadsheet that got uploaded to a service I’m working on.

Somehow it ended up containing this crazy data validation:

<dataValidation type="list" allowBlank="1" showInputMessage="1" showErrorMessage="1" sqref="F1583:F1048576 G1583:I1048576 Q1583:Q1048576 U1583:AB1048576 S1583:S1048576 L1583:L1048576 AD1583:AE1048576"><formula1>#REF!</formula1></dataValidation>

… which then makes exceljs loop over 17,798,898 cells and set a hash key for each(!)

I came up with this super hacky workaround to turn Range#forEachAddress into a noop:

require('exceljs/lib/doc/range').prototype.forEachAddress = function () {};

It would be nice to avoid that. In my application I don’t really need these data validations to be applied, so it would be great if they could be turned off. Alternatively we could use the existing maxCols / maxRows to intersect the ranges with the supported data area.
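For illustration, the intersection idea might look like the sketch below: clamp each validation rectangle to the sheet's used area before any per-cell work happens. clampRange and the row/column numbers are hypothetical, not exceljs internals.

// Sketch: clamp a validation rectangle to the sheet's used area so that
// "to the end of the sheet" ranges (e.g. F1583:F1048576) don't force
// millions of per-cell iterations. `clampRange` is a hypothetical helper.
function clampRange(range, maxRow, maxCol) {
  // The range lies entirely below/right of the used area: nothing to enumerate.
  if (range.top > maxRow || range.left > maxCol) return null;
  return {
    top: range.top,
    left: range.left,
    bottom: Math.min(range.bottom, maxRow),
    right: Math.min(range.right, maxCol),
  };
}

// e.g. F1583:F1048576 on a sheet with 2,000 used rows collapses to F1583:F2000
console.log(clampRange({ top: 1583, left: 6, bottom: 1048576, right: 6 }, 2000, 30));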

0 reactions
ryegye24 commented, May 16, 2022

I was running into a problem like this on a 49 KB file that wouldn't finish loading even after literally days. I did a deep dive into this, and here is what I found.

When loading a file with data validations applied to ranges of cells, exceljs enumerates the address of every individual cell the validation applies to and stores it (see https://github.com/exceljs/exceljs/blob/master/lib/xlsx/xform/sheet/data-validations-xform.js#L218). This goes relatively quickly (a little under a millisecond per address on my machine) until it hits 8,388,607 total cell addresses across all validations, at which point it immediately slows WAY down (~3 seconds per address on my machine). As it happened, my small spreadsheet had a lot of data validations applied to large ranges (well past what actually exists in the spreadsheet).
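To get a feel for the cost, here is a rough standalone illustration (not exceljs code) of what expanding a single "to the bottom of the sheet" column range into per-cell object keys looks like:

// Rough illustration (not exceljs code): one column validated down to row
// 1,048,576 becomes ~1 million object keys; a handful of such ranges across
// several columns is how a small file reaches millions of addresses.
const model = {};
for (let row = 1583; row <= 1048576; row++) {
  model[`F${row}`] = { type: 'list' };
}
console.log(Object.keys(model).length); // 1,046,994 keys for a single column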

Here's the odd thing, though: exceljs fully supports data validations by range (see https://github.com/exceljs/exceljs/issues/1109#issuecomment-610339362). I ended up with a really screwy workaround: I pre-parsed the spreadsheet using JSZip and xmldom to remove the data validations from the sheet XML files, loaded the spreadsheet with exceljs, and then added the data validations back in by their original ranges, and that worked. It seems like it should be possible to change the loading logic to just use the ranges and skip the cell-address enumeration.
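Roughly, that pre-parse workaround looks like the sketch below. It is simplified: a real version would record the removed dataValidation ranges so they can be re-added afterwards, and the sheet paths inside the zip are assumed to follow the usual xl/worksheets/sheetN.xml layout.

// Sketch of the pre-parse workaround described above (simplified).
const fs = require('fs');
const JSZip = require('jszip');
const { DOMParser, XMLSerializer } = require('xmldom');
const Excel = require('exceljs');

async function loadWithoutDataValidations(filePath) {
  const zip = await JSZip.loadAsync(fs.readFileSync(filePath));
  const sheetNames = Object.keys(zip.files)
    .filter((name) => /^xl\/worksheets\/sheet\d+\.xml$/.test(name));

  for (const name of sheetNames) {
    const xml = await zip.file(name).async('string');
    const doc = new DOMParser().parseFromString(xml, 'text/xml');
    // Strip every <dataValidation> element before exceljs ever sees it.
    const nodes = doc.getElementsByTagName('dataValidation');
    for (let i = nodes.length - 1; i >= 0; i--) {
      nodes[i].parentNode.removeChild(nodes[i]);
    }
    zip.file(name, new XMLSerializer().serializeToString(doc));
  }

  const workbook = new Excel.Workbook();
  await workbook.xlsx.load(await zip.generateAsync({ type: 'nodebuffer' }));
  return workbook; // the original validation ranges can then be re-added by range
}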

Read more comments on GitHub >

Top Results From Across the Web

System stops responding, slow file server performance, or ...
Provides a solution to performance issues that occur when you work with files that are located on a file server.
Read more >
Improve write speed for high speed file copy? - Stack Overflow
Hence, i've written a C/C++ program that uses CreateFile, ReadFile & WriteFile API's with NO_BUFFERING & WRITE_THROUGH flags. The program simulates ESEUTIL.exe, ...
Read more >
JDK-6265734 (fc) Single FileChannel slower than ... - Bug ID
EVALUATION The issue we have with ReadFile/WriteFile changing the global file position still remains but if we change the positionLock to be a...
Read more >
Two new ways to read a file quickly - LWN.net
A successful readfile() would read the contents of the indicated file into buf, up to a maximum of bufsize bytes, returning the number...
Read more >
Don't defer Close() on writable files - joe shaw
If every Write() call committed the data to the disk synchronously, the performance of our systems would be unusably slow. While synchronous ...
Read more >
