[BUG] Extremely poor performance with slow readFile and slow writeFile on very small files
🐛 Bug Report
Many files are extremely slow to read and write despite being very small, e.g. less than 100 KB. For example: a file with 2 sheets, columns A–L, and 200 rows.
This occurs when reading/writing on the API side, i.e. using Node.js.
Lib version: 4.3.0, Node v10 on a Docker image. It was also occurring on exceljs 3.x.
Steps To Reproduce
const path = require('path');
const Excel = require('exceljs');

(async () => {
  const filePath = path.join(__dirname, 'myfile.xlsx');
  const workbook = new Excel.Workbook();
  await workbook.xlsx.readFile(filePath); // takes 10 seconds
  const outFilePath = path.join(__dirname, 'myfile_updated.xlsx');
  await workbook.xlsx.writeFile(outFilePath); // takes 45 seconds to one minute
})();
Log output:
[02:30:31 5170] START readFile ....
[02:30:41 1680] ... DONE readFile
[02:30:41 1700] START writeFile ....
[02:31:24 9620] ... DONE writeFile
[02:31:24 9630] Set headers
[02:31:24 9630] Send file
Also, it occasionally crashes with an out-of-memory error when writing the file, which suggests the slow performance is caused by excessive memory use. This is a 67 KB file; it should not cause an out-of-memory error.
==== JS stack trace =========================================
0: ExitFrame [pc: 0x22878ba5bf1d]
Security context: 0x2c572fb9e6c1 <JSObject>
1: /* anonymous */ [0x3333573c1149] [/var/www/api/node_modules/exceljs/lib/xlsx/xform/sheet/data-validations-xform.js:~36] [pc=0x22878ba61947](this=0x3cc2c678d461 <JSGlobal Object>,dataValidation=0x0cfda8d0ac19 <Object map = 0x369bde2c94d9>,address=0x168ed0a4c5a1 <String[7]: K486175>)
2: /* anonymous */ [0x3333573c11c1] [/var/www/api/node_modules/exc...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Hopefully the stack trace helps to locate where this is occurring.
I have tried the patchDefinedNames fix suggested in other issues; however, it does not work because DefinedNames.prototype is undefined:
// DefinedNames is imported from exceljs internals; note the transpiled
// module wrapper visible in the log output below.
const noop = () => {};

(function patchDefinedNames() {
  console.log('START patchDefinedNames', DefinedNames, DefinedNames.prototype);
  try {
    const desc = Object.getOwnPropertyDescriptor(DefinedNames.prototype, 'model');
    console.log('START defineProperty', desc);
    if (desc) {
      Object.defineProperty(DefinedNames.prototype, 'model', {
        get: desc.get,
        set: noop,
      });
    }
    console.log('DONE defineProperty');
  } catch (error) {
    console.log('ERROR in patchDefinedNames', error);
  }
  console.log('DONE patchDefinedNames');
})();
Log output from the fix is:
START patchDefinedNames { default: [Function: DefinedNames] } undefined
ERROR in patchDefinedNames TypeError: Cannot convert undefined or null to object
at Object.getOwnPropertyDescriptor (<anonymous>)
at getOwnPropertyDescriptor (/var/www/api/node_modules/core-js/library/fn/object/get-own-property-descriptor.js:4:18)
at Timeout.patchDefinedNames [as _onTimeout] (/var/www/api/dist/controllers/approval/index.js:71:18)
at ontimeout (timers.js:436:11)
at tryOnTimeout (timers.js:300:5)
at listOnTimeout (timers.js:263:5)
at Timer.processTimers (timers.js:223:10)
DONE patchDefinedNames
This error occurs because DefinedNames.prototype is undefined: as the log shows, the imported DefinedNames binding is a transpiled module object ({ default: [Function: DefinedNames] }), so the class itself sits under .default. Either way, this suggested fix does not work as of v4.3.0.
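Given that log, unwrapping the transpiled module wrapper before patching might look like this (a minimal sketch, untested, and it only helps if the wrapper is the sole problem):

const DefinedNamesClass = DefinedNames.default || DefinedNames; // unwrap ES-module interop
const desc = Object.getOwnPropertyDescriptor(DefinedNamesClass.prototype, 'model');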
Note that there are many other reported issues regarding read/write performance, and it does not look like they are being fixed. Instead, this looks like a serious and ongoing issue:
https://github.com/exceljs/exceljs/issues/535, https://github.com/exceljs/exceljs/issues/1494, https://github.com/exceljs/exceljs/issues/1609, https://github.com/exceljs/exceljs/issues/1017, https://github.com/exceljs/exceljs/issues/1597, https://github.com/exceljs/exceljs/issues/847, https://github.com/exceljs/exceljs/issues/742, https://github.com/exceljs/exceljs/issues/566, https://github.com/exceljs/exceljs/issues/527, https://github.com/exceljs/exceljs/issues/282, https://github.com/exceljs/exceljs/issues/125
The expected behaviour:
A file this size should read in under one second and write in one or two seconds; obviously file operations have some overhead.
If this is not fixable then sadly we will have to stop using exceljs.
Top GitHub Comments
I just ran into the same thing with a weird (but non-malicious) spreadsheet that got uploaded to a service I’m working on.
Somehow it ended up containing this crazy data validation:
… which then makes exceljs loop over 17,798,898 cells and set a hash key for each(!)
I came up with this super hacky workaround to turn Range#forEachAddress into a noop, along the lines of the sketch below (the internal module path is an assumption and may vary between versions):
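// Hacky: disable per-cell address enumeration entirely.
// exceljs does not officially expose Range; this internal path is assumed.
const Range = require('exceljs/lib/doc/range');
Range.prototype.forEachAddress = function forEachAddress() {};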
It would be nice to avoid that. In my application I don't really need these data validations to be applied, so it would be great if they could be turned off. Alternatively we could use the existing maxCols/maxRows to intersect the ranges with the supported data area.

I was running into a problem like this on a 49kb file where it wouldn't finish loading even after literally days. I did a deep dive into this and this is what I found.
When loading a file with data validations applied to ranges of cells, exceljs enumerates the address of each individual cell the validation applies to and stores it (see https://github.com/exceljs/exceljs/blob/master/lib/xlsx/xform/sheet/data-validations-xform.js#L218). This goes relatively quickly (a little less than a millisecond per address on my machine) until it hits 8,388,607 total cell addresses across all validations, at which point it immediately slows WAY down (~3 seconds per address on my machine). As it happened, my small spreadsheet had a lot of data validations applied to large ranges (well past the data that actually existed in the spreadsheet).
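To see why a tiny file can explode like this, here is a rough model of that expansion (illustrative only, not the actual exceljs source; all names are made up). A validation on one whole column already produces 1,048,576 keys, and 8,388,607 is 2^23 − 1, which hints at an engine-level hash table threshold:

// Rough model of what the loader does per validation range (hypothetical code).
function expandRange(range, validation, model) {
  // e.g. 'K1:K1048576' — a validation applied to a whole column
  const m = range.match(/^([A-Z]+)(\d+):([A-Z]+)(\d+)$/);
  const [, col, rowStart, , rowEnd] = m;
  for (let r = Number(rowStart); r <= Number(rowEnd); r++) {
    model[`${col}${r}`] = validation; // one hash key per cell
  }
}

const model = {};
expandRange('K1:K1048576', { type: 'list' }, model);
console.log(Object.keys(model).length); // 1048576 entries for a single column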
Here’s the odd thing though: exceljs completely supports data validations by ranges (see https://github.com/exceljs/exceljs/issues/1109#issuecomment-610339362). I ended up pulling a really screwy workaround where I pre-parsed the spreadsheet using JSZip and xmldom to remove the data validations from the sheet XML files, loaded the spreadsheet with exceljs, and then added the data validations back in by their original ranges, and that worked. It seems like it should be possible to change the loading logic to just use the ranges and skip the cell address enumeration.
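A minimal sketch of that pre-parse workaround, assuming the standard xl/worksheets/*.xml layout inside the xlsx zip (this is a reconstruction under those assumptions, not the commenter's exact code):

const fs = require('fs');
const JSZip = require('jszip');
const { DOMParser, XMLSerializer } = require('xmldom');
const Excel = require('exceljs');

async function loadWithoutDataValidations(filePath) {
  const zip = await JSZip.loadAsync(fs.readFileSync(filePath));
  const removed = {}; // sheet path -> serialized <dataValidations> element

  const sheetNames = Object.keys(zip.files)
    .filter(n => n.startsWith('xl/worksheets/') && n.endsWith('.xml'));
  for (const name of sheetNames) {
    const xml = await zip.file(name).async('string');
    const doc = new DOMParser().parseFromString(xml, 'text/xml');
    const nodes = doc.getElementsByTagName('dataValidations');
    if (nodes.length > 0) {
      const node = nodes.item(0);
      removed[name] = new XMLSerializer().serializeToString(node);
      node.parentNode.removeChild(node); // strip before exceljs sees it
      zip.file(name, new XMLSerializer().serializeToString(doc));
    }
  }

  const workbook = new Excel.Workbook();
  await workbook.xlsx.load(await zip.generateAsync({ type: 'nodebuffer' }));
  return { workbook, removed };
}

The removed validations can then be re-applied by their original ranges, e.g. via worksheet.dataValidations.add(range, validation) as shown in the issue comment linked above, so the per-cell enumeration never happens.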