[BUG] date parsing doesn't work in streaming mode
🐛 Bug Report
Dates are not parsed correctly when using the streaming reader. The issue seems to be that the parser internally does not get the “style” for the field.
Lib version: 4.1.1 (latest)
Steps To Reproduce
const fs = require('fs')
const Excel = require('exceljs')
const through = require('through2')
const { pipeline, Readable } = require('readable-stream')

const reader = new Excel.stream.xlsx.WorkbookReader(fs.createReadStream('file.xlsx'))

// Flatten every row of every worksheet into a single async iterator
const createReader = async function* () {
  for await (const worksheet of reader) {
    for await (const row of worksheet) {
      yield row
    }
  }
}

const out = pipeline(
  Readable.from(createReader()),
  through.obj(function (row, _, cb) {
    console.log(row.values) // the second column is logged as a number, not a date
    cb(null, row)
  }),
  (err) => {
    if (err) out.emit('error', err)
  }
)
You’ll see the second column always comes back as a number, even though it should be an ISO string. This file is parsed perfectly fine and gives the correct results when not using streaming mode.
Attached the excel file, you can see the second column is properly date formatted.
Possible solution
I debugged into the library. It seems like the issue is that this is always returning null and not finding the style, which means that later on, when it checks the number format to see if it is a date, it returns false and just treats the value as a number. In the case of this excel sheet, c.s is 2, so it could be that s is not being parsed right, because it looks like 2 in defaultnumformats.js = general number. I’ve tried changing to every other date format in excel and re-saving the file and it still comes back as 2 every time.
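To make the lookup chain above concrete, here is a minimal sketch of how a reader could decide whether a numeric cell is a date. In SpreadsheetML the s attribute on a cell is an index into the cellXfs list in styles.xml, and the number format id is read from that entry, so s = 2 does not by itself mean number format 2. The cellXfs contents, the cell value, and the helper names below are all hypothetical and are not taken from the exceljs source or from the attached file. Only built-in date format ids are handled, custom formats are ignored.

```javascript
// Simplified, hypothetical model of styles.xml: a cell's `s` attribute is an
// index into this cellXfs list, not a number format id itself.
const cellXfs = [
  { numFmtId: 0 },  // s=0 -> General
  { numFmtId: 2 },  // s=1 -> 0.00
  { numFmtId: 14 }, // s=2 -> built-in short date (mm-dd-yy)
]

// Built-in number format ids that represent dates or times.
const BUILTIN_DATE_IDS = new Set([14, 15, 16, 17, 18, 19, 20, 21, 22, 45, 46, 47])

// Convert an Excel serial number (1900 date system) to a JS Date.
// 25569 is the serial number of 1970-01-01.
function serialToDate(serial) {
  return new Date(Math.round((serial - 25569) * 24 * 3600 * 1000))
}

// Resolve a raw numeric cell. `styles` is null when styles are ignored
// or have not been parsed yet.
function resolveCell(cell, styles) {
  const xf = styles ? styles[cell.s] : null
  if (!xf || !BUILTIN_DATE_IDS.has(xf.numFmtId)) return cell.v // stays a plain number
  return serialToDate(cell.v)
}

const cell = { s: 2, v: 44197 }
console.log(resolveCell(cell, null))                   // 44197
console.log(resolveCell(cell, cellXfs).toISOString())  // 2021-01-01T00:00:00.000Z
```

With no styles available the same cell falls through to the raw serial number, which matches the symptom described in this report.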
Issue Analytics
- State:
- Created 3 years ago
- Reactions:3
- Comments:11
Top GitHub Comments
I had this issue as well, and it was due to me ignoring styles. The streaming reader has a second options parameter, WorkbookStreamReaderOptions. The docs say it defaults to cache, but according to the type definition it defaults to ignore. https://github.com/exceljs/exceljs#streaming-xlsx-readercontents
Quick update: There is a race condition with the streaming parser. Depending on the ordering of the files in the excel zip, it will start parsing the worksheet before the styles have been parsed, and dates will just come in as numbers.
Specifically I’ve fixed it with this change: https://github.com/contra/exceljs/commit/457b0b35d54b5eea4c72d3c9fb53318e817083a0
I’ve run into a handful of race conditions, stalls and other issues with the streaming parser - once I have everything working perfectly I’ll send a PR.
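For the styles option mentioned in the first comment, a minimal sketch of passing it explicitly rather than relying on the default. The option names follow the streaming reader section of the exceljs README. The openReader helper is hypothetical and only exists so the module can be passed in. Given the race condition described above, this may not be sufficient for every file.

```javascript
// Options for the streaming reader. `styles: 'cache'` asks the reader to keep
// parsed styles so that number formats are available when rows are read.
const readerOptions = {
  sharedStrings: 'cache',
  styles: 'cache',
  worksheets: 'emit',
}

// Hypothetical helper: `Excel` is passed in so the sketch does not depend on
// how exceljs is loaded.
function openReader(Excel, input) {
  return new Excel.stream.xlsx.WorkbookReader(input, readerOptions)
}

// Usage (assumes exceljs is installed):
// const reader = openReader(require('exceljs'), fs.createReadStream('file.xlsx'))
```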