Memory leak issue

Hi @SheetJSDev ,

First of all, thanks to the SheetJS team for your great work, this lib is a real time saver!

However, I’m running into issues when dealing with large amounts of data.

Context

I’m working for a telecom company that has a web app allowing its customers to download some of the data about their SIM cards’ consumption, devices, etc. So basically, some of that data needs to be exportable as CSV/XLS/XLSX.

All the data is retrieved through HTTP calls to a REST API. To avoid any issue with the amount of data retrieved, we decided to retrieve the data recursively. Each call retrieves 500 rows of data.

js-xlsx: 0.12.11

Issue

We ran into the issue with one of the company’s customers, who wanted to export more than 40k rows of data. I reproduced the bug while watching the JS Heap Size in the Performance Monitor, and saw that there were no issues until I called the lib to build the CSV/XLS/XLSX file. Even though I’m retrieving the data recursively, I was calling the lib only once all the data had been collected in a plain local array.

Simplified code:

import { utils, writeFile } from 'xlsx';

// Members of the class that drives the export (simplified).
localData: any[] = [];

initDownload() {
    this.getDataRecursively(0);
}

getDataRecursively(
    pageIndex: number,
    totalCount?: number) {  // totalCount being the total number of rows to retrieve.
    // Condition to stop the recursion, dummy version: each call returns up to 500 rows.
    const shouldContinue: boolean = totalCount === undefined || pageIndex * 500 < totalCount;
    if (shouldContinue) {
        dataGetter(pageIndex)
            .subscribe((response: any) => {
                // Accumulate every page in the local array.
                this.localData = [...this.localData, ...response.body];
                // Request the next page.
                this.getDataRecursively(pageIndex + 1, response.headers['totalCount']);
            });
    } else {
        this.endDownload();
    }
}

endDownload() {
    // Build the whole workbook in one go, once all rows are in memory.
    const workbook = utils.book_new();
    const worksheet = utils.json_to_sheet(this.localData);
    utils.book_append_sheet(workbook, worksheet);

    writeFile(workbook, 'test.xls');
}

!!! This is a simplified version of the original code !!!

So my first thought was to fill the worksheet every time I retrieve 500 rows, instead of using a local array. The effect was positive: the JS heap size grew a little with every HTTP call, but it was much lower at the end, since the worksheet was already filled.

import { utils, writeFile, WorkSheet } from 'xlsx';

initDownload() {
    const worksheet = utils.json_to_sheet([]);

    this.getDataRecursively(0, worksheet);
}

getDataRecursively(
    pageIndex: number,
    worksheet: WorkSheet,
    totalCount?: number) {  // totalCount being the total number of rows to retrieve.
    // Condition to stop the recursion, dummy version: each call returns up to 500 rows.
    const shouldContinue: boolean = totalCount === undefined || pageIndex * 500 < totalCount;
    if (shouldContinue) {
        dataGetter(pageIndex)
            .subscribe((response: any) => {
                // The first page starts at A1 with a header row; later pages are
                // appended below the existing rows (origin: -1) without repeating the header.
                utils.sheet_add_json(worksheet, response.body, {
                    origin: pageIndex === 0 ? 'A1' : -1,
                    skipHeader: pageIndex > 0
                });
                // Request the next page.
                this.getDataRecursively(pageIndex + 1, worksheet, response.headers['totalCount']);
            });
    } else {
        this.endDownload(worksheet);
    }
}

endDownload(worksheet: WorkSheet) {
    // The worksheet is already filled; only the workbook wrapper is created here.
    const workbook = utils.book_new();
    utils.book_append_sheet(workbook, worksheet);

    writeFile(workbook, 'test.xls');
}

But since everything is computed on the client side, the performance also depends on the client’s machine. Unfortunately, the bug still occurs for some of our customers, and it comes directly from the writeFile function: the CPU is at 100% usage and the JS heap size goes up to 2 GB of RAM.

I saw several issues about this kind of problem, but I didn’t find any response that could fix my issue.

Hope it’s clear enough; please ask me for more information if needed.

Cheers from France 🇫🇷 ,

@rhanb

Issue Analytics

State:
Created: 5 years ago
Reactions: 4
Comments: 7 (3 by maintainers)

Top GitHub Comments

SheetJSDev commented, Mar 8, 2022 (1 reaction)

@rhanb @sibelius please test again against the latest version and see if you can reproduce the issue. There have been Chrome improvements when dealing with large objects.

@wafs please follow https://github.com/SheetJS/sheetjs/issues/77 for more details

SheetJSDev commented, Sep 10, 2021 (0 reactions)

@wafs https://bugs.chromium.org/p/v8/issues/detail?id=3175 , a bug affecting NodeJS and Chrome, is why arrays are used to build up strings. 7 years later the performance landscape has improved so we’d have to go back and see which approach is better.
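
To make the trade-off in that comment concrete, here is a minimal sketch of the two ways of building a large string: collecting pieces in an array and joining once, versus growing a single string by concatenation. The function names (buildWithArray, buildWithConcat, makeRows) are hypothetical and are not taken from the SheetJS source. The rows are joined naively with commas, with no CSV escaping, purely for illustration.

```typescript
// Hypothetical helpers for illustration only; not SheetJS code.

// Approach 1: push each line into an array and join once at the end.
function buildWithArray(rows: string[][]): string {
  const parts: string[] = [];
  for (const row of rows) {
    parts.push(row.join(","));
  }
  return parts.join("\n");
}

// Approach 2: grow a single string with repeated concatenation.
function buildWithConcat(rows: string[][]): string {
  let out = "";
  for (let i = 0; i < rows.length; i++) {
    if (i > 0) out += "\n";
    out += rows[i].join(",");
  }
  return out;
}

// Generate synthetic rows (the values are made up for the example).
function makeRows(count: number): string[][] {
  const rows: string[][] = [];
  for (let i = 0; i < count; i++) {
    rows.push([String(i), "sim-" + i, "device-" + i]);
  }
  return rows;
}
```

Both functions produce the same text. Which one uses less memory or CPU probably depends on the engine and its version, so it would need to be measured (for example with the heap monitor mentioned in the issue) on something like makeRows(40000) rather than assumed.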