Memory leak issue

Hi @SheetJSDev ,

First of all, thanks to the SheetJS team for your great work, this lib is a real time saver!

However, I’m running into issues when dealing with large amounts of data.

Context

I’m working for a telecom company that has a web app allowing its customers to download some of the data about their SIM cards’ consumption, devices, etc. So some of that data needs to be exportable as CSV/XLS/XLSX.

All the data is retrieved through HTTP calls to a REST API. To avoid any issue with the amount of data retrieved, we decided to retrieve it recursively. Each call retrieves 500 rows of data.

js-xlsx: 0.12.11

Issue

We hit the issue with one of the company’s customers, who wanted to export more than 40k rows of data. I reproduced the bug while watching the JS Heap Size in the Performance Monitor, and saw no problem until I called the lib to build the CSV/XLS/XLSX file. Even though I’m retrieving the data recursively, I was calling the lib only once all the data had been collected in a plain local array.

Simplified code:

import { utils, writeFile } from 'xlsx';

localData: any[] = [];

initDownload() {
    this.getDataRecursively(0);
}

getDataRecursively(
    pageIndex: number,
    totalCount?: number) {  // totalCount being the total number of rows to retrieve.
    const shouldContinue: boolean = true; // placeholder for the real stop condition (dummy version)
    if (shouldContinue) {
        // dataGetter is the app's HTTP call (not shown); it returns an observable.
        dataGetter(pageIndex)
            .subscribe((response: any) => {
                // Accumulate every page in one local array.
                this.localData = [...this.localData, ...response.body];
                this.getDataRecursively(pageIndex + 1, response.headers['totalCount']);
            });
    } else {
        this.endDownload();
    }
}

endDownload() {
    // The lib is only called here, once all rows have been retrieved.
    const workbook = utils.book_new();
    const worksheet = utils.json_to_sheet(this.localData);
    utils.book_append_sheet(workbook, worksheet);

    writeFile(workbook, 'test.xls');
}

!!! This is a simplified version of the original code !!!

So my first thought was to fill the worksheet every time I retrieve 500 rows instead of using a local array. The effect was positive: the JS Heap Size grew at every HTTP call but was much lower at the end, since the worksheet was already filled.

import { utils, writeFile, WorkSheet } from 'xlsx';

initDownload() {
    const worksheet = utils.json_to_sheet([]);

    this.getDataRecursively(0, worksheet);
}

getDataRecursively(
    pageIndex: number,
    worksheet: WorkSheet,
    totalCount?: number) {  // totalCount being the total number of rows to retrieve.
    const shouldContinue: boolean = true; // placeholder for the real stop condition (dummy version)
    if (shouldContinue) {
        // dataGetter is the app's HTTP call (not shown); it returns an observable.
        dataGetter(pageIndex)
            .subscribe((response: any) => {
                // Write the first page (with its header row) at A1, then append later pages
                // below the existing rows; without `origin`, every page would be written from A1.
                utils.sheet_add_json(worksheet, response.body, {
                    origin: pageIndex === 0 ? 'A1' : -1,
                    skipHeader: pageIndex > 0
                });
                this.getDataRecursively(pageIndex + 1, worksheet, response.headers['totalCount']);
            });
    } else {
        this.endDownload(worksheet);
    }
}

endDownload(worksheet: WorkSheet) {
    const workbook = utils.book_new();
    utils.book_append_sheet(workbook, worksheet);

    writeFile(workbook, 'test.xls');
}

But since everything is computed on the client side, performance also depends on the client’s machine. Unfortunately, the bug still occurs for some of our customers, and it comes directly from the writeFile function: the CPU is at 100% usage and the JS Heap Size goes up to 2 GB of RAM.

I saw several issues about this kind of problem, but I didn’t find any answer that fixes mine.

I hope it’s clear enough; please ask if you need more information.

Cheers from France 🇫🇷 ,

@rhanb

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
SheetJSDev commented, Mar 8, 2022

@rhanb @sibelius please test again against the latest version and see if you can reproduce the issue. There have been Chrome improvements when dealing with large objects.

@wafs please follow https://github.com/SheetJS/sheetjs/issues/77 for more details

0 reactions
SheetJSDev commented, Sep 10, 2021

@wafs https://bugs.chromium.org/p/v8/issues/detail?id=3175 , a bug affecting NodeJS and Chrome, is why arrays are used to build up strings. 7 years later the performance landscape has improved so we’d have to go back and see which approach is better.
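
For readers unfamiliar with the two approaches this comment alludes to, here is a minimal sketch contrasting them. It is not SheetJS’s actual code: the helper names buildWithArray and buildWithConcat are hypothetical, and the CSV-like output skips quoting and escaping to keep the example short. Both functions return the same string; which one is faster or lighter on memory depends on the engine version and would have to be measured.

```typescript
// Hypothetical helpers for illustration only; not part of the xlsx package.

// Approach 1: collect one string per row in an array, then join once at the end.
function buildWithArray(rows: string[][]): string {
  const lines: string[] = [];
  for (const row of rows) {
    lines.push(row.join(','));
  }
  return lines.join('\n');
}

// Approach 2: append to a single growing string as each row is processed.
function buildWithConcat(rows: string[][]): string {
  let out = '';
  let first = true;
  for (const row of rows) {
    if (!first) out += '\n';
    out += row.join(',');
    first = false;
  }
  return out;
}

// Sample data (made up) to show that both approaches agree.
const sample: string[][] = [['id', 'usage'], ['1', '42'], ['2', '7']];
console.log(buildWithArray(sample) === buildWithConcat(sample));
```

To compare the two on a given browser or Node version, one could run each on a large generated input while watching the heap, as described in the issue above.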
