Memory Leaks?
Hi,
I'm using elasticsearch-js to export a whole index from ES in batches of 4096, dumping it to Parquet format. The tool normally uses about 500 MB of RAM (Node.js has a 2 GB memory limit set).
If I lower or increase the batch size (or sometimes seemingly at random), memory use jumps to 2-3 GB and the process gets killed. The quickest way to reproduce is to increase the batch size it has to process. The generated Parquet file is usually ~5.4 GB. A minimal sketch of the export loop is included below.
Is there anything I can do to debug this further?
Thanks!
P.S.: I'm using git+ssh://git@github.com/ironSource/parquetjs.git#1fa58b589d9b6451379f1558214e9ae751909596 as the parquetjs package.
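For context, the export loop looks roughly like the sketch below. This is a minimal reconstruction, assuming the legacy `elasticsearch` scroll API and the parquetjs writer; the index name, schema, and file path are illustrative, not from the original tool:

```js
const elasticsearch = require('elasticsearch');
const parquet = require('parquetjs');

async function exportIndex() {
  const client = new elasticsearch.Client({ host: 'localhost:9200' });

  // Illustrative schema; the real tool's fields are unknown.
  const schema = new parquet.ParquetSchema({
    id:     { type: 'UTF8' },
    source: { type: 'UTF8' },
  });
  const writer = await parquet.ParquetWriter.openFile(schema, 'dump.parquet');
  writer.setRowGroupSize(4096); // mirrors the batch size mentioned above

  // Scroll through the whole index, 4096 hits at a time.
  let res = await client.search({
    index: 'my-index',
    scroll: '1m',
    size: 4096,
    body: { query: { match_all: {} } },
  });

  while (res.hits.hits.length > 0) {
    for (const hit of res.hits.hits) {
      await writer.appendRow({ id: hit._id, source: JSON.stringify(hit._source) });
    }
    res = await client.scroll({ scrollId: res._scroll_id, scroll: '1m' });
  }

  await writer.close();
}
```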
I think I hit this same issue. Will try to post a working sample to demonstrate, but here's what I did: I compared the number of rows appended against a `count(*)` over the generated file.

Debugging through the library, it seems that if the flushing happens only inside the `close` method (here: https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L108) you get everything fine and the smallest output. But if, due to your row group size, it is also triggered in `append` (here: https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L96), then you end up with duplicate rows. For large numbers of rows that keeps building up until it blows the memory.

I tried the following as a quick and dirty workaround and it seems to work: I changed the above lines in `writer.js` to reset the row buffer before awaiting the write (see the sketch below). With this it does seem to keep the count integrity in place.
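A minimal sketch of that change, based on the flush logic in `ParquetWriter.appendRow` at the lines referenced above (field names as they appear in `writer.js`; treat this as an illustration, not the exact diff):

```js
// Before (roughly): the buffer is flushed first and only reset
// after the await completes.
//
//   if (this.rowBuffer.rowCount >= this.rowGroupSize) {
//     await this.envelopeWriter.writeRowGroup(this.rowBuffer);
//     this.rowBuffer = {};
//   }

// After: detach the full buffer *before* awaiting, so any appendRow
// call that runs while the write is in flight shreds into a fresh
// buffer instead of re-flushing (and duplicating) the old one.
if (this.rowBuffer.rowCount >= this.rowGroupSize) {
  const rowBuffer = this.rowBuffer;
  this.rowBuffer = {};
  await this.envelopeWriter.writeRowGroup(rowBuffer);
}
```

Note the duplication only shows up when several `appendRow` calls are in flight at once (e.g. appended in a loop without awaiting each call): during the `await`, a second call can see the same still-full buffer and flush it again.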
I think having the `await` before resetting the buffer (`this.rowBuffer = {};`) is the issue. Does this sound right?
Regards, Arnab.
Closing this as resolved.