
External materializations memory alloc issue

See original GitHub issue

So I'm not exactly sure what is going on here; the error coming back from dbt is not super clear. It looks like DuckDB is running out of memory.

The error is:

/usr/local/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

More context: Before external materializations, I was running my dbt project and then copying the data out to specific folders with an on-run-end macro. This allowed me to run somewhere in the neighborhood of 100,000 simulations in my project (found over here) on a VM with 8GB of RAM without seeing this issue. The default number of simulations in the project is typically 10k to keep it fast.

However, when using external materializations, I have to reduce the run size to 1k in order for it to run successfully. This leads me to believe that there is a memory "leak" inside of DuckDB. If I were to speculate, I'd say DuckDB is holding both the external tables and the duckdb tables in memory, instead of dropping the duckdb tables once each file has been exported to its external storage location.
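
For illustration, here is a minimal sketch of that export-then-drop pattern using the DuckDB Python API. The database file, table names, and output paths are hypothetical, and this is not necessarily how dbt-duckdb's external materialization is actually implemented:

  import duckdb

  con = duckdb.connect("sims.duckdb")  # hypothetical database file
  for table in ["sim_results", "sim_summary"]:  # hypothetical model names
      # Export the table to Parquet, then drop it so DuckDB is not holding
      # the source table in the database on top of the exported file
      con.execute(f"COPY {table} TO 'output/{table}.parquet' (FORMAT PARQUET)")
      con.execute(f"DROP TABLE {table}")

If the speculation above is right, it is essentially this DROP step that the external materialization path skips.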

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 29 (8 by maintainers)

Top GitHub Comments

1 reaction
matsonj commented, Nov 29, 2022

@jwills I think I'm fine with closing this, given there are 4 workarounds: 1) set the max memory with a PRAGMA; 2) use a bigger VM; 3) materialize the problematic tables as tables, and export with a post-hook; 4) don't use external tables and instead use the duckdb database directly.
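
As a concrete sketch of workaround 1, DuckDB exposes a memory_limit setting that can be applied with a PRAGMA on the connection; the 4GB cap and database path below are illustrative values, not recommendations from the thread:

  import duckdb

  con = duckdb.connect("sims.duckdb")  # hypothetical database file
  # Cap how much memory DuckDB may allocate; queries that exceed the cap
  # can spill to disk (or fail) instead of exhausting the VM's RAM
  con.execute("PRAGMA memory_limit='4GB'")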

1 reaction
tomsej commented, Nov 14, 2022

Ok, I also tried installing duckdb==0.5.2.dev2286 and memory usage is way better:

[screenshot: memory usage with the dev build]

But I still do not understand why a 16 MB Parquet file needs this amount of RAM 😕
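
Part of the answer may simply be that Parquet is compressed and encoded on disk, so the decoded in-memory size can be many times the file size. One way to compare the two, sketched here with pyarrow (the file name is hypothetical):

  import pyarrow.parquet as pq

  meta = pq.ParquetFile("model.parquet").metadata  # hypothetical file
  compressed = uncompressed = 0
  for rg in range(meta.num_row_groups):
      for col in range(meta.num_columns):
          chunk = meta.row_group(rg).column(col)
          compressed += chunk.total_compressed_size
          uncompressed += chunk.total_uncompressed_size
  print(f"{compressed / 1e6:.1f} MB on disk -> {uncompressed / 1e6:.1f} MB decoded")

Even the uncompressed figure understates peak usage, since query execution adds its own intermediate buffers on top.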

Read more comments on GitHub

Top Results From Across the Web

OOM when reading Parquet file · Issue #3969 · duckdb ...
It is using all available memory and is terminated by OOM. To Reproduce. Allocate a machine with 32 GB RAM, like c6a.4xlarge on...

6 Tips to avoid HANA Out of Memory (OOM) Errors - SAP Blogs
HANA will unload partition-columns of tables on a Least Recently Used basis, when it is out of memory.

A Non-blocking Buddy System for Scalable Memory Allocation ...
In standard libraries or in an Operating System (OS), memory allocation is de-facto a shared-data management problem. In fact, allocators deal with the...

8.12.3.1 How MySQL Uses Memory
The following list describes some of the ways that MySQL uses memory. ... memory for the entire buffer pool at server startup, using...

SSDAlloc: Hybrid SSD/RAM Memory Management Made Easy
SSDAlloc moves the SSD upward in the memory hier- ... dressing these issues. ... requested for materialization is not present in the RAM....
