Excessive CPU time spent on GC (even after manually adjusting GC thresholds)
During the process I'm running, I very quickly get this warning (for pretty much every worker):
distributed.utils_perf - WARNING - full garbage collections took 36% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
However, the percentage of memory used by each worker is low.
I have plenty of memory to work with, so my question is: is it possible that Dask is doing GC too aggressively? If so, is it possible to change the GC thresholds so that it collects less often?
I've tried to manually adjust the GC thresholds, but this seems to have no effect:
import gc

g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0 * 10, g1 * 10, g2 * 10)
(using distributed 1.28.1 and dask 1.2.2)
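One thing worth noting (a guess on my part, not confirmed in this thread): `gc.set_threshold` only affects the process that calls it, and distributed workers are usually separate processes, so setting the thresholds in the client script would not change anything on the workers. A minimal sketch of applying the change on every worker via `Client.run` (the `relax_gc_thresholds` helper and `factor` value are illustrative, not part of the original report):

```python
import gc

def relax_gc_thresholds(factor=10):
    """Multiply the generation 0/1/2 GC thresholds on the calling process."""
    g0, g1, g2 = gc.get_threshold()
    gc.set_threshold(g0 * factor, g1 * factor, g2 * factor)
    return gc.get_threshold()

if __name__ == "__main__":
    # Hypothetical usage: run the helper on every worker process,
    # not just in the client script.
    from dask.distributed import Client
    client = Client()  # assumes a local cluster, for illustration only
    print(client.run(relax_gc_thresholds))
```

Whether raising the thresholds actually reduces the warning is a separate question, but at least this way the setting reaches the worker processes.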
Issue Analytics
- Created: 4 years ago
- Comments: 16 (5 by maintainers)
Top GitHub Comments
Hi, just to add: I'm getting the same errors while repartitioning a Parquet file. I suspect it has more to do with the computation being very quick, so that garbage collection is simply the relatively more burdensome task. Maybe the solution is to warn only if GC takes a large fraction of a long task? That seems to be the idea behind the first comment in _gc_callback in distributed.utils_perf, which emits this warning:
For reference, my code is:
Hi,
I am starting out with Dask on my laptop, going through the dask-tutorial on GitHub, but I have a similar issue: I get "distributed.utils_perf - WARNING - full garbage collections took xxx% CPU time recently (threshold: 10%)" whenever I start a local cluster with the distributed scheduler.
Is there any progress on this please, or any way to solve it?
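If the warning is just noisy and memory usage is genuinely fine, one workaround (my own suggestion, not an official fix from the maintainers) is to raise the log level of the logger that emits it:

```python
import logging

# Workaround sketch: suppress the GC-fraction warning by raising the
# level of the distributed.utils_perf logger. This only hides the
# message; it does not change how often garbage collection runs.
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)
```

This silences the symptom rather than addressing the underlying GC pressure, so it is only appropriate when you have confirmed the workers are otherwise healthy.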