
excessive CPU time spent on gc (even after manually adjusting gc thresholds)

See original GitHub issue

During the process I’m running, I very quickly get this warning (for pretty much every worker):

distributed.utils_perf - WARNING - full garbage collections took 36% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)

However, the percentage of memory used by each worker is low:

[screenshot: dashboard showing low memory usage per worker]

I have plenty of memory to work with, so my question is: is it possible that dask is doing GC too aggressively? If so, is it possible to change the GC thresholds so it collects less aggressively?

I’ve tried to manually adjust the gc thresholds, but this seems to have no effect:

import gc

g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0 * 10, g1 * 10, g2 * 10)  # collect 10x less often

(using distributed 1.28.1 and dask 1.2.2)
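
One thing I’m not sure about: since each worker is a separate process, maybe the thresholds need to be set inside the workers rather than in the client. A sketch of what that might look like (using client.run, which I believe executes a function on every worker; the factor of 10 is just an example):

import gc
from dask.distributed import Client

client = Client()

def relax_gc_thresholds(factor=10):
    # Runs inside a worker process and multiplies its current GC thresholds
    g0, g1, g2 = gc.get_threshold()
    gc.set_threshold(g0 * factor, g1 * factor, g2 * factor)

# Execute the function once on every worker process
client.run(relax_gc_thresholds)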

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 16 (5 by maintainers)

Top GitHub Comments

4 reactions
benjaminvatterj commented, Jan 12, 2021

Hi, just to add: I’m getting the same warnings while repartitioning a parquet file. I suspect it has more to do with the computation being very quick, so garbage collection is simply the most burdensome task in that window. Maybe the solution is to only warn when GC takes a large fraction of a long task? That seems to be the idea behind the first comment in _gc_callback in distributed.utils_perf, which emits this warning:

    def _gc_callback(self, phase, info):
        # Young generations are small and collected very often,
        # don't waste time measuring them
        if info["generation"] != 2:
            return
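
Just to illustrate what I mean, here is a hypothetical sketch (not distributed’s actual code) of only warning when the measurement window is long enough for the GC fraction to be meaningful:

import time

MIN_ELAPSED = 10.0   # seconds of wall time before the ratio is trusted (example value)
THRESHOLD = 0.10     # warn above 10% of time spent in full GC

def should_warn(gc_seconds, window_start):
    elapsed = time.monotonic() - window_start
    if elapsed < MIN_ELAPSED:
        return False  # computation too short; the GC ratio is mostly noise
    return gc_seconds / elapsed > THRESHOLD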

For reference, my code is:

from dask.distributed import Client
client = Client()
# outputs: <Client: 'tcp://127.0.0.1:45451' processes=7 threads=28, memory=48.32 GB>
import dask.dataframe as dataframe
df = dataframe.read_parquet('data_simulated_partitioned.parquet')
df.npartitions
# 3941
df = df.repartition(partition_size='100MB')
# This is where hundreds of warnings arise
df.npartitions
# 137
df = df.persist()
# Again, hundreds of warnings
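
In the meantime, a way to quiet the noise without touching GC itself might be to raise the level of the logger that emits the warning (the logger name is taken from the warning lines above; this only hides the message, it doesn’t change GC behaviour):

import logging

# "distributed.utils_perf" is the logger shown in the warning lines above
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)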

4 reactions
jklen commented, Dec 28, 2020

Hi,

I am starting out with dask on my laptop, going through the dask-tutorial on GitHub, but I have a similar issue with “distributed.utils_perf - WARNING - full garbage collections took xxx% CPU time recently (threshold: 10%)” whenever I start a local cluster with the distributed scheduler:

  • It occurs whenever I start a local cluster, even when I submit simple calculations
  • Different parameter settings when starting the cluster, like n_workers, memory_limit, threads_per_worker, processes, do not seem to have an effect (see the sketch after this list)
  • I use a conda environment with Python 3.7, on a laptop with 4 cores, 8 logical processors, and 16 GB RAM
  • I tried it on Windows and on another machine with Linux; no difference
  • The % CPU time in the warning slowly climbs, and I get the warning almost every second
  • client.restart() has no effect on this; the warnings show up again immediately
  • It occurs only with the distributed scheduler
  • Once the warnings have started, shutting down the cluster with client.shutdown() and starting it again without restarting the IPython kernel makes the warnings show up again without submitting any calculation
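
For reference, a minimal sketch of how I start the cluster (the parameter values below are just examples of the settings I varied, not my exact configuration):

from dask.distributed import Client

# Example values only; varying these had no effect on the warnings
client = Client(
    n_workers=4,
    threads_per_worker=1,
    memory_limit="2GB",
    processes=True,
)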

Is there any progress on this, please, or any way to solve it?

Read more comments on GitHub >

Top Results From Across the Web

Garbage Collector taking too much CPU Time - Stack Overflow
My Application takes 20 GB max memory. A full GC occur after every 14 minutes. Memory Before GC is 20 GB and after...
Read more >
Garbage collector config settings - .NET - Microsoft Learn
Learn about run-time settings for configuring how the garbage collector manages memory for .NET Core apps.
Read more >
Optimizing garbage collection in a high load .NET service
We've yet to figure out the reason why this GC is called. RecycleLimitMonitor monitors IIS memory use (specifically, the Private Bytes number), ...
Read more >
10 Garbage-First Garbage Collector Tuning
Observing Full Garbage Collections ... A full heap garbage collection (Full GC) is often very time consuming. Full GCs caused by too high...
Read more >
A Guide to the Go Garbage Collector
How this works internally is the GC sets an upper limit on the amount of CPU time it can use over some time...
Read more >


