Excessive CPU time spent on GC (even after manually adjusting GC thresholds)
During the process I'm running, I very quickly get this warning (for pretty much every worker):
distributed.utils_perf - WARNING - full garbage collections took 36% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 34% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 35% CPU time recently (threshold: 10%)
However, the percentage of memory used by each worker is low.
I have plenty of memory to work with, so my question is: is it possible that Dask is doing GC too aggressively? If so, is it possible to change the GC thresholds so that it collects less often?
I've tried to manually adjust the GC thresholds, but this seems to have no effect:
import gc

g0, g1, g2 = gc.get_threshold()
gc.set_threshold(g0 * 10, g1 * 10, g2 * 10)
(using distributed 1.28.1 and dask 1.2.2)
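One thing worth noting (a guess on my part, not confirmed in this thread): `gc.set_threshold` only affects the process that calls it, and distributed workers are usually separate processes, so setting the thresholds in the client script would not change anything on the workers. A minimal sketch of applying the change on every worker via `Client.run` (the `relax_gc_thresholds` helper and `factor` value are illustrative, not part of the original report):

```python
import gc

def relax_gc_thresholds(factor=10):
    """Multiply the generation 0/1/2 GC thresholds on the calling process."""
    g0, g1, g2 = gc.get_threshold()
    gc.set_threshold(g0 * factor, g1 * factor, g2 * factor)
    return gc.get_threshold()

if __name__ == "__main__":
    # Hypothetical usage: run the helper on every worker process,
    # not just in the client script.
    from dask.distributed import Client
    client = Client()  # assumes a local cluster, for illustration only
    print(client.run(relax_gc_thresholds))
```

Whether raising the thresholds actually reduces the warning is a separate question, but at least this way the setting reaches the worker processes.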
Issue Analytics
- Created: 4 years ago
- Comments: 16 (5 by maintainers)
Top GitHub Comments
Hi, just to add: I'm getting the same errors while repartitioning a Parquet file. I suspect it has more to do with the computation being very quick, so that garbage collection is simply the relatively more burdensome task. Maybe the solution is to warn only if GC takes a large fraction of a long task? That seems to be the idea behind the first comment in _gc_callback in distributed.utils_perf, which emits this warning:
For reference, my code is:
Hi,
I am starting out with Dask on my laptop, going through the dask-tutorial on GitHub, but I have a similar issue: I get "distributed.utils_perf - WARNING - full garbage collections took xxx% CPU time recently (threshold: 10%)" whenever I start a local cluster with the distributed scheduler.
Is there any progress on this please, or any way to solve it?
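If the warning is just noisy and memory usage is genuinely fine, one workaround (my own suggestion, not an official fix from the maintainers) is to raise the log level of the logger that emits it:

```python
import logging

# Workaround sketch: suppress the GC-fraction warning by raising the
# level of the distributed.utils_perf logger. This only hides the
# message; it does not change how often garbage collection runs.
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)
```

This silences the symptom rather than addressing the underlying GC pressure, so it is only appropriate when you have confirmed the workers are otherwise healthy.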