
idle dask-worker & dask-scheduler have elevated CPU utilization


I am running dask-worker & dask-scheduler colocated on the same machine. With no Dask client connected, the idle dask-worker & dask-scheduler processes still show ~3% CPU for each worker/task.
The log is clean.

10528 owner      20   0 13016  3168  2836 S  0.0  0.0  0:00.00 ├─ /bin/bash -c source /opt/miniconda3/etc/profile.d/conda.sh;conda activate owner;dask-worker --nprocs 10 localhos
10541 owner      20   0 1840M  113M 28772 S  6.0  0.1 15h26:52 │  └─ /home/owner/.conda/envs/owner/bin/python /home/owner/.conda/envs/owner/bin/dask-worker --nprocs 10 localhost:
10577 owner      20   0 1091M  323M 88340 S  2.6  0.3  8h25:52 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10573 owner      20   0 1088M  320M 88348 S  2.6  0.2  8h29:13 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10569 owner      20   0 1085M  317M 88568 S  2.6  0.2  8h21:54 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10566 owner      20   0 1082M  313M 88424 S  2.0  0.2  8h15:14 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10562 owner      20   0 1093M  325M 88476 S  2.0  0.3  8h33:52 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10559 owner      20   0 1089M  321M 88816 S  2.6  0.3  8h33:56 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10556 owner      20   0 1082M  313M 88600 S  2.6  0.2  8h25:37 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10554 owner      20   0 1085M  315M 88176 S  2.6  0.2  8h17:27 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10551 owner      20   0 1106M  339M 88212 S  2.6  0.3  8h32:54 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10547 owner      20   0 1091M  322M 87940 S  2.6  0.3  8h28:29 │     ├─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(track
10545 owner      20   0 28264 11332  6284 S  0.0  0.0  0:00.03 │     └─ /home/owner/.conda/envs/owner/bin/python -c from multiprocessing.resource_tracker import main;main(10)
10517 owner      20   0 13020  3268  2940 S  0.0  0.0  0:00.00 ├─ /bin/bash -c source /opt/miniconda3/etc/profile.d/conda.sh;conda activate owner;dask-scheduler
10540 owner      20   0 2282M 1839M 38752 S  3.3  1.4  9h17:18 │  └─ /home/owner/.conda/envs/owner/bin/python /home/owner/.conda/envs/owner/bin/dask-scheduler
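
For comparison with what top/htop reports, the workers' self-reported usage can also be polled through a client. A minimal sketch (the scheduler address and the exact metric keys are assumptions, based on what distributed's system monitor publishes):

from dask.distributed import Client

# Sketch: ask the scheduler for the metrics each worker reports about itself.
client = Client("localhost:8786")  # address is an assumption; use the real scheduler
for addr, info in client.scheduler_info()["workers"].items():
    metrics = info.get("metrics", {})
    print(addr, "cpu:", metrics.get("cpu"), "memory:", metrics.get("memory"))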

Environment:

conda info

     active environment : owner
    active env location : /home/owner/.conda/envs/owner
            shell level : 2
       user config file : /home/owner/.condarc
 populated config files :
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.8.5.final.0
       virtual packages : __cuda=11.2=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /opt/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/miniconda3/pkgs
                          /home/owner/.conda/pkgs
       envs directories : /opt/miniconda3/envs
                          /home/owner/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.5 Linux/5.4.0-62-generic ubuntu/18.04.5 glibc/2.27
                UID:GID : 0:0
             netrc file : None
           offline mode : False

dask                      2021.4.0           pyhd8ed1ab_0    conda-forge
dask-core                 2021.4.0           pyhd8ed1ab_0    conda-forge
distributed               2021.4.0         py38h578d9bd_0    conda-forge

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
dkhokhlov commented, Jul 6, 2021

Managed to bring it down to 0.7% per worker. Increasing the profile interval accounted for most of the difference.

/etc/dask# cat dask.yaml
distributed:
  admin:
    tick:
      interval: 1s
    system-monitor:
      interval: 5s
  worker:
    profile:
      interval: 5s
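
To confirm that a file like /etc/dask/dask.yaml is actually being picked up, the merged values can be checked with dask.config.get (a minimal sketch; the keys mirror the YAML above):

import dask

# Sketch: print the merged configuration values that distributed will use.
print(dask.config.get("distributed.admin.tick.interval"))            # expected: 1s
print(dask.config.get("distributed.admin.system-monitor.interval"))  # expected: 5s
print(dask.config.get("distributed.worker.profile.interval"))        # expected: 5s

The same settings can also be overridden through environment variables following the DASK_DISTRIBUTED__... naming pattern shown in the Configuration result listed below.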

0 reactions
mrocklin commented, Jul 16, 2021

It might make sense for folks to take a look at how we use psutil. Maybe there are nicer/cheaper ways to get system usage information.
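
For reference, the kind of polling involved looks roughly like this (a sketch only, not distributed's actual monitoring loop; the 5-second period just mirrors the system-monitor interval chosen above):

import time
import psutil

# Sketch of periodic self-monitoring with psutil (not distributed's own code).
proc = psutil.Process()
proc.cpu_percent(interval=None)  # prime the counter; the first call returns 0.0
while True:
    time.sleep(5)  # mirrors the system-monitor interval above
    print("cpu %:", proc.cpu_percent(interval=None),
          "rss MB:", proc.memory_info().rss / 2**20)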

On Fri, Jul 16, 2021 at 5:38 AM Jacob Tomlinson @.***> wrote:

This makes it impossible to use GCP’s autoscaler to scale a Dask cluster based on the workers’ CPU usage.

Just a warning about using autoscaling tools outside of Dask Adaptive: when scaling down, Dask chooses which workers to remove and actively consolidates memory onto other workers before removing them.

If you let something like the GCP autoscaler do this based on a metric like CPU or memory, then workers which are holding futures can be removed during scale-down. This triggers Dask to resubmit tasks to recalculate the lost memory, which can then cause a scale-up again. Clusters can bounce around like this randomly without ever completing the workload.

Maybe I’ll only have to restart the dask-worker daemon every week.

I am interested in learning more about your workload. Most folks I interact with who run Dask on GCP do so ephemerally, creating and destroying clusters as they are needed. It sounds like you have a long-running Dask cluster which you reuse. Could you share more about what you are doing with it?

— View it on GitHub: https://github.com/dask/distributed/issues/5000#issuecomment-881351396
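
Regarding the warning above about using autoscaling tools outside of Dask Adaptive: letting Dask's own adaptive logic drive scaling avoids retiring workers that still hold futures. A minimal sketch (LocalCluster stands in here for whatever cluster manager is actually in use; the limits are placeholders):

from dask.distributed import Client, LocalCluster

# Sketch: let Dask's adaptive logic add/remove workers based on load, so data
# is moved off a worker before it is retired during scale-down.
cluster = LocalCluster(n_workers=0)
cluster.adapt(minimum=1, maximum=10)
client = Client(cluster)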


Top Results From Across the Web

  • Dask distributed: How to computationally recognize if worker ...
    Is there a way to detect idling workers, e.g. by computing the average CPU utilization of the worker in the last minute?
  • Scheduler State Machine - Dask.distributed
    Pick a worker for a runnable root-ish task, if not all are busy. Picks the least-busy worker out of the idle workers (idle...)
  • Dashboard Diagnostics - Dask documentation
    Task Processing/CPU Utilization/Occupancy: Tasks being processed by each ... The scheduler will try to ensure that the workers are processing about the same ...
  • Configuration - Dask documentation
    Environment variables like DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING=True ... Within dask_foo code, use the dask.config.get function to access ...
  • Worker Memory Management - Dask.distributed
    If the system reported memory use is above 70% of the target memory usage (spill threshold), then the worker will start dumping unused...
