Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

dask.diagnostics profilers (Profiler, ResourceProfiler...) do not show profile information for distributed jobs

See original GitHub issue

I am trying to run the example linear algorithm dot provided here. I have a cluster of machines and hdfs. I used dask-ssh and hdfs3 to setup the machines and hdfs for python.

I want to use the profilers mentioned in the Diagnostics page of dask documentation (here). Unfortunately, when the task finishes and I inspect the results object of the respective profilers, they are empty. In contrast, when I do not use the distributed client, and try to do local computation, the profilers as well as the visualize function shows run information.

This has me quite baffled, as I thought the profiling would work for both distributed and local execution. I do not know of any other methodology using which I can get detailed profiling information of a distributed job on dask. My questions are -

Do the profilers not support distributed executions? If that is so, how can I get such profiling information when I try to run dask on clusters of machines? I need to benchmark dask distributed for an academic course, so it is quite important that I get informations such as - total run time, CPU usage and memory usage over entire cluster for a particular dask distributed job.
If the profilers do support distributed executions, is there an issue with the way I set up the code? Here is a snippet below which demonstrates the way I wrote code for this -

with Profiler() as prof, ResourceProfiler() as rprof, CacheProfiler() as cprof, ProgressBar() as progress, Client('MASTER:HOST') as client:
    out = client.compute(a2)
    # I've also tried out = a2.compute()
    print(prof.results,rprof.results, cprof.results)
    visualize([prof, rprof, cprof])

I’ll be grateful if someone can point out a solid way to get profiling information for dask distributed, I plan to use these information to write a performance analysis paper.

Issue Analytics

State:
Created 6 years ago
Comments:5 (3 by maintainers)

Top GitHub Comments

1reaction

mrocklincommented, Apr 11, 2017

If you have node installed and add the following to your .dask/config.yaml file then you’ll get an extra tool that lets you download the task stream plot as a static html file. Currently you have to use Chrome.

bokeh-export-tool: True

Sorry for the secret nature of this feature. I use it a bunch but there are a lot of hidden corners to making it work (having node, using chrome rather than safari or firefox), so we haven’t publicized it.

If you feel adventurous then you might also look into how those plots are made. You could probably engineer your own diagnostic tool that wrote data directly to a file fairly easily.

https://github.com/dask/distributed/blob/master/distributed/bokeh/task_stream.py#L13-L30

0reactions

mrocklincommented, Apr 11, 2017

A minimal scheduler diagnostic is probably something like the following:

class MyPlugin(SchedulerPlugin):
    def __init__(self, scheduler):
        self.scheduler = scheduler
        scheduler.add_plugin(self)
    def transition(self, key, start, finish, *args, **kwargs):
        print(key, start, finish, kwargs)

client.run_on_scheduler(MyPlugin)

Top Results From Across the Web

Diagnostics (local) - Dask documentation

The ResourceProfiler class is used to profile Dask's execution at the resource level. During execution, it records the following information for each timestep:....

Diagnostics · Issue #21 · dask/distributed - GitHub

When optimizing complex problems on distributed I run into performance issues that seem unintuitive. In these cases it would be valuable to have ......

How to Check the Progress of Dask Computations - Coiled

So, it can be helpful to see their progress in real-time and visualize what's happening in your cluster. Dask provides some diagnostic and ......

How many threads does a dask worker use in ... - Stack Overflow

Related: What threads do Dask Workers have active? import dask import dask.array as da from dask.diagnostics import Profiler, ResourceProfiler, ...

Parallelisation — SWD6: High Performance Python

There are two main types of Dask scheduler which can deploy jobs: ... Many of the profiling tools we looked at earlier don't...