
dask.diagnostics profilers (Profiler, ResourceProfiler...) do not show profile information for distributed jobs

See original GitHub issue

I am trying to run the example linear algebra computation (dot) provided here. I have a cluster of machines and HDFS. I used dask-ssh and hdfs3 to set up the machines and HDFS for Python.

I want to use the profilers mentioned in the Diagnostics page of the Dask documentation (here). Unfortunately, when the task finishes and I inspect the results object of each profiler, they are empty. In contrast, when I do not use the distributed client and compute locally instead, the profilers and the visualize function show run information.

This has me quite baffled, as I thought profiling would work for both distributed and local execution. I do not know of any other way to get detailed profiling information for a distributed job on Dask. My questions are:

  • Do the profilers not support distributed execution? If so, how can I get such profiling information when running Dask on a cluster of machines? I need to benchmark Dask distributed for an academic course, so it is quite important that I get information such as total run time, CPU usage, and memory usage across the entire cluster for a particular distributed job.

  • If the profilers do support distributed execution, is there an issue with the way I set up the code? Here is a snippet demonstrating what I wrote:

from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, ProgressBar, visualize
from distributed import Client

with Profiler() as prof, ResourceProfiler() as rprof, CacheProfiler() as cprof, \
        ProgressBar() as progress, Client('MASTER:HOST') as client:
    out = client.compute(a2)
    # I've also tried out = a2.compute()
    print(prof.results, rprof.results, cprof.results)  # all empty
    visualize([prof, rprof, cprof])

I’ll be grateful if someone can point out a solid way to get profiling information for Dask distributed; I plan to use this information to write a performance analysis paper.
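For contrast, the local case the question describes, where the profilers do populate, can be sketched as follows. This is a minimal example, not the asker's actual workload: the array shapes are arbitrary, and ResourceProfiler is omitted because it additionally requires psutil.

```python
import dask.array as da
from dask.diagnostics import Profiler, CacheProfiler

# A small dask.array graph standing in for the original dot computation
a = da.random.random((1000, 1000), chunks=(250, 250))
a2 = a.dot(a.T)

# With the default local scheduler (no Client), the context-manager
# profilers record events as tasks execute
with Profiler() as prof, CacheProfiler() as cprof:
    out = a2.compute()

print(len(prof.results), len(cprof.results))  # both non-zero locally
```

The key difference is that these profilers hook into the local scheduler's callback mechanism, which a distributed Client bypasses, so they stay empty when work runs on remote workers.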

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Apr 11, 2017

If you have node installed and add the following to your .dask/config.yaml file then you’ll get an extra tool that lets you download the task stream plot as a static html file. Currently you have to use Chrome.

bokeh-export-tool: True

Sorry for the secret nature of this feature. I use it a bunch but there are a lot of hidden corners to making it work (having node, using chrome rather than safari or firefox), so we haven’t publicized it.

If you feel adventurous then you might also look into how those plots are made. You could probably engineer your own diagnostic tool that wrote data directly to a file fairly easily.

https://github.com/dask/distributed/blob/master/distributed/bokeh/task_stream.py#L13-L30
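One way to follow that suggestion is a scheduler plugin that appends each task transition to a file. This is a hedged sketch: SchedulerPlugin and its transition hook are real distributed APIs, but the `TaskLogPlugin` name, default path, and JSON record format are illustrative choices, not part of the library.

```python
import json
from distributed.diagnostics.plugin import SchedulerPlugin

class TaskLogPlugin(SchedulerPlugin):
    """Append one JSON line per task state transition."""

    def __init__(self, path="task_transitions.jsonl"):
        self.log = open(path, "a")

    def transition(self, key, start, finish, *args, **kwargs):
        # Record which task moved between which states, e.g.
        # processing -> memory when a task's result lands on a worker
        self.log.write(json.dumps(
            {"key": str(key), "start": start, "finish": finish}) + "\n")
        self.log.flush()
```

Post-processing the resulting file (e.g. with pandas) would then give per-task timelines for a distributed run.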

0 reactions
mrocklin commented, Apr 11, 2017

A minimal scheduler diagnostic is probably something like the following:

from distributed.diagnostics.plugin import SchedulerPlugin

class MyPlugin(SchedulerPlugin):
    def __init__(self, scheduler):
        self.scheduler = scheduler
        scheduler.add_plugin(self)

    def transition(self, key, start, finish, *args, **kwargs):
        # Fires on every task state transition (e.g. processing -> memory)
        print(key, start, finish, kwargs)

# run_on_scheduler passes the scheduler to any function that accepts
# a `dask_scheduler` keyword argument
def install_plugin(dask_scheduler):
    MyPlugin(dask_scheduler)

client.run_on_scheduler(install_plugin)

