dask.diagnostics profilers (Profiler, ResourceProfiler...) do not show profile information for distributed jobs
See original GitHub issue

I am trying to run the linear algebra `dot` example provided here. I have a cluster of machines and HDFS. I used dask-ssh and hdfs3 to set up the machines and HDFS for Python.
I want to use the profilers described on the Diagnostics page of the dask documentation (here). Unfortunately, when the task finishes and I inspect the results object of each profiler, it is empty. In contrast, when I do not use the distributed client and run the computation locally, the profilers, as well as the visualize function, show run information.
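For reference, the local behaviour described above looks roughly like this. This is a minimal sketch assuming dask is installed; the small `dask.delayed` graph below merely stands in for the dot-product computation:

```python
import dask
from dask.diagnostics import Profiler

@dask.delayed
def inc(x):
    return x + 1

# A tiny task graph standing in for the real computation
total = dask.delayed(sum)([inc(i) for i in range(5)])

with Profiler() as prof:
    result = total.compute(scheduler="threads")  # local threaded scheduler

print(result)             # 15
print(len(prof.results))  # non-empty: one entry per executed task
```

With a local scheduler such as `"threads"` or `"sync"`, `prof.results` is populated; it is only when computation moves to a distributed cluster that these callback-based profilers see nothing, because the tasks never run in the local process.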
This has me quite baffled, as I thought profiling would work for both distributed and local execution. I do not know of any other way to get detailed profiling information for a distributed job on dask. My questions are:
- Do the profilers not support distributed execution? If so, how can I get such profiling information when I run dask on a cluster of machines? I need to benchmark dask distributed for an academic course, so it is important that I can get information such as total run time, CPU usage, and memory usage over the entire cluster for a particular dask distributed job.
- If the profilers do support distributed execution, is there an issue with the way I set up the code? Here is a snippet that demonstrates how I wrote it:
```python
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, ProgressBar, visualize
from dask.distributed import Client

with Profiler() as prof, ResourceProfiler() as rprof, CacheProfiler() as cprof, \
        ProgressBar() as progress, Client('MASTER:HOST') as client:
    out = client.compute(a2)  # a2 is the dask array built earlier
    # I've also tried out = a2.compute()

print(prof.results, rprof.results, cprof.results)
visualize([prof, rprof, cprof])
```
I’ll be grateful if someone can point out a solid way to get profiling information for dask distributed; I plan to use this information to write a performance analysis paper.
Issue Analytics
- State:
- Created 6 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
If you have node installed and add the following to your .dask/config.yaml file, then you’ll get an extra tool that lets you download the task stream plot as a static HTML file. Currently you have to use Chrome. Sorry for the secret nature of this feature. I use it a bunch, but there are a lot of hidden corners to making it work (having node, using Chrome rather than Safari or Firefox), so we haven’t publicized it.
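As a side note not in the comment above: the distributed library also exposes the task stream programmatically through `get_task_stream`, which collects the same per-task records the dashboard plot is drawn from. A minimal sketch, assuming a reasonably recent distributed version; the in-process `Client(processes=False)` here is an assumption standing in for the real `Client('MASTER:HOST')`:

```python
from dask import delayed
from dask.distributed import Client, get_task_stream

@delayed
def double(x):
    return 2 * x

# In-process cluster for illustration only; on a real cluster this would be
# Client('MASTER:HOST') as in the question.
client = Client(processes=False, dashboard_address=None)

total = delayed(sum)([double(i) for i in range(4)])
with get_task_stream(client) as ts:
    result = total.compute()

print(result)  # 12
# ts.data holds one record per finished task (key, worker, start/stop times)
print(type(ts.data))
client.close()
```

Because the records are collected on the scheduler rather than in the local process, this works for distributed runs where dask.diagnostics profilers come back empty.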
If you feel adventurous, you might also look into how those plots are made. You could probably engineer your own diagnostic tool that writes data directly to a file fairly easily.
https://github.com/dask/distributed/blob/master/distributed/bokeh/task_stream.py#L13-L30
A minimal scheduler diagnostic is probably something like the following:
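(The code that followed was truncated from this mirror. As a hedged illustration of the idea, not the commenter's actual code: distributed provides a `SchedulerPlugin` base class whose `transition` hook the scheduler calls as tasks change state, and a plugin instance can be registered with `scheduler.add_plugin`. The sketch below is duck-typed so it can be exercised without a running scheduler; the class name and state-timing logic are my own assumptions.)

```python
import time
from collections import defaultdict

class TaskTimer:
    """Records how long each task spends in each scheduler state.

    Duck-typed sketch: with distributed installed, this would subclass
    distributed.diagnostics.plugin.SchedulerPlugin and be registered via
    scheduler.add_plugin(TaskTimer()); `transition` is the hook the
    scheduler calls as a task moves from state `start` to state `finish`.
    """
    def __init__(self):
        self.start_times = {}
        self.durations = defaultdict(float)

    def transition(self, key, start, finish, **kwargs):
        now = time.time()
        prev = self.start_times.get(key)
        if prev is not None:
            # Attribute the elapsed time to the state the task just left
            self.durations[start] += now - prev
        self.start_times[key] = now

# Manually exercise the hook, simulating scheduler callbacks for one task:
timer = TaskTimer()
timer.transition("inc-1", "released", "waiting")
timer.transition("inc-1", "waiting", "processing")
timer.transition("inc-1", "processing", "memory")
print(dict(timer.durations))
```

Dumping `timer.durations` to a file at intervals would give a crude per-state timing log for every task on the cluster, which is one way to get the whole-cluster numbers asked about in the question.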