Visualize synopsis of large graphs
People appreciate our dot images, especially when coupled with visualizations like the following (courtesy of Jim):
Unfortunately these don’t scale beyond a few hundred nodes. Meaningfully visualizing very large graphs is a hard problem.
However, while we can’t visualize full dask graphs, we might be able to build synopsis graphs and visualize those. In particular, we may be able to take advantage of Dask’s naming scheme to show connections between groups of tasks. Presumably line widths and node sizes/shapes could be used to fill in some of the dropped information.
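As a rough illustration of the line-width and node-size idea, the sketch below turns per-group task counts and per-edge dependency counts into Graphviz DOT text. The function name `to_dot`, the input format, and the logarithmic scaling are all assumptions made for this example, not anything dask provides; only the standard DOT attributes `width`, `penwidth`, `shape`, and `rankdir` are relied on.

```python
import math


def to_dot(nodes, edges):
    """Render a synopsis graph as DOT text (hypothetical helper).

    nodes: {group_name: number_of_tasks_in_group}
    edges: {(src_group, dst_group): number_of_task_level_dependencies}
    """
    lines = ["digraph synopsis {", "  rankdir=BT;", "  node [shape=box];"]
    for name, count in sorted(nodes.items()):
        # Node width grows with the log of the task count (an arbitrary choice).
        width = 1.0 + 0.5 * math.log2(count)
        lines.append(f'  "{name}" [label="{name} (x{count})", width={width:.2f}];')
    for (src, dst), count in sorted(edges.items()):
        # Line width grows with the log of the number of collapsed dependencies.
        pen = 1.0 + math.log2(count)
        lines.append(f'  "{src}" -> "{dst}" [penwidth={pen:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Made-up counts for a three-stage pipeline.
dot = to_dot(
    {"load": 100, "inc": 100, "sum": 1},
    {("load", "inc"): 100, ("inc", "sum"): 100},
)
print(dot)
```

The resulting text can be passed to the `dot` command-line tool or any other Graphviz front end. Log scaling is used here only so that a group of 10,000 tasks does not dwarf everything else; linear or square-root scaling may read better in practice.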
Additionally, if we can flesh out a good way to construct and visualize these on the fly then we may be able to get something live running on Bokeh. This is probably secondary to just visualizing things statically though.
cc @eriknw, who just gave a cool talk/demo about visualizing graphs
Issue Analytics
- Created 7 years ago
- Comments: 18 (17 by maintainers)
Top GitHub Comments
I believe I used https://gist.github.com/jcrist/0c28f632513aa13d4edea3d482bf47d1 (*edit: previously posted wrong link) for this, but that gist is several years old and may no longer work.
This is something I wrote up and showed @eriknw last week. It’s very cheap to compute, but has failure cases. The idea here is that we use the convention of having each key in the graph be a tuple where the first element is a prefix followed by a hash. For dask collections this allows us to group keys from the same operation into a single box, unioning all inputs and outputs. However, this will fail for graphs that don’t follow this convention, or graphs that were created with `dask.delayed`, as the tokens for `dask.delayed` graphs are all different. The code can be found here.
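The grouping described above can be sketched in a few lines of plain Python. This is not the linked code, only a minimal illustration under stated assumptions: the graph is an ordinary dict in dask's task format, keys are either strings or tuples whose first element is the `prefix-hash` name, and the helper names `group_name`, `dependencies`, and `synopsis` are made up for this example.

```python
from collections import Counter


def group_name(key):
    """Group a key by its name: the first element of a tuple key, else the key itself."""
    return key[0] if isinstance(key, tuple) else key


def dependencies(task, graph):
    """Collect the graph keys referenced anywhere inside a task (simplified)."""
    found = set()
    stack = [task]
    while stack:
        item = stack.pop()
        try:
            if item in graph:
                found.add(item)
                continue
        except TypeError:
            pass  # unhashable (e.g. contains a list), so it cannot be a key
        if isinstance(item, (list, tuple)):
            stack.extend(item)
    return found


def synopsis(graph):
    """Collapse tasks into groups; count tasks per group and edges between groups."""
    nodes, edges = Counter(), Counter()
    for key, task in graph.items():
        group = group_name(key)
        nodes[group] += 1
        for dep in dependencies(task, graph):
            src = group_name(dep)
            if src != group:
                edges[(src, group)] += 1
    return nodes, edges


def inc(x):
    return x + 1


# A tiny hand-written graph that follows the (name, index) key convention.
graph = {
    ("load-1a2b3c4d", 0): 1,
    ("load-1a2b3c4d", 1): 2,
    ("load-1a2b3c4d", 2): 3,
    ("inc-9f8e7d6c", 0): (inc, ("load-1a2b3c4d", 0)),
    ("inc-9f8e7d6c", 1): (inc, ("load-1a2b3c4d", 1)),
    ("inc-9f8e7d6c", 2): (inc, ("load-1a2b3c4d", 2)),
    "sum-0a1b2c3d": (sum, [("inc-9f8e7d6c", i) for i in range(3)]),
}

nodes, edges = synopsis(graph)
print(dict(nodes))
print(dict(edges))
```

Seven tasks collapse into three boxes here. The failure case mentioned above also falls out of this sketch: if every key carries a different token, every task ends up in its own group and nothing is simplified.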
Example:
The full graph looks like:
While the simplified looks like:
One idea for how to use these would be to reuse the graphviz-to-bokeh code I wrote up 2 years ago to convert these graphs into bokeh plots. We could color the nodes the same as their progress bars, which would allow users to see the progress of each task group, as well as where it sits in the overall pipeline.