Visualize synopsis of large graphs

People appreciate our dot images, especially when coupled with visualizations like the following (courtesy of Jim)

Unfortunately these don’t scale beyond a few hundred nodes. Meaningfully visualizing very large graphs is a hard problem.

However, while we can’t visualize full dask graphs, we might be able to build synopsis graphs and visualize those. In particular, we may be able to take advantage of Dask’s naming scheme to show connections between groups of tasks. Presumably line widths and node sizes/shapes could be used to convey some of the dropped information.
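
As a rough illustration of the line-width and node-size idea, the sketch below renders an already-grouped graph as Graphviz DOT text. The function name `to_dot`, the two input dictionaries, and the logarithmic scaling are assumptions made for this example; none of them come from Dask itself. Only the standard Graphviz attributes `width`, `penwidth`, `shape` and `rankdir` are used.

```python
import math


def to_dot(sizes, edges):
    """Render a synopsis graph as Graphviz DOT text (illustrative sketch).

    sizes: {group_name: number_of_tasks_in_group}, every count >= 1
    edges: {(src_group, dst_group): number_of_task_level_dependencies}
    """
    lines = ["digraph synopsis {", "  rankdir=BT;", "  node [shape=box];"]
    for name, count in sorted(sizes.items()):
        # Node width grows with the log of the task count, so a group of
        # thousands of tasks is visibly larger without dominating the plot.
        width = 1.0 + math.log10(count)
        lines.append(
            f'  "{name}" [label="{name}\\n({count} tasks)", width={width:.2f}];'
        )
    for (src, dst), count in sorted(edges.items()):
        # Line width encodes how many individual dependencies were merged.
        penwidth = 1.0 + math.log10(count)
        lines.append(f'  "{src}" -> "{dst}" [penwidth={penwidth:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical grouped graph: 100 "chunk" tasks feeding a single "sum" task.
print(to_dot({"chunk": 100, "sum": 1}, {("chunk", "sum"): 100}))
```

The resulting text can be passed to the `dot` command-line tool or any other Graphviz front end. Log scaling is one choice among many; linear or capped scaling may read better for small graphs.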

Additionally, if we can flesh out a good way to construct and visualize these on the fly then we may be able to get something live running on Bokeh. This is probably secondary to just visualizing things statically though.

cc @eriknw, who just gave a cool talk/demo about visualizing graphs

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 18 (17 by maintainers)

Top GitHub Comments

jcrist commented, Sep 18, 2020 (1 reaction)

I believe I used https://gist.github.com/jcrist/0c28f632513aa13d4edea3d482bf47d1 (*edit: previously posted wrong link) for this, but that gist is several years old and may no longer work.

jcrist commented, Mar 15, 2017 (1 reaction)

This is something I wrote up and showed @eriknw last week. It’s very cheap to compute, but has failure cases. The idea is to rely on the convention that each key in the graph is a tuple whose first element is a prefix followed by a hash. For dask collections this lets us group all keys from the same operation into a single box, taking the union of their inputs and outputs. However, this fails for graphs that don’t follow this convention, and for graphs created with dask.delayed, because the tokens in dask.delayed graphs are all different.

The code can be found here.
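
The linked code is not reproduced here. As a minimal sketch of the grouping idea described above, the following collapses a dependency mapping into one node per operation name. It assumes keys follow the tuple convention (first element is the prefix-plus-hash name); `group_of`, `synopsis`, and the example keys are hypothetical and are not taken from the original code or from Dask.

```python
from collections import defaultdict


def group_of(key):
    # Collection keys look like ("sum-<hash>", 0, 1): the first element
    # names the operation. Any non-tuple key is treated as its own group.
    return key[0] if isinstance(key, tuple) else key


def synopsis(dependencies):
    """Collapse a task graph into a graph between key groups.

    dependencies: {key: set of keys that `key` depends on}
    Returns (sizes, edges), where sizes counts tasks per group and edges
    counts task-level dependencies between each pair of distinct groups.
    """
    sizes = defaultdict(int)
    edges = defaultdict(int)
    for key, deps in dependencies.items():
        group = group_of(key)
        sizes[group] += 1
        for dep in deps:
            dep_group = group_of(dep)
            if dep_group != group:  # ignore dependencies inside one group
                edges[(dep_group, group)] += 1
    return dict(sizes), dict(edges)


# Made-up graph: two array chunks, summed per chunk, then combined.
deps = {
    ("array-aaa", 0): set(),
    ("array-aaa", 1): set(),
    ("sum-bbb", 0): {("array-aaa", 0)},
    ("sum-bbb", 1): {("array-aaa", 1)},
    "total-ccc": {("sum-bbb", 0), ("sum-bbb", 1)},
}
sizes, edges = synopsis(deps)
```

Here five tasks collapse to three groups. Keys that do not share a first element, such as the uniquely tokenized keys that dask.delayed produces, each land in their own group, so the graph is not simplified at all; that is the failure case mentioned above.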

Example:

In [1]: from vis import simple_vis

In [2]: import numpy as np

In [3]: import dask.array as da

In [4]: x = np.arange(100).reshape((10, 10))

In [5]: dx = da.from_array(x, chunks=(5, 5))

In [6]: res = dx.dot(dx.T).sum(axis=1).mean() + dx.mean(axis=0) * 4

In [7]: simple_vis(res)
Out[7]: <IPython.core.display.Image object>

In [8]: res.visualize()
Out[8]: <IPython.core.display.Image object>

The full graph looks like:

[Image: mydask]

While the simplified graph looks like:

[Image: simple]

One idea for how to use these would be to reuse the graphviz-to-bokeh code I wrote up 2 years ago to convert these graphs into bokeh plots. We could give each node the same color as its progress bar, which would let users see the progress of each group of tasks as well as where in the overall pipeline it sits.
