Adding new characteristics to the HLG visualizations
- [ ] Task 2B - (SVG implementation in Graphviz) #XXXX
Status: Graphviz is behaving unexpectedly and will not let us downscale the images
(I will keep updating this comment)
Hi 👋
I am posting this Feature Request issue here to start a discussion regarding enhancing the Graphviz output of the dask.visualize() method.
While I was adding a light gray fill color to show whether an HLG layer is materialized, @mrocklin pointed out that it would be better to make this discussion public to gather opinions from the Dask community as a whole.
In the future I think that users would be very interested in attributes like layer size and type, as well as collection attributes like chunking structure in the case of Dask arrays. Personally I would encourage you to focus your efforts there. You might also want to raise issues with proposed changes in the future. That will help you to get feedback on ideas from a broad set of people in the community, rather than just the one or two that are engaging in the GSoC Slack channel.
Source: PR #7843
@GenevieveBuckley worked on #7309 and added a new dictionary, collection_annotations, which holds crucial information about the High Level Graphs. I believe this information can be shown in some way in the Graphviz output.
@martindurant mentioned this over at https://github.com/dask/dask/issues/7301#issuecomment-860686336
… based on the task naming conventions and e.g., for high-level graphs the number of sub-tasks and for arrays, the size of the operands. Such information might be added into the nodes as text, colour or edge-style (all probably optional).
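To make the idea of putting attributes "into the nodes as text, colour or edge-style" concrete, here is a minimal sketch in plain Python that turns per-layer information into Graphviz node attributes. The helper names `node_attributes` and `to_dot_node` are hypothetical, and so is the choice of annotation keys (`shape`, `chunksize`, `dtype`) and the decision to give materialized layers the light gray fill. It does not call Dask or Graphviz; in real code the inputs would presumably come from each layer (for example its `collection_annotations` dictionary).

```python
def node_attributes(name, materialized, n_tasks, annotations=None):
    """Build Graphviz node attributes for one layer (illustrative helper).

    Assumption: materialized layers get a light gray fill and
    non-materialized layers are left unfilled.
    """
    label_lines = [name, f"{n_tasks} task" + ("" if n_tasks == 1 else "s")]
    # Only show annotation keys that are present; the key names are assumed.
    for key in ("shape", "chunksize", "dtype"):
        if annotations and key in annotations:
            label_lines.append(f"{key}: {annotations[key]}")
    # "\\n" is a literal backslash-n, which DOT renders as a line break.
    attrs = {"label": "\\n".join(label_lines), "shape": "box"}
    if materialized:
        attrs["style"] = "filled"
        attrs["fillcolor"] = "lightgray"
    return attrs


def to_dot_node(node_id, attrs):
    """Format a single DOT node statement from an attribute dict."""
    body = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f'"{node_id}" [{body}];'


example = node_attributes(
    "add-123", True, 4, {"shape": (10, 10), "chunksize": (5, 5)}
)
print(to_dot_node("add-123", example))
```

Keeping the attribute dictionary separate from the DOT formatting means the visual encoding (text, color, edge style) can be changed or made optional without touching the code that collects the layer information.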
If anyone else has any ideas, please leave them in the comments section. How should I proceed?
Let’s make the output of dask.visualize() more interesting and appealing to the eye! 🙌

That’s great - thanks for working on this @freyam!
This could vary a lot. The dataframe shuffle example is probably the smallest reasonably sized example we have (as opposed to the tiny toy examples we’ve also looked at). But depending on what users are doing, it could be very, very large indeed. I don’t think we can choose a fixed upper value for this.
I’d say 1 task is the minimum value possible.
I don’t think there is a single maximum value we can pick. It will vary wildly depending on what kind of computation the user is doing. (Your suggestion to normalize to the biggest layer in each HLG structure might be a good way to handle this)
A log scale is a good option, yes.
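The points above (a minimum of 1 task, no fixed maximum, normalizing to the biggest layer in each graph, and a log scale) can be sketched as a small sizing function. The name `scaled_size` and the default size range of 10 to 30 are assumptions chosen for illustration, not values taken from Dask.

```python
import math


def scaled_size(n_tasks, max_tasks, min_size=10.0, max_size=30.0):
    """Map a layer's task count to a visual size on a log scale.

    The count is normalized against the largest layer in the same graph,
    so no fixed global upper bound is needed. Hypothetical helper.
    """
    if n_tasks < 1:
        raise ValueError("a layer has at least 1 task")
    if max_tasks < n_tasks:
        raise ValueError("max_tasks must be the largest count in the graph")
    if max_tasks == 1:
        # Every layer has a single task, so they all get the minimum size.
        return min_size
    fraction = math.log(n_tasks) / math.log(max_tasks)  # in [0, 1]
    return min_size + fraction * (max_size - min_size)


# Made-up task counts for three layers of one graph.
task_counts = {"read": 1, "shuffle": 1000, "sum": 10}
biggest = max(task_counts.values())
sizes = {name: scaled_size(n, biggest) for name, n in task_counts.items()}
print(sizes)
```

Because the scale is relative to the biggest layer, the same layer can be drawn at different sizes in different graphs; that is the trade-off for not having to choose a global maximum.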
I think this is a good idea.
I had assumed Martin’s suggestion was to show a different color for each of these categories. Scrolling back up, it looks like he didn’t actually say that and I just imagined it. Nevertheless, perhaps color is a good place to start. (Also, as I said early on in this project, you will probably have ideas about the best way to represent certain characteristics visually. Definitely add your own suggestions or ideas for discussion here too, if you have them.)
I also agree with Martin’s comment “I think that getting the attributes into the plotting code is the primary thing, and deciding the how to represent them secondary”