Adding new characteristics to the HLG visualizations

  • Task 1 - #7869 Status: Merged

  • Task 2A - #7886 Status: Review

  • Task 2B - (SVG implementation in Graphviz) #XXXX Status: Graphviz acting weird and not allowing us to downscale the images

(I will keep updating this comment)


Hi 👋

I am posting this Feature Request issue here to start a discussion regarding enhancing the Graphviz output of the dask.visualize() method.

While I was adding a light gray fill color to show whether an HLG layer is materialized, @mrocklin pointed out that it would be better to make this discussion public to gather opinions from the Dask community as a whole.

In the future I think that users would be very interested in attributes like layer size and type, as well as collection attributes like chunking structure in the case of dask arrays. Personally I would encourage you to focus your efforts there. You might also want to raise issues with proposed changes in the future. That will help you to get feedback on ideas from a broad set of people in the community, rather than just the one or two that are engaging in the gsoc slack channel.

Source: PR #7843

@GenevieveBuckley worked on #7309 and added a new dictionary collection_annotations which has crucial information about the High Level Graphs which I believe can be shown in some way on the Graphviz output.

@martindurant mentioned this over at https://github.com/dask/dask/issues/7301#issuecomment-860686336

… based on the task naming conventions and e.g., for high-level graphs the number of sub-tasks and for arrays, the size of the operands. Such information might be added into the nodes as text, colour or edge-style (all probably optional).
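As a rough sketch of that idea, the snippet below turns a per-layer summary into Graphviz DOT text, putting the task count in the node label and using a light gray fill for materialized layers. The `layers` and `dependencies` dicts, the layer names, and both helper functions are made up for illustration; they only stand in for whatever the real HLG exposes, and this is not how dask.visualize() is implemented.

```python
def layer_to_dot_node(name, n_tasks, materialized):
    """Return one DOT node statement for a (hypothetical) layer summary."""
    attrs = {
        "shape": "box",
        # "\\n" emits a literal backslash-n, which DOT renders as a line break.
        "label": f"{name}\\n{n_tasks} tasks",
    }
    if materialized:
        # Light gray fill marks a materialized layer.
        attrs["style"] = "filled"
        attrs["fillcolor"] = "lightgray"
    attr_text = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f'"{name}" [{attr_text}];'


def to_dot(layers, dependencies):
    """Build DOT source from plain dicts standing in for an HLG."""
    lines = ["digraph hlg {"]
    for name, info in layers.items():
        node = layer_to_dot_node(name, info["n_tasks"], info["materialized"])
        lines.append("  " + node)
    for name, deps in dependencies.items():
        for dep in sorted(deps):
            # Edges point from a dependency to the layer that consumes it.
            lines.append(f'  "{dep}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-layer graph: "add" depends on "ones".
layers = {
    "ones": {"n_tasks": 4, "materialized": False},
    "add": {"n_tasks": 4, "materialized": True},
}
dependencies = {"ones": set(), "add": {"ones"}}
dot_source = to_dot(layers, dependencies)
print(dot_source)
```

The printed text can be fed to any DOT renderer. Keeping the attributes in a dict makes it easy to switch each one on or off, which fits the "all probably optional" part of the suggestion.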

If anyone else has any ideas, please leave them in the comments section. How should I proceed?

Let’s make the output of dask.visualize() more interesting and appealing to the eye! 🙌

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 24 (24 by maintainers)

Top GitHub Comments

2 reactions
tomwhite commented, Aug 10, 2021

Now, you will be able to visualize the intermediate state of the Dask arrays within the HTML Representation and it looks amazing 😄

That’s great - thanks for working on this @freyam!

2 reactions
GenevieveBuckley commented, Jul 6, 2021

I have a question: What is the general range of an HLG layer’s task count?

This could vary a lot. The dataframe shuffle example is probably the smallest reasonably sized example we have (as opposed to the tiny toy examples we’ve also looked at). But depending on what users are doing, it could be very, very large indeed. I don’t think we can choose a fixed upper value for this.

I would like to know the following:

  • a reasonable minimum value for n_tasks

I’d say 1 task is the minimum value possible.

  • a reasonable maximum value for n_tasks

I don’t think there is a single maximum value we can pick. It will vary wildly depending on what kind of computation the user is doing. (Your suggestion to normalize to the biggest layer in each HLG structure might be a good way to handle this)

  • a reasonable math function to account for all kinds of values for n_tasks (I am inclining towards numpy.log and numpy.clip)

A log scale is a good option, yes.

Alternate Idea: I can calculate the minimum and the maximum n_tasks for each HLG and use the local minima and local maxima instead. Pro: This would normalize the overall graph structure. The graphs wouldn’t look very disproportionate. Con: We can no longer compare two different task graphs on the basis of the size as the minimums and the maximums can differ. (CounterPoint to the Con: We are actually also mentioning the actual number of n_tasks on every node itself. So, maybe this wouldn’t cause much trouble)

I think this is a good idea.
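To make the log scale plus per-graph normalization concrete, here is a minimal sketch in plain Python. It assumes the task counts are already available as a dict; the function name, the size range and the layer names are hypothetical, and math.log is used in place of numpy.log only to keep the example dependency-free.

```python
import math


def scaled_sizes(n_tasks_by_layer, min_size=10.0, max_size=30.0):
    """Map each layer's task count to a size in [min_size, max_size].

    Counts are log-scaled, then normalized against the smallest and
    largest layer of this one graph (the "local" minimum and maximum).
    """
    if not n_tasks_by_layer:
        return {}
    # Treat anything below 1 task as 1 so the logarithm is defined.
    logs = {name: math.log(max(n, 1)) for name, n in n_tasks_by_layer.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for name, value in logs.items():
        # If every layer has the same task count, use the midpoint size.
        frac = (value - lo) / span if span > 0 else 0.5
        sizes[name] = min_size + frac * (max_size - min_size)
    return sizes


# Hypothetical layers with very different task counts.
layers = {"read-parquet": 1, "sum": 100, "shuffle": 10_000}
sizes = scaled_sizes(layers)
for name, size in sizes.items():
    print(f"{name}: {layers[name]} tasks -> size {size:.1f}")
```

Because the bounds come from each graph, the smallest layer always gets `min_size` and the largest always gets `max_size`. That is exactly the con mentioned above: sizes are not comparable across two different graphs, so the raw n_tasks still needs to be written on the node.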

@GenevieveBuckley This seems like a great idea. It also seems fine to implement. All that’s left is how we want to represent it on the screen. I believe the implementation of the “adding to the graphviz” part is quite trivial. The really tricky part is how we want to show it. Which traits of the node/edge would you wanna tweak to show the new information? I do have some amazing sample diagrams ready which I can draft up by tomorrow. But, if anyone here has any plans or suggestions, I would love to hear them.

I had assumed Martin’s suggestion was to show a different color for each of these categories. Scrolling back up, it looks like he didn’t actually say that & I just imagined it. Nevertheless, perhaps color is a good place to start. (Also, as I said early on in this project, you will probably have ideas about the best way to represent certain characteristics visually. Definitely add your own suggestions or ideas for discussion here too, if you have them)

I also agree with Martin’s comment “I think that getting the attributes into the plotting code is the primary thing, and deciding the how to represent them secondary”
