Visualize TaskGroups
We should find other representations to replace the Task stream. See https://github.com/dask/distributed/issues/4260
One view of that data is aggregated within TaskGroups. A TaskGroup collects many related tasks together. For example, a single dd.read_csv call may generate 10,000 tasks but only one task group. Task groups correspond to high-level layers on the client side, or to Spark layers.
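As a rough illustration of how 10,000 tasks collapse into one group: dask collections name their tasks with tuple keys such as `("read-csv-<token>", i)`, so every partition produced by one call shares the same name. The sketch below groups keys by that name. The grouping rule and the helpers `group_name` and `group_tasks` are simplified, hypothetical stand-ins, not distributed's actual implementation.

```python
from collections import defaultdict


def group_name(key):
    """Hypothetical, simplified grouping rule (not distributed's real one):
    tuple keys like ("read-csv-1a2b", 7) belong to the group named by their
    first element; any other key forms a group of its own."""
    if isinstance(key, tuple):
        return key[0]
    return key


def group_tasks(keys):
    """Collect task keys into {group name: [keys]}."""
    groups = defaultdict(list)
    for key in keys:
        groups[group_name(key)].append(key)
    return dict(groups)


# One partition-per-task "read_csv" call with 10,000 partitions.
keys = [("read-csv-1a2b", i) for i in range(10_000)]
groups = group_tasks(keys)
print(len(keys), len(groups))
```

Running this prints `10000 1`: many tasks, one group.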
TaskGroups contain information that could be useful to convey. Here is a subset:
- start and stop time of every group
- how long we’ve spent on the group, not only in computation but also in data transfer and other activities
- amount of data processed / currently in storage
- dependency relationships to other TaskGroups
- how far along we are in computing them, and whether we’ve had any errors (this is the same information we show today in the progress bars on the dashboard's status page)
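To make the list above concrete, here is a minimal sketch of a per-group record a visualization could consume. Distributed's scheduler tracks similar data on its own TaskGroup objects, but the class `TaskGroupSummary` and all its field names here are hypothetical and chosen for illustration. It also assumes a task counts as finished once it is in the `"memory"` state; a real implementation would need to count tasks that finished and were later released.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGroupSummary:
    """Hypothetical per-group record; field names are illustrative only."""

    name: str
    start: float  # time the first task in the group started (seconds)
    stop: float  # time the most recent task in the group finished (seconds)
    # seconds spent per activity, e.g. {"compute": 30.0, "transfer": 2.0}
    durations: dict[str, float] = field(default_factory=dict)
    nbytes_in_memory: int = 0  # bytes currently held in worker memory
    dependencies: set[str] = field(default_factory=set)  # upstream group names
    # number of tasks in each state, e.g. {"memory": 6, "waiting": 2}
    states: dict[str, int] = field(default_factory=dict)

    def total_time(self) -> float:
        """Time summed over all activities (compute, transfer, ...)."""
        return sum(self.durations.values())

    def progress(self, done_states=("memory",)) -> float:
        """Fraction of tasks whose state is in done_states (an assumption)."""
        total = sum(self.states.values())
        done = sum(self.states.get(s, 0) for s in done_states)
        return done / total if total else 0.0

    def has_errors(self) -> bool:
        return self.states.get("erred", 0) > 0


g = TaskGroupSummary(
    "read-csv-1a2b",
    start=0.0,
    stop=12.5,
    durations={"compute": 30.0, "transfer": 2.0},
    nbytes_in_memory=64_000_000,
    states={"memory": 6, "processing": 2, "waiting": 2},
)
print(g.progress(), g.total_time(), g.has_errors())
```

Note that `total_time()` can exceed `stop - start`, because many workers spend time on the same group in parallel.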
How should we convey this information visually to the user? As mentioned above, we already convey the progress of tasks within a TaskGroup in the progress chart. Great, what else? We could consider doing something like these graphs from Spark:
But perhaps augmented to update in real time, with color, size, and shading differences that reflect the updated information we have.
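One possible way to turn group data into color, size, and shading is sketched below, independent of any plotting library. The function `encode_group`, its parameters, and the specific choices (red for groups with errors, opacity for progress, glyph size growing with the square root of bytes so that area roughly tracks data volume) are hypothetical illustrations, not a proposed final design.

```python
import math


def encode_group(progress, erred, nbytes, max_nbytes, min_size=10.0, max_size=60.0):
    """Hypothetical mapping from one TaskGroup's data to glyph properties.

    progress   -- fraction of tasks finished, in [0, 1]
    erred      -- whether any task in the group has failed
    nbytes     -- bytes the group currently holds in memory
    max_nbytes -- largest nbytes across all groups, used for scaling
    """
    # Errors override everything else: draw failing groups in red.
    color = "red" if erred else "steelblue"
    # Shading: faint when nothing has finished, fully opaque when complete.
    alpha = 0.2 + 0.8 * min(max(progress, 0.0), 1.0)
    # Size grows with sqrt(bytes), so glyph area scales roughly with data volume.
    frac = math.sqrt(nbytes / max_nbytes) if max_nbytes > 0 else 0.0
    size = min_size + (max_size - min_size) * frac
    return {"color": color, "alpha": alpha, "size": size}


print(encode_group(progress=0.5, erred=False, nbytes=25, max_nbytes=100))
```

Because the output is a plain dict per group, it could be recomputed on each dashboard refresh and pushed into whatever plotting data source the dashboard uses.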
I walked down this path briefly in the attached notebook, using start and stop times to inform layout. I found that, due to overlap, this was hard, if not impossible. I’m now of the opinion that layout should be informed purely by dependency graph structure (similar to the Spark image above). However, I think that once we have that rough layout there is a lot we can do with color, size, and shading that will be fun. Layout is still an interesting problem, though, especially making it robust in the general case.
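A minimal sketch of a layout driven purely by graph structure, assuming each group's dependencies are available as a mapping from group name to the names of its upstream groups. It uses longest-path layering, the first step of a Sugiyama-style layered drawing: a group's x position is the length of the longest dependency chain leading to it, and groups in the same column are stacked by name. The function `layered_layout` and the example group names are hypothetical. Crossing minimization, which a robust general-case layout would also need, is not attempted.

```python
from graphlib import TopologicalSorter


def layered_layout(dependencies):
    """Longest-path layering of a DAG of task groups.

    dependencies -- {group name: set of upstream group names}
    Returns {group name: (x, y)} where x is the column (depth) and y the
    row within that column.
    """
    # static_order() yields every group after all of its dependencies
    # (and raises graphlib.CycleError if the graph has a cycle).
    depth = {}
    for name in TopologicalSorter(dependencies).static_order():
        deps = dependencies.get(name, set())
        depth[name] = 1 + max((depth[d] for d in deps), default=-1)

    # Stack groups that share a column, in name order for stability.
    next_row = {}
    positions = {}
    for name in sorted(depth, key=lambda n: (depth[n], n)):
        x = depth[name]
        y = next_row.get(x, 0)
        next_row[x] = y + 1
        positions[name] = (x, y)
    return positions


# Hypothetical groups from something like df.x.sum() / df.x.count()
deps = {
    "read-csv": set(),
    "getitem": {"read-csv"},
    "sum": {"getitem"},
    "count": {"getitem"},
    "truediv": {"sum", "count"},
}
print(layered_layout(deps))
```

Here `sum` and `count` share a column because both sit two steps downstream of `read-csv`, and `truediv` lands in the column after them.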
I think that we have all the information we need in TaskGroups. The next step is to think about visualization, which should be fun.
Top GitHub Comments
@ian-r-rose you may also find this interesting. I think that you and @ncclementi might be a good pairing here. (James’ idea; I just wanted to make sure that this got out there.)
On Tue, May 4, 2021 at 2:28 PM Benjamin Zaitlen @.***> wrote:
Thanks, James. I’ll start looking at this.