Supporting rasterize options via a compositor
A long-standing issue has been that options used with regular plotting don't transfer to the output of the datashader operations. It has been an important design principle to keep operations independent of the options machinery, especially since style options are backend-dependent.
However, with improvements in Bokeh (and to some extent matplotlib), we hope to be able to avoid using datashade and use rasterize instead. In particular, we now have eq_hist support on the plotting/client side, and soon Bokeh will support categorical color mapping as well. Datashader now supports line thicknesses for Curve/Path elements, helping close the gap further. This improved rasterize approach works better with HoloViews than datashade, so now is a good time to revisit this problem.
We can keep the basic datashader operations independent of the options system (leaving them as they are now) by making use of HoloViews compositors (a powerful system that we use for statistical elements but haven't really discussed or documented much). Compositors let us map options-system settings to operation parameters during plotting, so the element the user passes to the display machinery can be substituted with the output of the compositor operation. To make this work, we would add new datashader operations that are not meant to be used directly by users and exist purely to support the display process.
This means we could support rasterize as follows:
hv.Curve([1,2,3]).opts(rasterize=True)
The presence of rasterize=True would enable the compositor operation, which would gather all the necessary style/plot options, run the regular rasterize operation with those options and display the result.
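To make the flow concrete, here is a minimal pure-Python sketch of the display-time substitution under loud assumptions: `ToyElement`, `toy_rasterize` and `resolve_display_element` are hypothetical stand-ins, not HoloViews classes or the real compositor API. The point it illustrates is that the user's element is swapped for the operation's output only when rasterize is set, while the operation itself never looks at the options.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyElement:
    """Hypothetical stand-in for a HoloViews element and its applied options."""
    kind: str
    data: List[Tuple[float, float]]
    opts: Dict[str, Any] = field(default_factory=dict)


def toy_rasterize(element: ToyElement, **params: Any) -> ToyElement:
    """Hypothetical stand-in for the regular rasterize operation.

    It only sees plain keyword parameters and never reads element.opts,
    so it stays independent of the options system.
    """
    return ToyElement(kind="Image", data=element.data, opts={"operation_params": params})


def resolve_display_element(
    element: ToyElement,
    operation: Callable[..., ToyElement] = toy_rasterize,
    forwarded: Tuple[str, ...] = ("line_width",),
) -> ToyElement:
    """Hypothetical compositor step, run by the plotting code rather than by users."""
    if not element.opts.get("rasterize"):
        # rasterize unset or False: display the user's element unchanged.
        return element
    # Forward only the options the operation understands, as plain parameters.
    params = {k: v for k, v in element.opts.items() if k in forwarded}
    return operation(element, **params)


curve = ToyElement("Curve", [(0, 1), (1, 2), (2, 3)], {"rasterize": True, "line_width": 4})
shown = resolve_display_element(curve)
print(shown.kind, shown.opts)  # Image {'operation_params': {'line_width': 4}}
```

Keeping the option lookup in the compositor step rather than in the operation is what lets the existing datashader operations stay untouched.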
We could map certain options directly. Off the top of my head, common ones are color, line_width and cmap, so you could write:
hv.Curve([1,2,3]).opts(rasterize=True, color='red', line_width=4)
So far, the only new option is rasterize. While I suspect rasterize might be a matplotlib option somewhere, I don't think it is one that is important to expose anywhere (something to do with SVG rendering?).
There are always going to be options that only make sense at the datashader level (e.g. aggregator, x_sampling/y_sampling), and for these I propose that rasterize can also be a dictionary of options aimed at datashader:
hv.Curve([1,2,3]).opts(rasterize=dict(aggregator=ds.mean('dim')), color='red', line_width=4)
This dictionary would act as an override, so it would take precedence over any options that would normally be mapped to datashader automatically. The hope is that this dictionary is only required when finer control is needed, and that the usual style options are sufficient for most common cases.
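The option-resolution rules above (automatic mapping of a few style options, with the rasterize dictionary taking precedence) can be sketched in plain Python. This is only an illustration under stated assumptions: `AUTO_MAPPED` and `resolve_rasterize_params` are hypothetical names, the mapped options are just the color/line_width/cmap trio mentioned above, and the aggregator is a placeholder string rather than a real ds.mean reduction.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical table of style options mapped automatically to operation
# parameters. Names are kept identical here purely for illustration; a real
# mapping might rename or transform some of them.
AUTO_MAPPED: Dict[str, str] = {"color": "color", "line_width": "line_width", "cmap": "cmap"}


def resolve_rasterize_params(opts: Mapping[str, Any]) -> Optional[Dict[str, Any]]:
    """Return operation parameters, or None when rasterization is disabled.

    Accepts rasterize=True/False or rasterize=<dict of datashader-level options>.
    """
    setting = opts.get("rasterize", False)
    if isinstance(setting, Mapping):
        overrides = dict(setting)  # a dict always enables rasterization
    elif setting is True:
        overrides = {}
    elif not setting:
        return None  # unset or False: keep the normal, non-rasterized plot
    else:
        raise TypeError("rasterize must be a bool or a dict of options")
    # Start from the automatically mapped style options...
    params = {AUTO_MAPPED[k]: v for k, v in opts.items() if k in AUTO_MAPPED}
    # ...then let the explicit dictionary take precedence.
    params.update(overrides)
    return params


print(resolve_rasterize_params(
    {"rasterize": {"aggregator": "mean('dim')", "line_width": 1}, "color": "red", "line_width": 4}
))
# {'color': 'red', 'line_width': 1, 'aggregator': "mean('dim')"}
```

Returning None for the disabled case keeps the decision whether to substitute the element separate from the question of which parameters to pass.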
We could also consider adding another option to this dictionary, sample_limit, which would specify the sample count (e.g. rows, for tabular data) at which datashader rasterization is enabled. I'm not convinced we should implement more than this at the HoloViews level (and even if we do expose more, the default should always be to have datashading off!), but hvplot could expose an easy API to specify the level/heuristic used to automatically enable or disable datashader output.
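A static sample_limit check might look like the following minimal sketch. `should_rasterize` is a hypothetical helper; it assumes the limit is compared against len(data) (e.g. the row count) and that rasterization kicks in only strictly above the limit, a detail the proposal leaves open. Without any rasterize option it returns False, so datashading stays off by default.

```python
from typing import Any, Mapping, Sized


def should_rasterize(opts: Mapping[str, Any], data: Sized) -> bool:
    """Hypothetical static check based on the proposed sample_limit key."""
    setting = opts.get("rasterize", False)
    if isinstance(setting, Mapping):
        limit = setting.get("sample_limit")
        if limit is None:
            return True  # dict given without a limit: always rasterize
        # Assumption: at or below the limit, plot the data normally.
        return len(data) > limit
    # Default is False, so datashading stays off unless explicitly requested.
    return bool(setting)


rows = list(range(10))
print(should_rasterize({"rasterize": {"sample_limit": 5}}, rows))  # True
print(should_rasterize({}, rows))  # False
```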
Open questions
- The appropriate default aggregator probably needs to switch based on element type (e.g. NdOverlays to categorical?), among other things.
- What is the full set of options that can be mapped? Are they semantically compatible with the normal options?
- How would hover work? Typically, you would only get counts, without access to all the available dimensions (though recent work means you might get some dimension values out).
Issue Analytics
- Created 10 months ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Talking with @maximlt, I remembered that we would need to use spread/dynspread for point sizes, which I don't particularly like. It is fine if it works well enough, though!

I think a dynamic limit is what people would normally want if they are coming from the perspective of really wanting a normal Bokeh plot but falling back to Datashader when things get too big, which I think is the usual motivation for rasterize=50000. There may also be people who want to select datashader exclusively for a dataset if it is over a certain size, but I'd rather not even have to document that option, because it is confusing and the difference is hard for people to grasp, so I wouldn't implement it unless we had a really clear use case and motivation. So my vote is for dynamic only.
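The dynamic alternative re-evaluates the limit against what is currently visible, so zooming in on a large dataset can switch back to a normal Bokeh plot. A minimal sketch, assuming an option like rasterize=50000 means "rasterize when more than 50000 samples fall inside the viewport"; `points_in_view` and `use_datashader` are hypothetical helpers, and the linear scan stands in for whatever the data backend would actually do.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Range = Tuple[float, float]


def points_in_view(points: Sequence[Point], x_range: Range, y_range: Range) -> int:
    """Count samples inside the current viewport (bounds are inclusive)."""
    (x0, x1), (y0, y1) = x_range, y_range
    return sum(1 for x, y in points if x0 <= x <= x1 and y0 <= y <= y1)


def use_datashader(points: Sequence[Point], x_range: Range, y_range: Range, limit: int) -> bool:
    """Hypothetical dynamic check, meant to be re-run whenever the viewport changes.

    Shows a normal plot until more than `limit` samples are visible,
    then falls back to rasterized output.
    """
    return points_in_view(points, x_range, y_range) > limit


pts = [(float(i), float(i)) for i in range(100)]
print(use_datashader(pts, (0, 99), (0, 99), limit=50))  # True: 100 points visible
print(use_datashader(pts, (0, 9), (0, 9), limit=50))    # False: zoomed in, 10 visible
```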