
On new or unified plot/trace types relating to `parcoords` and `sankey`


Technical notes re https://github.com/plotly/plotly.js/issues/2221#issuecomment-354816207

On multidimensional explorers: I made a separate issue, since this topic is deep and the “New charts” topic will likely be quite broad.

We’ve been planning multidimensional extensions, e.g. SPLOM. The directions above also make a lot of sense, and as usual there’s a tradeoff among implementation time (ultimately $), payload size and functionality. One nice option would be some unification of plot types, e.g. parcoords and sankey, both of which are relatively compact because they do one thing without much configuration. Another option is a new plot type, or a new trace type built on the parcoords substrate, for example.

Here’s a quick note on the expected challenges of integrating the new functionality with either:

parcoords was written with performance as the express goal, so it renders with WebGL. It uses GL.LINES because that is the most compact geometry, i.e. the fastest to run the interactions with (there were response-time criteria). To make lines thicker, we’d need to switch to GL.TRIANGLES or similar, i.e. 2-3x the work in the vertex shader (which also does the GPU crossfiltering). WebGL does have a lineWidth method, but the standard permits implementations to cap line width at 1px, and browsers have increasingly done so over time. If Sankey-like splines are also needed, that’s another layer of performance cost and implementation complexity. Still, it would be possible to add an SVG or polygonal WebGL trace type for cases where the number of data points is well below the tens of thousands and more varied geometry, e.g. thicker lines, is needed. parcoords is already multilayer: the axes are in SVG, and there could be more SVG layers.
sankey is an alternative target. Internally it works as a layered graph, so its internal representation is already close to the axis cadence of parallel coordinates and similar charts. But our implementation uses the d3-sankey layout heuristic unchanged (we only added support for multiedges via a PR), and that heuristic is free to arrange the edges, which causes the observed line discontinuity that is definitely not in the style of parallel coordinates. This freedom yields a better arrangement for general Sankey work, i.e. it minimizes line crossings, but that is not a concern for parcoords-like work. We’d need to add configuration for bypassing the heuristic, or all of d3-sankey, in favor of a parcoords-like continuous line layout.
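To make the GL.LINES versus GL.TRIANGLES cost in the parcoords note concrete, here is a minimal sketch (not plotly.js code) of expanding one line segment into a screen-space quad made of two triangles. With gl.LINES a segment costs 2 vertices; the non-indexed quad below costs 6 (an indexed version would cost 4), which is roughly the 2-3x vertex work mentioned. The function name `segmentToTriangles` and the flat x,y `Float32Array` layout are illustrative assumptions.

```typescript
// Hypothetical helper (not part of plotly.js): expand the segment
// (x0, y0)-(x1, y1) into two triangles forming a quad `width` pixels wide.
// Returns a flat array of x,y pairs suitable for gl.TRIANGLES (6 vertices).
function segmentToTriangles(
  x0: number,
  y0: number,
  x1: number,
  y1: number,
  width: number
): Float32Array {
  const dx = x1 - x0;
  const dy = y1 - y0;
  const len = Math.sqrt(dx * dx + dy * dy);
  if (len === 0) {
    return new Float32Array(0); // degenerate segment: nothing to draw
  }
  // Normal perpendicular to the segment, scaled to half the line width.
  const nx = (-dy / len) * (width / 2);
  const ny = (dx / len) * (width / 2);
  // Quad corners: a/b at the start point, c/d at the end point.
  const a = [x0 + nx, y0 + ny];
  const b = [x0 - nx, y0 - ny];
  const c = [x1 + nx, y1 + ny];
  const d = [x1 - nx, y1 - ny];
  // Triangles (a, b, c) and (c, b, d): 6 vertices, 12 floats.
  return new Float32Array([...a, ...b, ...c, ...c, ...b, ...d]);
}

// A horizontal segment 10px long and 2px thick.
const quad = segmentToTriangles(0, 0, 10, 0, 2);
console.log(quad.length / 2); // 6 vertices, versus 2 for gl.LINES
```

In a shader-based renderer this offsetting is commonly done in the vertex shader instead (each vertex carries a side flag), which is where the extra per-vertex work would land alongside the crossfiltering.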
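The sankey note’s point about the layered graph, and about lost line continuity, can be illustrated by converting parcoords-style rows into Sankey links between adjacent dimensions. In this hypothetical sketch, `rowsToLinks` sums rows into `{ source, target, value }` records, loosely the shape d3-sankey takes as link input; naming nodes `dimIndex:category` is an assumption. Once rows are summed this way, the layout no longer knows which incoming link continues into which outgoing one, so nothing forces a continuous line.

```typescript
// A Sankey-style link between two nodes on adjacent dimensions.
interface Link {
  source: string;
  target: string;
  value: number;
}

// Hypothetical helper: aggregate categorical rows into links between
// consecutive dimensions. Node ids are "<dimension index>:<category>".
function rowsToLinks(rows: string[][]): Link[] {
  const links = new Map<string, Link>();
  for (const row of rows) {
    for (let d = 0; d + 1 < row.length; d++) {
      const source = `${d}:${row[d]}`;
      const target = `${d + 1}:${row[d + 1]}`;
      const key = `${source}->${target}`;
      const existing = links.get(key);
      if (existing) {
        existing.value += 1;
      } else {
        links.set(key, { source, target, value: 1 });
      }
    }
  }
  return Array.from(links.values());
}

// Rows a-x-p and a-x-q share the link 0:a -> 1:x, and b-x-p shares
// 1:x -> 2:p with a-x-p, so the links alone cannot tell which path a
// given unit of flow through node 1:x belongs to.
const sankeyLinks = rowsToLinks([
  ["a", "x", "p"],
  ["a", "x", "q"],
  ["b", "x", "p"],
]);
console.log(sankeyLinks);
```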

There’s also the option of unifying these two with the new functionality, which is doable but sounds like even more work. Also, the chart design space is enormous, and plotly.js, for better or worse, is geared toward specific, configurable chart types rather than a Grammar of Graphics-like, fluid or low-level way of building up toward a desired chart. This matters because there are many potentially useful additions and improvements that could be made to the charts we’re discussing. On the other hand, the easiest thing to implement is a completely new chart type, but that probably adds the most to the JS payload (not sure whether that’s a concern; @alexcjohnson or @etpinard could tell). By the way, there’s a parallel sets implementation with drag & drop interaction not unlike our parcoords and sankey: https://www.jasondavies.com/parallel-sets/. It shows line continuity, but I’m not sure whether it supports multiedges.

Created 6 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments


alexcjohnson commented, Jan 2, 2018

Thanks for the detailed writeup, @monfera.

You’re right that there are a lot of possible extensions and convergence opportunities, but before we resort to those heavier (in terms of development) solutions, let’s see what we can get out of extending the existing plot types.

To me, @jmmease’s problem is fairly simple, and extremely common: parcoords is basically the right data model (and sankey is not), but it doesn’t show the weight of entries passing through (and connecting) categorical variables.

What I had in mind for the parcoords extension is a decidedly non-random offset for category dimensions, so that each line through the same category is offset by a small amount from the next. My gut reaction is that we should (at least by default) keep the line width at 1px but let the per-point offsets scale as needed to fill a pleasant fraction of the available space: larger than 1px, so there are gaps between the lines, or smaller than 1px, so the lines merge into a band whose width is still proportional to its weight. Then we’d also want to do some sorting to try to minimize crossings in category-to-category connections. I’m thinking this could be as simple as taking the middle category dimension, sorting first by that, then by its neighbors, and so on until we reach the outermost category dimensions. The result would be something like this (in the >1px offset case; pardon the sketchiness, it could certainly get niceties like horizontal segments within the category bars, but hopefully this gets the idea across):
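The ordering and offset proposal above can be sketched as follows. This is a hypothetical illustration, not plotly.js code, and a direct transcription of the proposed heuristic with no claim that it is optimal: `categoryDimOrder` produces the “middle category dimension first, then its neighbors outward” sort order, `sortRows` applies it, and `stackOffsets` gives each row a deterministic offset within its category on one axis, with the per-line spacing scaled so all rows together fill an assumed `fillFraction` of the axis length.

```typescript
// Order category dimensions: middle one first, then its neighbors outward.
function categoryDimOrder(catDims: number[]): number[] {
  if (catDims.length === 0) return [];
  const mid = Math.floor((catDims.length - 1) / 2);
  const order = [catDims[mid]];
  for (let step = 1; order.length < catDims.length; step++) {
    if (mid - step >= 0) order.push(catDims[mid - step]);
    if (mid + step < catDims.length) order.push(catDims[mid + step]);
  }
  return order;
}

// Sort row indices lexicographically by the given dimension order.
// Categories are compared as plain strings here; a real implementation
// would use each axis's own category order.
function sortRows(rows: string[][], order: number[]): number[] {
  return rows
    .map((_, i) => i)
    .sort((i, j) => {
      for (const d of order) {
        if (rows[i][d] < rows[j][d]) return -1;
        if (rows[i][d] > rows[j][d]) return 1;
      }
      return i - j; // deterministic tie-break
    });
}

// Non-random offsets on one dimension: rows in the same category are
// stacked in sorted order and centered on the category's position.
// Spacing may come out above 1px (gaps) or below 1px (a merged band).
function stackOffsets(
  rows: string[][],
  sorted: number[],
  dim: number,
  axisPixels: number,
  fillFraction: number
): number[] {
  const spacing = (axisPixels * fillFraction) / Math.max(rows.length, 1);
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row[dim], (counts.get(row[dim]) ?? 0) + 1);
  }
  const nextSlot = new Map<string, number>();
  const offsets: number[] = new Array(rows.length).fill(0);
  for (const i of sorted) {
    const cat = rows[i][dim];
    const slot = nextSlot.get(cat) ?? 0;
    nextSlot.set(cat, slot + 1);
    const n = counts.get(cat) ?? 1;
    offsets[i] = (slot - (n - 1) / 2) * spacing;
  }
  return offsets;
}

const demoRows = [["a", "x"], ["b", "x"], ["a", "y"], ["b", "y"]];
const demoSorted = sortRows(demoRows, categoryDimOrder([0, 1])); // [0, 2, 1, 3]
console.log(stackOffsets(demoRows, demoSorted, 1, 100, 0.4)); // [-5, 5, -5, 5]
```

Because the spacing is derived from the total row count, the band for a category is always proportional to its weight, whichever side of 1px the spacing lands on.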

Then, as a second extension, we could implement fat-line rendering, possibly with per-line thicknesses, which would end up looking exactly like parallel categories/sets. @monfera, this would only be a relevant option when the total row count is low, so performance should not be an issue. It would also reduce precision when you’re trying to read a continuous dimension, so I’d keep it optional anyway. Or perhaps fat lines for categorical dimensions, dropping to 1px for continuous ones? That might actually be a pleasing effect…
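One reading of “fat lines for categorical dimensions, dropping to 1px for continuous” is a per-segment width rule. A minimal sketch under an assumption that is ours, not the comment’s: the fat width equals the stacking spacing from the offset scheme, so neighbouring lines in a category abut and each bundle renders as a solid band whose width is proportional to its count. `DimType` and `segmentWidths` are hypothetical names.

```typescript
type DimType = "categorical" | "continuous";

// Width of the segment between axis i and axis i + 1, shared by every line.
// Segments joining two categorical axes are drawn "fat" (equal to the
// stacking spacing, never thinner than 1px); any segment touching a
// continuous axis stays at 1px to preserve precision there.
function segmentWidths(dims: DimType[], spacing: number): number[] {
  const widths: number[] = [];
  for (let i = 0; i + 1 < dims.length; i++) {
    const bothCategorical =
      dims[i] === "categorical" && dims[i + 1] === "categorical";
    widths.push(bothCategorical ? Math.max(1, spacing) : 1);
  }
  return widths;
}

console.log(segmentWidths(["categorical", "categorical", "continuous"], 4)); // [4, 1]
```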

@jmmease, perhaps what I’ve drawn is similar to what you had in mind with your two sticking points? (1) is, I think, a matter of taste that we can deal with in various ways, as I already mentioned. (2) is a good point, though with sufficient visual cues (i.e. boxes instead of an axis line) it feels to me like the two kinds of dimension are easy to distinguish intuitively, and the flip side is that when you’re exploring via selection, the category dimensions can help bring out density information about the continuous variables. My worry with random jitter is that it still wouldn’t necessarily indicate density, particularly as the sample size gets larger.


jonmmease commented, Jan 3, 2018

Thanks for posting the mockup, @alexcjohnson; that’s really helpful, and I think it alleviates my reservations.

If the category-to-category lines can optionally be made thicker, or covered by a patch in the sparse case, then, as you said, this could reproduce the appearance of Parallel Categories. In addition to sorting the lines by category, I think they should also be sorted by color, so that lines of the same color with the same values in the categorical dimensions cluster together.
Seeing the categorical dimensions with the boxes and without the axis lines does help make them look fundamentally different from the continuous dimensions.
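Adding color as the last sort key, after the categorical dimensions, is one straightforward way to get the color clustering suggested above. In this hypothetical sketch `color` is assumed to be a numeric per-row value (as with a parcoords `line.color` array), and categories are compared as plain strings.

```typescript
// Hypothetical sketch: sort row indices by categorical dimensions (in the
// given order) and then by color, so that lines sharing both a category
// path and a color end up adjacent and form one visual cluster.
function sortRowsWithColor(
  rows: string[][],
  order: number[],
  color: number[]
): number[] {
  return rows
    .map((_, i) => i)
    .sort((i, j) => {
      for (const d of order) {
        if (rows[i][d] < rows[j][d]) return -1;
        if (rows[i][d] > rows[j][d]) return 1;
      }
      if (color[i] !== color[j]) return color[i] - color[j];
      return i - j; // deterministic tie-break
    });
}

const colorRows = [["a", "x"], ["a", "x"], ["b", "x"], ["a", "x"]];
const rowColors = [2, 1, 0, 2];
console.log(sortRowsWithColor(colorRows, [0, 1], rowColors)); // [1, 0, 3, 2]
```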

Here’s a list of some of the other features that an enhanced parcoords would need in order to replace the use-cases I’m building parcats for.

Drag a dimension title horizontally to reorder categorical dimensions, just like continuous dimensions
Drag category boxes vertically to reorder categories within a dimension
Hover over category boxes to display a tooltip that may contain the count and relative frequency of the samples in that category (similar to the Sankey node tooltip)
Hover over bundles of lines to display a tooltip that may contain the count and relative frequency of the samples in the bundle (similar to the Sankey link tooltip). By a ‘bundle’ I mean a collection of lines that have identical values in all dimensions.
Constraint support that works similarly to (and alongside) the constraintrange property for continuous dimensions. Maybe a constraintcats property on categorical dimensions that accepts a list of categories to constrain on. When it is set, lines through all other categories are grayed out. When constraints are active, the tooltips also show the constrained count and constrained relative frequency.
plotly_restyle events would need to be emitted for interactive dimension reorder, category reorder, and constraint changes, and the events would need to contain the property information needed to synchronize other figures.
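The bundle tooltip and constraint items in the list above boil down to a small aggregation. A sketch under explicit assumptions: `constraintcats` is only a proposal in this thread, so the constraint shape here (a `Map` from dimension index to allowed categories) is hypothetical, as are all the names; “constrained relative frequency” is read as the share of the rows that pass the constraints.

```typescript
// Hypothetical constraint shape: dimension index -> allowed categories,
// modeled on the proposed (not existing) `constraintcats` property.
type CategoryConstraints = Map<number, Set<string>>;

interface BundleStats {
  values: string[]; // the shared values that define the bundle
  count: number;
  relativeFrequency: number;
  constrainedCount: number;
  constrainedRelativeFrequency: number;
}

function passesConstraints(row: string[], c: CategoryConstraints): boolean {
  for (const [dim, allowed] of Array.from(c.entries())) {
    if (!allowed.has(row[dim])) return false;
  }
  return true;
}

// A bundle is the set of rows with identical values in every dimension,
// so all rows in a bundle pass or fail the constraints together.
function bundleStats(rows: string[][], c: CategoryConstraints): BundleStats[] {
  const bundles = new Map<string, BundleStats>();
  let constrainedTotal = 0;
  for (const row of rows) {
    const key = JSON.stringify(row);
    let b = bundles.get(key);
    if (!b) {
      b = {
        values: row,
        count: 0,
        relativeFrequency: 0,
        constrainedCount: 0,
        constrainedRelativeFrequency: 0,
      };
      bundles.set(key, b);
    }
    b.count += 1;
    if (passesConstraints(row, c)) {
      b.constrainedCount += 1;
      constrainedTotal += 1;
    }
  }
  const result = Array.from(bundles.values());
  for (const b of result) {
    b.relativeFrequency = b.count / rows.length;
    b.constrainedRelativeFrequency =
      constrainedTotal > 0 ? b.constrainedCount / constrainedTotal : 0;
  }
  return result;
}

// Constrain dimension 0 to category "a": the b-y bundle is grayed out.
const stats = bundleStats(
  [["a", "x"], ["a", "x"], ["b", "y"], ["a", "y"]],
  new Map([[0, new Set(["a"])]])
);
console.log(stats);
```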

And a few stylistic nice-to-haves that might not be feasible with WebGL in the mix:

Curved Sankey-style lines/paths.
Animated transitions from final drag position (of dimensions or categories) back to the neutral position.
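For the curved Sankey-style paths, one common construction is a cubic Bézier per segment whose two control points sit at the horizontal midpoint between adjacent axes, similar in shape to the links d3-sankey draws. This standalone sketch emits an SVG path string for one line through successive axis positions; it is not plotly.js code, and `curvedPolylinePath` is a hypothetical name.

```typescript
// Hypothetical helper: build an SVG path through successive (x, y) axis
// positions, using one cubic Bezier per segment. Both control points of a
// segment share the midpoint x, so each curve leaves and enters its axis
// horizontally and consecutive segments join smoothly.
function curvedPolylinePath(points: Array<[number, number]>): string {
  if (points.length === 0) return "";
  let d = `M${points[0][0]},${points[0][1]}`;
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    const xm = (x0 + x1) / 2; // shared x of both control points
    d += `C${xm},${y0} ${xm},${y1} ${x1},${y1}`;
  }
  return d;
}

console.log(curvedPolylinePath([[0, 10], [100, 50]]));
// M0,10C50,10 50,50 100,50
```

This is cheap in an SVG layer; the open question raised here is doing the equivalent efficiently alongside the WebGL line rendering.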

I do think this combination would make for a really powerful and flexible multi-dimensional visualization/analysis tool. Is this something the core team would be interested in working on soon? @alexcjohnson @monfera @etpinard

From my side, I plan to finish up my parcats trace for our current internal needs and then I’ll work on getting approval to open source it (this is probably a month or two out). I think it’s pretty useful given the current set of Plotly.js traces, but its future utility will depend on how far you all are interested/able to push parcoords in the categorical direction. Thanks for taking the time to discuss this!