Parcoords multiselection
Currently, parcoords permits the selection of one contiguous range within each dimension (via brushing, or via an API call). We're working on support for multiselection, i.e. an OR relation among multiple ranges within a specific dimension. Any of the 60+ dimensions may carry multiple selections. The goal is to retain as much speed as feasible.
Currently, a promising route for the multiselection is the use of a pixel-based bitmap (likely via textures) for each dimension. Basically, instead of arbitrary floating-point(*) ranges, filtering would have no higher resolution than what can be selected on the screen, or discerned as lines, i.e. the pixel raster. A mechanical interpretation is that brushing would turn pixels on/off, enabling or disabling lines that go through that pixel. This is sufficient for mouse-based interactions, but in theory(**), a floating-point range can be more accurate, allowing subpixel selection via the API.
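A minimal CPU-side sketch of the idea (the names `makeDimensionMask`, `brush` and `testValue` are illustrative, not existing plotly.js API): each dimension keeps one bit per pixel row, brushing toggles a contiguous run of rows, and multiple brushes on the same dimension OR together simply by landing in the same bitmap.

```js
// Hypothetical sketch, not plotly.js API: one bit per pixel row per dimension.
function makeDimensionMask(heightPx) {
  return {
    heightPx,
    bits: new Uint8Array(heightPx) // 1 = pixel row selected, 0 = filtered out
  };
}

// Brushing turns a contiguous run of pixel rows on or off; several
// brushes on the same dimension OR together in the shared bitmap.
function brush(mask, y0, y1, on) {
  for (let y = Math.max(0, y0); y <= Math.min(mask.heightPx - 1, y1); y++) {
    mask.bits[y] = on ? 1 : 0;
  }
}

// A data value normalized to [0, 1] is quantized to its pixel row, so the
// filter resolution is capped at the raster, as described above.
function testValue(mask, v01) {
  const y = Math.min(mask.heightPx - 1, Math.floor(v01 * mask.heightPx));
  return mask.bits[y] === 1;
}
```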
(*) (**) The 32-bit floating points are effectively 23-bit integers in this regard, which is quite a bit more than the (for example) 12-bit precision (= 4096 levels) that characterizes a 4k-tall parcoords, or the 8–10 bit precision of views of regular height, e.g. on a dashboard. On the other hand, parcoords is an aggregate view where going more accurate than a pixel may not be useful even when applying the constraints via the API. So this approach is workable iff the visible raster is deemed sufficient as constraint resolution via the API. (It's not possible to go to the subpixel level with the mouse, unless with browser zooming or similar.)
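As a worked example of that precision gap (numbers mine, not from the original issue):

```js
const floatSteps = 2 ** 23;     // float32 mantissa: 8388608 distinguishable steps
const raster4k = 2 ** 12;       // 4096 pixel rows on a 4k-tall panel: 12 bits
const rasterDashboard = 2 ** 8; // ~256 pixel rows on a small dashboard panel
console.log(floatSteps / raster4k); // 2048 float steps collapse into one pixel row
```

That is, an API-supplied range boundary can fall anywhere among ~2048 representable values inside a single 4k pixel row, none of which the raster-based filter could distinguish.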
The motive is that the vertical height of the parcoords is bounded by reasonable monitor resolutions, and WebGL gl.MAX_TEXTURE_SIZE is at least 4096, gl.MAX_TEXTURE_IMAGE_UNITS is at least 8, and each texture pixel can hold 4 bytes (32 bits), so even in the worst case there's plenty of room for a full-height 4k monitor display and our almost 64 dimensions (e.g. two 32-bit texels per pixel row already yield 64 mask bits). The resulting vertex shader algo is a simple texture lookup, and given that those 64 bits line up with our almost 64 dimensions, it could reduce to simple bit-fiddling operations.
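A hedged sketch of what that vertex-shader lookup might look like (GLSL ES 1.0 embedded in a JS string; the uniform name and the 2-texel-wide mask layout are assumptions, not the actual plotly.js shader; note that GLSL ES 1.0 has no bitwise operators, so the bit has to be dug out with floor/mod arithmetic):

```js
// Hypothetical mask lookup for the vertex shader. Assumes a mask texture
// 2 texels wide (2 x RGBA8 = 64 bits per pixel row, one bit per dimension)
// and as tall as the pixel raster.
const maskLookupGLSL = `
  uniform sampler2D mask;

  // dim: dimension index 0..63, v: data value normalized to [0, 1]
  float maskBit(float dim, float v) {
    float texelX = floor(dim / 32.0);            // which 32-bit texel
    float channel = floor(mod(dim, 32.0) / 8.0); // which RGBA channel
    float bit = mod(dim, 8.0);                   // which bit in that channel
    vec4 texel = texture2D(mask, vec2((texelX + 0.5) / 2.0, v)) * 255.0;
    float byteVal = channel < 1.0 ? texel.r
                  : channel < 2.0 ? texel.g
                  : channel < 3.0 ? texel.b
                  : texel.a;
    // No bit operators in GLSL ES 1.0: extract the bit arithmetically.
    return mod(floor(byteVal / pow(2.0, bit)), 2.0); // 1.0 if pixel selected
  }
`;
```

One caveat worth testing early: sampling a texture in the vertex shader is governed by gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS, a separate limit that the WebGL1 spec allows to be zero on rare old hardware.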
There are alternatives, but as this approach is promising, would have fairly bounded runtime (to be tested), and imposes no limit on how many ranges can be selected, we should discuss whether it is an acceptable solution before exploring more involved alternatives.
Top GitHub Comments
@phxnsharp thanks for the added info. We've been working on the pixel mask on the GPU, where things like lookup masks (conveniently at pixel pitch, but they can be sparser or denser, up to 4k evenly spaced bins) are fairly straightforward and of O(1) complexity, but relation joins aren't.
What I mean by joins is that the requirement is analogous to joining a data relation (a table with possibly 10k or more rows per dimension, up to ~60 dimensions) with a filter relation (up to 1000 values, i.e. records, per dimension). Performing such joins requires operations (merge join, hash join, B-tree index lookup) that aren't a good fit for WebGL; such operations are sometimes done in a GPGPU setting with CUDA or OpenCL, but WebGL has no compute shaders, so we're restricted to vertex and fragment shaders. For example, in WebGL even a `for` loop where the iteration count is not known in advance is tricky, because the iteration count needs to be known at compile time, and setting too high a value for the worst case means that performance will always reflect that worst case.

We'll do some brainstorming though to see how it could be solved, given the specifics that only ordinal variables need to be filtered, and with up to 1000 values per dimension. A promising direction, sketched below, is to bijectively map the categorical values of possibly uneven cadence [x0, …, xN] to the integers [0, …, N], and bring back the "distortion" via the inverse mapping (via a new lookup table) on the GPU for rendering. The benefit is that the filter could use the integer grid, so another lookup table (bitmap) could be indexed into to check whether a value is selected. While it looks feasible, it's computationally more involved than the pixel-based approach, because it adds two more lookups over what we have now and we may need to do it for all dimensions, categorical or not. We may have to bake in the ~1000 (e.g. 1024) limit for categorical values, which doesn't sound like a big loss. We'll check whether there are enough GPU resources for this approach to run on most or all computers, to avoid relying on resource limits such as the allowed texture count, which aren't well supported by either the standard or the overwhelming majority of hardware makers.
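A CPU-side sketch of that direction (again with illustrative names, not plotly.js API; the real thing would live in typed arrays and textures on the GPU): the categories are mapped bijectively to the integer grid, and a per-dimension bitmap over that grid answers the membership test.

```js
// Hypothetical sketch of the categorical mapping idea.
function makeCategoricalFilter(categories /* possibly unevenly spaced values */) {
  // Bijective map: category value -> integer index 0..N
  const toIndex = new Map(categories.map((c, i) => [c, i]));
  // Selection bitmap on the integer grid, one bit per category,
  // with the assumed ~1024-category cap baked in.
  const selected = new Uint32Array(Math.ceil(categories.length / 32));
  return {
    // Inverse lookup table: would be uploaded as a texture for rendering.
    fromIndex: categories.slice(),
    select(value, on) {
      const i = toIndex.get(value);
      if (i === undefined) return;
      if (on) selected[i >> 5] |= 1 << (i & 31);
      else selected[i >> 5] &= ~(1 << (i & 31));
    },
    isSelected(value) {
      const i = toIndex.get(value);
      return i !== undefined && (selected[i >> 5] & (1 << (i & 31))) !== 0;
    }
  };
}

// Usage sketch:
const f = makeCategoricalFilter(['low', 'medium', 'high', 'extreme']);
f.select('medium', true);
f.select('extreme', true);
console.log(f.isSelected('medium'), f.isSelected('high')); // true false
```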
Keeping both modes is possible: we'd preserve what we have now and add the raster-based filtering in an AND relationship. We might even be able to avoid a texture lookup for those dimensions that don't need it, although the GPU is often faster if we just let it do superfluous parallel work when that helps avoid conditional branching. If the bitmap raster is fast enough on its own, then it'll be fast enough when intersected with what we have now. All this (multidimensional) point selection logic happens in the vertex shader, and our bottleneck is the fragment shader anyway.
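For completeness, the combined predicate could look like this sketch (reusing the hypothetical `testValue` from the earlier sketch; the shader version would express the same AND per vertex):

```js
// Hypothetical combined filter: the existing single contiguous range test
// AND the new raster mask, for every dimension of a line.
function lineVisible(dims /* [{ value01, range: [lo, hi], mask }] */) {
  // AND across dimensions; within a dimension the mask already encodes
  // the OR across multiple brushed ranges.
  return dims.every(({ value01, range, mask }) =>
    value01 >= range[0] && value01 <= range[1] && // current single-range filter
    testValue(mask, value01)                      // raster mask (sketched earlier)
  );
}
```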