Transform input data: groupby, filter
Previously discussed (some lists are from @chriddyp):
A groupby transform should split apart traces as per unique values or bins of the groupby dimension. Example:
groupby: ['a', 'b', 'a', 'b']
x: [1, 2, 1, 2]
y: [10, 20, 30, 40]
should generate two traces:
trace 1 (group 'a'):
x: [1, 1]
y: [10, 30]
trace 2 (group 'b'):
x: [2, 2]
y: [20, 40]
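A minimal sketch of the split, to make the intended semantics concrete: each point goes to the trace whose key equals the groupby entry at the same index. The helper name splitByGroup and the name field on the output traces are hypothetical; this is not the plotly.js implementation and only handles x and y.

```javascript
// Hypothetical helper, not the plotly.js implementation: split a trace's
// x/y arrays into one trace per distinct value of its `groupby` array.
function splitByGroup(trace) {
  const groups = new Map(); // a Map preserves the first-seen order of group values
  trace.groupby.forEach((key, i) => {
    if (!groups.has(key)) {
      groups.set(key, { name: String(key), x: [], y: [] });
    }
    const out = groups.get(key);
    out.x.push(trace.x[i]);
    out.y.push(trace.y[i]);
  });
  return Array.from(groups.values());
}

const traces = splitByGroup({
  groupby: ['a', 'b', 'a', 'b'],
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40],
});
console.log(JSON.stringify(traces, null, 2));
```

Traces come out in the order in which each group value first appears, which keeps the result stable when the input order is stable.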
Static groupby as a means of splitting spatially and/or aesthetically
- distinct categorical values: numbers, strings or datetime strings
- evenly spaced bins based on numerical data or time (datetime strings) in the groupby attribute, reusing logic of the preexisting plotly algorithm for histograms
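For the binned case, one possible approach is to turn numeric groupby values into bin labels first and then split on those labels like any other category. The sketch below assumes the bin start and size are already known; binKeys is a hypothetical name, and plotly's histogram auto-binning logic is not reproduced here.

```javascript
// Hypothetical sketch: map numeric groupby values to evenly spaced bins.
// `start` and `size` are assumed to be given by the caller.
function binKeys(values, start, size) {
  return values.map((v) => {
    const index = Math.floor((v - start) / size); // which bin the value falls in
    const lo = start + index * size;              // lower edge of that bin
    return `[${lo}, ${lo + size})`;               // half-open interval label
  });
}

const keys = binKeys([0.5, 3, 7.2, 9.9, 4], 0, 5);
console.log(keys);
```

Datetime strings could presumably go through the same function after conversion to milliseconds, for example with Date.parse, though that is an assumption and not something settled in this issue.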
Functional aspects:
- groupby needs to work across numbers, dates, and categories (@chriddyp in the JS context, meaning strings, correct?)
- groupby needs to split across all of the arrays or array-like specifications in a trace, not just x and y. For example, marker.color or marker.line.color. Not all array-like specifications in a trace are actual arrays (consider colorscale)
- There must be a way of specifying distinct styles for the split apart traces so that they’re discernible - example:
  transform:
    groupby: ['a', 'b', 'a', 'b']
    marker:
      color:
        a: 'blue'
        b: 'red'
- @etpinard found some issues with legend items as he wrote an initial version of transforms: https://github.com/plotly/plotly.js/pull/499#issuecomment-216597436. We’ll probably need to modify some of the transforms and API. That’s OK - transforms was made for groupby
- All relevant denotations for groupby, and the related animation split use (see below) need to be in the JSON format for serializability, fitting in the current declarative structure
- The transforms such as groupby must work in the restyle and relayout steps, not just the initial plot step
- gd.data is expected to preserve the single trace and the groupby spec as the user supplied, and _fullData on the other hand has the individual (split) traces and no longer has the groupby attribute
- We must ID traces in _fullData back to groups or styles in data. Styling controls will be populated with the defaults from _fullData (e.g. _fullData[4].marker.color) but they’ll need to update the attributes in the data object (e.g. data[0].transform.marker.color.d). That’s because we serialize and save data, not _fullData.
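The requirements above can be sketched roughly as follows: expand the user-supplied data into split traces without mutating it, apply per-group styles, tag each split trace with where it came from, and route a style change back into the user data. Everything named here (expandGroupby, restyleGroupColor, the _sourceIndex and _group tags, and the two attribute lists) is hypothetical and only illustrates the idea; it is not how plotly.js does or will do it.

```javascript
// Hypothetical sketch; names such as expandGroupby, _sourceIndex and _group
// are illustrative and are not plotly.js internals.
const ARRAY_ATTRS = ['x', 'y', 'marker.color', 'marker.size']; // assumed positive list
const STYLE_ATTRS = ['marker.color', 'marker.line.color'];     // assumed per-group styles

function getPath(obj, path) {
  return path.split('.').reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

function setPath(obj, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  let target = obj;
  for (const k of keys) {
    if (typeof target[k] !== 'object' || target[k] === null) target[k] = {};
    target = target[k];
  }
  target[last] = value;
}

// Build the expanded trace list from the user-supplied data without mutating it.
function expandGroupby(data) {
  const fullData = [];
  data.forEach((trace, sourceIndex) => {
    const spec = trace.transform;
    if (!spec || !Array.isArray(spec.groupby)) {
      fullData.push(Object.assign({}, trace, { _sourceIndex: sourceIndex }));
      return;
    }
    // Collect the point indices belonging to each group, in first-seen order.
    const indicesByGroup = new Map();
    spec.groupby.forEach((group, i) => {
      if (!indicesByGroup.has(group)) indicesByGroup.set(group, []);
      indicesByGroup.get(group).push(i);
    });
    for (const [group, indices] of indicesByGroup) {
      const out = JSON.parse(JSON.stringify(trace)); // deep copy of JSON-serializable input
      delete out.transform; // the expanded trace no longer carries the groupby spec
      for (const path of ARRAY_ATTRS) {
        const value = getPath(trace, path);
        if (Array.isArray(value)) setPath(out, path, indices.map((i) => value[i]));
      }
      for (const path of STYLE_ATTRS) {
        const byGroup = getPath(spec, path);
        if (byGroup !== null && typeof byGroup === 'object' && byGroup[group] !== undefined) {
          setPath(out, path, byGroup[group]);
        }
      }
      out._sourceIndex = sourceIndex; // which entry of data this trace came from
      out._group = group;             // which group it represents
      fullData.push(out);
    }
  });
  return fullData;
}

// Write a style change made on an expanded trace back into the user data.
function restyleGroupColor(data, fullTrace, color) {
  const path = 'transform.marker.color.' + fullTrace._group; // assumes group keys contain no dots
  setPath(data[fullTrace._sourceIndex], path, color);
}

const data = [{
  type: 'scatter',
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40],
  marker: { size: [5, 6, 7, 8] },
  transform: { groupby: ['a', 'b', 'a', 'b'], marker: { color: { a: 'blue', b: 'red' } } },
}];
const fullData = expandGroupby(data);
restyleGroupColor(data, fullData[1], 'green'); // updates data[0].transform.marker.color.b
console.log(JSON.stringify(expandGroupby(data), null, 2));
```

The explicit attribute list sidesteps the question of detecting array-like attributes from metadata. An array such as colorscale is simply left alone because it is not on the list.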
Preliminary work
Related PR, containing the initial, analogous filter work by @timelyportfolio: https://github.com/plotly/plotly.js/pull/859
groupby: https://github.com/plotly/plotly.js/blob/master/test/jasmine/assets/transforms/groupby.js
Planned groupby coverage of the initial sprint
- It would cover a positive list of attributes for groupby such as x and y but not all at once - HOWEVER the preferred solution aims for generality because other transforms will need to use a similar approach e.g. filter, and future arraylike attributes should be covered without code coupling to transformations (consequence: we’ll have to check if there’s enough attribute metadata that allows us to tell if it’s arraylike, or we need further metadata; also, whether there’s a programmatic way of separating arraylike data e.g. colorscale that’s not represented as an array at input, otherwise we need to handle them attribute by attribute; we’ll have to come back to this topic after a first round of work). Initial attributes at least: x, y, marker.color, marker.size (scatter, bar, histogram, box). Then lat, lon (maps), a, b, c (ternary), z (scatter3d), error_y.array
- It would cover a set of (initially, non-WebGL) traces
- First goalpost is separation by category (JS number or string)
It is expected that the trace separation (and transformations in general) is performed in the supply defaults step.
Subsequent goal: splitting data for animations
Instead of generating n different paths as described above, plotly would arrive at a temporal sequence of n frames.
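As a rough illustration of the animation variant, the same grouping can emit one frame per distinct groupby value instead of one trace. The { name, data } frame shape and the function name framesFromGroupby are assumptions made for this sketch; the issue does not specify a frame format.

```javascript
// Hypothetical sketch: the { name, data } frame shape is assumed for illustration.
function framesFromGroupby(trace) {
  const frames = new Map(); // first-seen order of group values becomes the frame order
  trace.groupby.forEach((group, i) => {
    if (!frames.has(group)) {
      frames.set(group, { name: String(group), data: [{ x: [], y: [] }] });
    }
    const frameTrace = frames.get(group).data[0];
    frameTrace.x.push(trace.x[i]);
    frameTrace.y.push(trace.y[i]);
  });
  return Array.from(frames.values());
}

const frames = framesFromGroupby({
  groupby: [2015, 2015, 2016, 2016],
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40],
});
console.log(JSON.stringify(frames, null, 2));
```

Here the frame order simply follows first appearance in the groupby array. A real implementation would probably need an explicit ordering for temporal data.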
Possible future items:
- Incremental recalculation (e.g. of bins, upon newly arriving data points)
- Combine this with a subplots transform for rendering the traces into separate subplots (as small multiples plots)
Top GitHub Comments
closed in https://github.com/plotly/plotly.js/pull/936 and https://github.com/plotly/plotly.js/pull/978
Climb on!