Transform input data: groupby, filter

Previously discussed (some lists are from @chriddyp):

A groupby transform should split a trace into separate traces, one per unique value or bin of the groupby dimension. Example:

groupby: ['a', 'b', 'a', 'b']
x: [1, 2, 1, 2]
y: [10, 20, 30, 40]

should generate two traces:

trace 1:
x: [1, 1]
y: [10, 30]

trace 2:
x: [2, 2]
y: [20, 40]
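
A minimal sketch of the split, assuming that points are assigned to groups by matching array index. The helper name splitByGroup and the name field on each output trace are illustrative assumptions, not part of plotly.js:

```javascript
// Hypothetical helper, not part of plotly.js: split x/y by groupby value.
function splitByGroup(trace) {
  const groups = new Map(); // a Map keeps groups in first-seen order
  trace.groupby.forEach((key, i) => {
    if (!groups.has(key)) {
      groups.set(key, { name: String(key), x: [], y: [] });
    }
    const group = groups.get(key);
    group.x.push(trace.x[i]);
    group.y.push(trace.y[i]);
  });
  return Array.from(groups.values());
}

const traces = splitByGroup({
  groupby: ['a', 'b', 'a', 'b'],
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40]
});
console.log(JSON.stringify(traces));
```

The input trace is not modified, which fits the requirement that the user-supplied data stay intact.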

Static groupby as a means of splitting spatially and/or aesthetically

  • distinct categorical values: numbers, strings or datetime strings
  • evenly spaced bins based on numerical data or time (datetime strings) in the groupby attribute, reusing the logic of the existing plotly binning algorithm for histograms
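
For the binned case, one possible approach is to turn each numeric value into a bin key and then group on that key like any other category. The helper name binKeys and its start and size parameters are assumptions, and the fixed-width rule below is only a stand-in for plotly's histogram binning, which is not reproduced here:

```javascript
// Hypothetical helper: map each value to the lower edge of its fixed-width bin.
// This is NOT plotly's histogram autobin logic, only a simple placeholder.
function binKeys(values, start, size) {
  return values.map(v => start + Math.floor((v - start) / size) * size);
}

// The resulting keys could serve as a groupby array with distinct values.
const keys = binKeys([1, 4, 7, 12, 18], 0, 10);
console.log(keys);
```

Datetime strings would first need converting to numbers (for example milliseconds) before the same rule applies.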

Functional aspects:

  1. groupby needs to work across numbers, dates, and categories (@chriddyp: in the JS context, categories means strings, correct?)

  2. groupby needs to split across all of the arrays or array-like specifications in a trace, not just x and y. For example, marker.color or marker.line.color. Not all array-like specifications in a trace are actual arrays (consider colorscale)

  3. There must be a way of specifying distinct styles for the split apart traces so that they’re discernible - example:

    transform:
        groupby: ['a', 'b', 'a', 'b']
        marker:
            color:
                a: 'blue'
                b: 'red'
    
  4. @etpinard found some issues with legend items as he wrote an initial version of transforms: https://github.com/plotly/plotly.js/pull/499#issuecomment-216597436. We’ll probably need to modify some of the transforms and API. That’s OK - transforms was made for groupby

  5. All specifications for groupby, and for the related animation split (see below), need to be expressible in JSON for serializability and must fit the current declarative structure

  6. The transforms such as groupby must work in the restyle and relayout steps, not just the initial plot step

  7. gd.data is expected to preserve the single trace and the groupby spec as the user supplied them; _fullData, on the other hand, has the individual (split) traces and no longer has the groupby attribute

  8. We must map traces in _fullData back to groups or styles in data. Styling controls will be populated with the defaults from _fullData (e.g. _fullData[4].marker.color) but they’ll need to update the attributes in the data object (e.g. data[0].transform.marker.color.d). That’s because we serialize and save data, not _fullData.
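
Points 3, 7 and 8 above could be sketched together as follows. This assumes the transform shape from the example in point 3; the helpers expandTrace and restyleGroupColor and the _source back-reference are hypothetical names, not plotly.js API:

```javascript
// Hypothetical: expand one user trace into per-group "full" traces.
function expandTrace(userTrace, traceIndex) {
  const spec = userTrace.transform;
  const out = new Map();
  spec.groupby.forEach((key, i) => {
    if (!out.has(key)) {
      out.set(key, {
        x: [],
        y: [],
        // per-group style looked up from the transform spec
        marker: { color: spec.marker.color[key] },
        // assumed back-reference from the full trace to the user data
        _source: { traceIndex: traceIndex, group: key }
      });
    }
    out.get(key).x.push(userTrace.x[i]);
    out.get(key).y.push(userTrace.y[i]);
  });
  return Array.from(out.values());
}

// Write a style change made on a full trace back into the user data,
// because the user data is what gets serialized and saved.
function restyleGroupColor(data, fullTrace, color) {
  const src = fullTrace._source;
  data[src.traceIndex].transform.marker.color[src.group] = color;
}

const data = [{
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40],
  transform: {
    groupby: ['a', 'b', 'a', 'b'],
    marker: { color: { a: 'blue', b: 'red' } }
  }
}];
const fullData = expandTrace(data[0], 0);
restyleGroupColor(data, fullData[1], 'green');
console.log(data[0].transform.marker.color);
```

Here data keeps its single trace and its groupby spec, while the expanded traces carry no groupby attribute; the expanded traces would be regenerated from data after the restyle.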

Preliminary work

Related PR, containing the initial, analogous filter work by @timelyportfolio: https://github.com/plotly/plotly.js/pull/859

groupby: https://github.com/plotly/plotly.js/blob/master/test/jasmine/assets/transforms/groupby.js

Planned groupby coverage of the initial sprint

  1. It would cover a positive list of attributes for groupby, such as x and y, but not all at once. HOWEVER, the preferred solution aims for generality, because other transforms (e.g. filter) will need a similar approach, and future arraylike attributes should be covered without coupling their code to transformations. Consequence: we’ll have to check whether there’s enough attribute metadata to tell if an attribute is arraylike, or whether we need further metadata; and also whether there’s a programmatic way of separating arraylike data, e.g. colorscale, that’s not represented as an array at input. Otherwise we need to handle them attribute by attribute (we’ll have to come back to this topic after a first round of work). Initial attributes at least: x, y, marker.color, marker.size (scatter, bar, histogram, box). Then lat, lon (maps), a, b, c (ternary), z (scatter3d), error_y.array
  2. It would cover a set of (initially, non-WebGL) traces
  3. First goalpost is separation by category (JS number or string)
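
The positive-list approach from item 1 might look like the sketch below. The SPLIT_ATTRS list is taken from the attributes named above; the helpers getPath, setPath and splitAttrs are hypothetical and do not use plotly's attribute metadata:

```javascript
// Assumed positive list of splittable attributes (from the list above).
const SPLIT_ATTRS = ['x', 'y', 'marker.color', 'marker.size', 'error_y.array'];

// Read a nested property such as 'marker.color'; undefined if absent.
function getPath(obj, path) {
  return path.split('.').reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

// Set a nested property, creating intermediate objects as needed.
function setPath(obj, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  const target = keys.reduce((o, k) => (o[k] = o[k] || {}), obj);
  target[last] = value;
}

// Hypothetical splitter: listed attributes that are plain arrays are split
// by group; scalar values (e.g. a single marker.size) are copied as-is.
function splitAttrs(trace, groupby) {
  const groups = new Map(); // group value -> point indices
  groupby.forEach((key, i) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(i);
  });
  return Array.from(groups, ([key, idx]) => {
    const out = { name: String(key) };
    SPLIT_ATTRS.forEach(path => {
      const val = getPath(trace, path);
      if (val === undefined) return;
      setPath(out, path, Array.isArray(val) ? idx.map(i => val[i]) : val);
    });
    return out;
  });
}

const split = splitAttrs(
  { x: [1, 2, 1, 2], y: [10, 20, 30, 40], marker: { color: ['r', 'g', 'b', 'k'], size: 5 } },
  ['a', 'b', 'a', 'b']
);
console.log(JSON.stringify(split));
```

A hard-coded list like this is exactly the coupling the preferred solution wants to avoid; it only shows the first-round behavior.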

It is expected that the trace separation (and transformations in general) is performed in the supply defaults step.

Subsequent goal: splitting data for animations

Instead of generating n different paths as described above, plotly would arrive at a temporal sequence of n frames
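
As a rough sketch of that idea, the same grouping could emit an ordered list of frames rather than sibling traces. The frame shape used here (a name plus a data array) and the helper name groupsToFrames are assumptions made for illustration:

```javascript
// Hypothetical: one frame per group value, in first-seen order.
function groupsToFrames(trace) {
  const groups = new Map();
  trace.groupby.forEach((key, i) => {
    if (!groups.has(key)) groups.set(key, { x: [], y: [] });
    groups.get(key).x.push(trace.x[i]);
    groups.get(key).y.push(trace.y[i]);
  });
  // Assumed frame shape: { name, data: [traceUpdate] }
  return Array.from(groups, ([key, points]) => ({
    name: String(key),
    data: [points]
  }));
}

const frames = groupsToFrames({
  groupby: ['a', 'b', 'a', 'b'],
  x: [1, 2, 1, 2],
  y: [10, 20, 30, 40]
});
console.log(frames.map(f => f.name));
```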

Possible future items:

  1. Incremental recalculation (e.g. of bins, upon newly arriving data points)
  2. Combine this with a subplots transform for rendering the traces into separate subplots (as small multiples plots)

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

2 reactions
rreusser commented, Sep 16, 2016

Climb on!
