Compute forward difference with Dask DataFrame
I’m really excited about Dask. A huge THANK YOU to all the contributors 😃
I’d like to make a feature request for a new method: dask.DataFrame.diff(). It would work like pandas.DataFrame.diff().
Mathematically, the operation is very simple: take a column vector and subtract from it a copy of itself shifted by one or more rows.
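To make the operation concrete, here is a minimal pandas sketch on a made-up four-row column (the data and the variable names manual and builtin are only illustrative). It compares the shift-and-subtract formulation with pandas' own diff():

```python
import pandas as pd

# Toy data, chosen only for illustration.
df = pd.DataFrame({"x": [1, 4, 9, 16]})

# Shift the column down by one row and subtract it from the original.
manual = df - df.shift(periods=1)

# pandas' built-in first discrete difference.
builtin = df.diff(periods=1)

print(manual.equals(builtin))
```

The first row has no predecessor, so both versions leave it as NaN.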
I have tried implementing diff() using Dask in the following ways, none of which works (yet):
- df - df.shift(periods=1) works in Pandas. But Dask DataFrame doesn’t have a shift() method.
- df.values[:-1] - df.values[1:] works in Pandas. But I can’t see how to index into a Dask DataFrame by position.
My current best idea for implementing diff would be to wrap some custom code in dask.dataframe.rolling.wrap_rolling, as suggested in this Stack Overflow answer (although I haven’t been able to find any documentation on how to do this). Or wrap some custom code using Dask Delayed? Any other thoughts?
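The reason a rolling-style wrapper comes up here is that a row-wise difference needs one row from the neighbouring partition. The sketch below illustrates that idea with plain pandas chunks standing in for partitions. It is not Dask's implementation. The helper name chunked_diff is hypothetical, and the sketch assumes every chunk is non-empty and that the chunks arrive in row order:

```python
import pandas as pd


def chunked_diff(chunks):
    """Diff a sequence of row-wise chunks as if they were one DataFrame.

    Hypothetical helper for illustration; assumes non-empty, ordered chunks.
    """
    results = []
    prev_tail = None  # last row of the previous chunk
    for chunk in chunks:
        if prev_tail is None:
            # First chunk: nothing to borrow, so its first row stays NaN.
            out = chunk.diff()
        else:
            # Prepend the borrowed row, diff, then drop the borrowed row.
            out = pd.concat([prev_tail, chunk]).diff().iloc[1:]
        results.append(out)
        prev_tail = chunk.iloc[-1:]
    return pd.concat(results)


# Toy data split into three "partitions".
df = pd.DataFrame({"x": [1, 4, 9, 16, 25, 36]})
chunks = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:6]]
result = chunked_diff(chunks)
```

Without the borrowed row, the first row of every chunk after the first would come out as NaN instead of the difference across the chunk boundary.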
Issue Analytics
- State:
- Created 7 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
It would be nice to have a higher level function around rolling that accepted user defined functions, much in the same way we do for reduction today.

Sure, I can take this on. Definitely agree that a general purpose function would be good here.