Compute forward difference with Dask DataFrame

I’m really excited about Dask. A huge THANK YOU to all the contributors 😃

I’d like to make a feature request for a new method: dask.DataFrame.diff(). It would work like pandas.DataFrame.diff().

Mathematically, the operation is very simple: from a column vector, subtract a copy of itself shifted by one or more rows.

I have tried implementing diff() using Dask in the following ways, none of which works (yet):

  • df - df.shift(periods=1) works in Pandas. But Dask DataFrame doesn’t have a shift() method.
  • df.values[1:] - df.values[:-1] works in Pandas. But I can’t see how to index into a Dask DataFrame by position.
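
For reference, here is a minimal pandas-only sketch of the two formulations listed above, checked against `pandas.DataFrame.diff()`. The column name `x` and the sample values are made up for illustration. Note the slice order in the positional version: later rows minus earlier rows gives the same sign as `diff()`.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column name and values are assumptions.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0]})

# Reference result from pandas: row i minus row i-1, NaN in the first row.
expected = df.diff()

# Formulation 1: subtract a copy shifted down by one row.
via_shift = df - df.shift(periods=1)

# Formulation 2: positional slicing on the underlying NumPy array.
# The result has one row fewer because there is no leading NaN row.
via_slices = df.values[1:] - df.values[:-1]

assert via_shift.equals(expected)
assert np.array_equal(via_slices, expected.values[1:])
```

Both formulations need the row immediately before each row, which is what makes the operation awkward once the data is split into partitions.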

My current best idea for implementing diff would be to wrap some custom code in dask.dataframe.rolling.wrap_rolling, as suggested in this Stack Overflow answer (although I haven’t been able to find any documentation on how to do this). Or wrap some custom code using Dask Delayed? Any other thoughts?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

3 reactions
mrocklin commented, Nov 8, 2016

It would be nice to have a higher level function around rolling that accepted user defined functions, much in the same way we do for reduction today.
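
As a rough illustration of what such a general-purpose function might look like, the sketch below applies a user-defined function to each partition after borrowing a few trailing rows from the previous partition, then drops the borrowed rows. It uses plain pandas on a list of DataFrames rather than Dask, and `apply_with_overlap` with its parameters is a hypothetical helper, not an existing pandas or Dask API.

```python
import pandas as pd


def apply_with_overlap(partitions, func, before):
    """Apply `func` to each partition with `before` rows of look-back.

    Hypothetical helper for illustration; not part of pandas or Dask.
    Assumes every partition has at least `before` rows.
    """
    results = []
    previous_tail = None
    for part in partitions:
        if previous_tail is None:
            # The first partition has no predecessor to borrow from.
            results.append(func(part))
        else:
            padded = pd.concat([previous_tail, part])
            # Drop the borrowed rows once func has been applied.
            results.append(func(padded).iloc[len(previous_tail):])
        # Keep the last `before` rows for the next partition.
        previous_tail = part.iloc[max(len(part) - before, 0):]
    return pd.concat(results)


# Illustrative data split into three partitions of two rows each.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]})
partitions = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:6]]

chunked = apply_with_overlap(partitions, lambda d: d.diff(), before=1)
assert chunked.equals(df.diff())
```

A first difference needs one row of look-back, so `before=1` is enough here; a function with a wider window would need a larger value.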

2 reactions
jcrist commented, Nov 8, 2016

Sure, I can take this on. Definitely agree that a general purpose function would be good here.
