question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Try streaming dataframes with cudf

See original GitHub issue

Ideally we would be able to reuse the streamz.dataframe module with pandas-like libraries, like cudf.

This would require removing explicit type checks like isinstance(obj, pd.DataFrame) with functional checks like "DataFrame" in type(obj).__name__" (or something better, like what is in dask.dataframe.utils.is_dataframe_like) and probably a bunch of other work.

However, if this works then ideally we would be able to reuse a bunch of code, and make streamz.dataframe a bit more generic.

To start, I would probably make a random dataframe of pandas dataframes, and then convert them into cudf dataframes

from streamz.dataframe import Random
import cudf

sdf = Random()
gdf = sdf.map_partitions(cudf.from_pandas, sdf)
gdf.x.sum()

Probably this will break for many reasons, both in streamz.dataframe and in cudf, but iterating on that set of problems might eventually yield something that is generic enough on both sides to work.

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Reactions:1
  • Comments:5 (2 by maintainers)

github_iconTop GitHub Comments

2reactions
mrocklincommented, Feb 6, 2019

Also cc @kkraus14, who I suspect will be interested in this

0reactions
martindurantcommented, Apr 28, 2019

Closed by #224

Read more comments on GitHub >

github_iconTop Results From Across the Web

Streaming GPU DataFrames (cudf) - Streamz - Read the Docs
This documentation is specific to streaming GPU dataframes using cudf. The example in the dataframes documentation is rewritten below using cudf dataframes just ......
Read more >
Getting Started with cuDF - GPU DataFrame Library (RAPIDS)
cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, ...
Read more >
Transitioning Between RAPIDS cuDF and CuPy Libraries
If we want to convert a cuDF DataFrame to a CuPy ndarray, There are multiple ways to do it: We can use the...
Read more >
Scalable Pandas Meetup No. 5: GPU Dataframe Library ...
We're really looking forward to our fifth Scalable Pandas Meetup! Pandas runs on CPUs, but with RAPIDS cuDF, you can get GPU speedups...
Read more >
How to speed up Pandas with cuDF? - GeeksforGeeks
cuDF (CUDA DF) is a Python GPU data frame library that helps ... In order to analyze the time taken in both cases,...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found