Try streaming dataframes with cudf
See original GitHub issueIdeally we would be able to reuse the streamz.dataframe
module with pandas-like libraries, like cudf
.
This would require removing explicit type checks like isinstance(obj, pd.DataFrame)
with functional checks like "DataFrame" in type(obj).__name__"
(or something better, like what is in dask.dataframe.utils.is_dataframe_like
) and probably a bunch of other work.
However, if this works then ideally we would be able to reuse a bunch of code, and make streamz.dataframe
a bit more generic.
To start, I would probably make a random dataframe of pandas dataframes, and then convert them into cudf dataframes
from streamz.dataframe import Random
import cudf
sdf = Random()
gdf = sdf.map_partitions(cudf.from_pandas, sdf)
gdf.x.sum()
Probably this will break for many reasons, both in streamz.dataframe and in cudf, but iterating on that set of problems might eventually yield something that is generic enough on both sides to work.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:5 (2 by maintainers)
Top GitHub Comments
Also cc @kkraus14, who I suspect will be interested in this
Closed by #224