Convert Dask dataframe back to Pandas
See original GitHub issue

Hi,
Is there any way to convert a dask DataFrame back to Pandas? I need some features which aren't yet implemented in Dask; however, I still need parallel / partitioned mapping.
import dask.dataframe as dd
my_dask_df = dd.from_pandas(df, npartitions=4)
my_dask_df.map_partitions(...)
del df
What I am looking for is:
df = my_dask_df.to_pandas()
del my_dask_df
Best, Marius
Issue Analytics
- State:
- Created 7 years ago
- Comments: 26 (10 by maintainers)
Top Results From Across the Web

- Converting a Dask DataFrame to a pandas DataFrame - Coiled.io
  Convert from Dask to pandas on localhost ... Start by creating a Dask DataFrame. All computations are in this notebook. ... Now convert...
- How to transform Dask.DataFrame to pd.DataFrame?
  Each partition in a Dask DataFrame is a Pandas DataFrame. Running df.compute() will coalesce all the underlying partitions in the Dask DataFrame ... (see the sketch below)
- Dask DataFrame
  A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index. These pandas DataFrames may live...
- dask.dataframe.Series.to_frame - Dask documentation
  Convert Series to DataFrame. This docstring was copied from pandas.core.series.Series.to_frame. Some inconsistencies with the Dask version may exist.
- Dask DataFrames Best Practices
  For data that fits into RAM, pandas can often be faster and easier to use than Dask ... result = result.compute() ...
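To make the point about partitions concrete, here is a minimal sketch (the column name and data are made up for illustration): each partition of a Dask DataFrame is itself a pandas DataFrame, and compute() concatenates them back into a single one.

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(8)})        # small stand-in frame (assumption)
ddf = dd.from_pandas(pdf, npartitions=4)   # split into 4 pandas-backed partitions
part0 = ddf.get_partition(0).compute()     # a single partition comes back as a pandas DataFrame
whole = ddf.compute()                      # all partitions concatenated into one pandas DataFrame
print(type(part0), type(whole), len(whole))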
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi Marius; assuming your my_dask_df fits in memory, you should be able to do df = my_dask_df.compute().
If the data does not fit in memory, then it won't make sense to turn it into a Pandas dataframe, which needs to fit into memory. You might instead consider writing it to disk in some efficient format, like Parquet.
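Continuing from the sketch above, a sketch of the Parquet route for results that are too large for memory (the output directory name is made up; to_parquet/read_parquet need a Parquet engine such as pyarrow or fastparquet installed):

my_dask_df.to_parquet("my_dask_df.parquet")   # write the result to disk partition by partition (hypothetical directory)
ddf = dd.read_parquet("my_dask_df.parquet")   # later: read it back lazily as a Dask DataFrame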