Convert Dask dataframe back to Pandas
See original GitHub issue

Hi,
Is there any way to convert a dask DataFrame back to Pandas? I need some features which aren't yet implemented in Dask; however, I still need parallel / partitioned mapping.
import dask.dataframe as dd
my_dask_df = dd.from_pandas(df, npartitions=4)
my_dask_df.map_partitions(...)
del df
What I am looking for is:
df = my_dask_df.to_pandas()
del my_dask_df
Best, Marius
Issue Analytics
- State:
- Created 7 years ago
- Comments: 26 (10 by maintainers)
Top Results From Across the Web

- Converting a Dask DataFrame to a pandas DataFrame - Coiled.io
  Convert from Dask to pandas on localhost ... Start by creating a Dask DataFrame. All computations are in this notebook. ... Now convert...
- How to transform Dask.DataFrame to pd.DataFrame?
  Each partition in a Dask DataFrame is a Pandas DataFrame. Running df.compute() will coalesce all the underlying partitions in the Dask DataFrame ... (see the sketch below)
- Dask DataFrame
  A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index. These pandas DataFrames may live...
- dask.dataframe.Series.to_frame - Dask documentation
  Convert Series to DataFrame. This docstring was copied from pandas.core.series.Series.to_frame. Some inconsistencies with the Dask version may exist.
- Dask DataFrames Best Practices
  For data that fits into RAM, pandas can often be faster and easier to use than Dask ... result = result.compute() ...
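To make the point about partitions concrete, here is a minimal sketch (the column name and data are made up for illustration): each partition of a Dask DataFrame is itself a pandas DataFrame, and compute() concatenates them back into a single one.

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(8)})        # small stand-in frame (assumption)
ddf = dd.from_pandas(pdf, npartitions=4)   # split into 4 pandas-backed partitions
part0 = ddf.get_partition(0).compute()     # a single partition comes back as a pandas DataFrame
whole = ddf.compute()                      # all partitions concatenated into one pandas DataFrame
print(type(part0), type(whole), len(whole))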
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi Marius; assuming your my_dask_df fits in memory, you should be able to do df = my_dask_df.compute().
If the data does not fit in memory, then it won't make sense to turn it into a Pandas dataframe, which needs to fit into memory. You might instead consider writing it to disk in some efficient format, like Parquet.
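Continuing from the sketch above, a sketch of the Parquet route for results that are too large for memory (the output directory name is made up; to_parquet/read_parquet need a Parquet engine such as pyarrow or fastparquet installed):

my_dask_df.to_parquet("my_dask_df.parquet")   # write the result to disk partition by partition (hypothetical directory)
ddf = dd.read_parquet("my_dask_df.parquet")   # later: read it back lazily as a Dask DataFrame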