question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Convert Dask dataframe back to Pandas

See original GitHub issue

Hi,

is there any way to convert a dask DataFrame back to Pandas? I have some features I need, which aren’t yet implemented in Dask. However I need parallel / partitioned mapping.

import dask.dataframe as dd
my_dask_ df = dd.from_pandas(df, npartitions=4)
my_dask_df.map_partitions(...)
del df

What I am looking for is:

df = my_dask_df.to_pandas()
del my_dask_df

Best, Marius

Issue Analytics

  • State:closed
  • Created 7 years ago
  • Comments:26 (10 by maintainers)

github_iconTop GitHub Comments

45reactions
TomAugspurgercommented, Oct 12, 2016

Hi Marius; Assuming your my_dask_df fits in memory, you should be able to do df = my_dask_df.compute().

3reactions
mrocklincommented, May 3, 2018

Then it won’t make sense to turn it into a Pandas dataframe, which needs to fit into memory.

You might instead consider writing it to disk in some efficient format, like Parquet.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Converting a Dask DataFrame to a pandas ... - Coiled.io
Convert from Dask to pandas on localhost ... Start by creating a Dask DataFrame. All computations are in this notebook. ... Now convert...
Read more >
How to transform Dask.DataFrame to pd.DataFrame?
Each partition in a Dask DataFrame is a Pandas DataFrame. Running df.compute() will coalesce all the underlying partitions in the Dask DataFrame ......
Read more >
Dask DataFrame
A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index. These pandas DataFrames may live...
Read more >
dask.dataframe.Series.to_frame - Dask documentation
Convert Series to DataFrame. This docstring was copied from pandas.core.series.Series.to_frame. Some inconsistencies with the Dask version may exist.
Read more >
Dask DataFrames Best Practices
For data that fits into RAM, pandas can often be faster and easier to use than Dask ... size result = result.compute() #...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found