Cannot reset index of dask.dataframe when underlying pandas DataFrames have a MultiIndex
There appears to be an issue when resetting the index of a dask.dataframe after performing a dd.groupby, when the resulting dask dataframe contains a MultiIndex:
import pandas as pd
import numpy as np
import dask.dataframe as dd
df = pd.DataFrame(np.random.rand(2,3), columns=['A', 'B', 'C'])
ddf = dd.from_pandas(df, npartitions=1)
grouped = ddf.groupby(['A', 'B']).agg({'C': 'sum'})
grouped = grouped.reset_index()
grouped.compute()
Traceback:
Traceback (most recent call last):
File "gist.py", line 9, in <module>
grouped.compute()
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/base.py", line 95, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/base.py", line 202, in compute
results = get(dsk, keys, **kwargs)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/threaded.py", line 76, in get
**kwargs)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 500, in get_async
raise(remote_exception(res, tb))
dask.async.ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
Traceback
---------
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 266, in execute_task
result = _execute_task(task, data)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 247, in _execute_task
return func(*args2)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/dataframe/core.py", line 3111, in apply_and_enforce
return _rename(c, df)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/dataframe/core.py", line 3148, in _rename
df.columns = columns
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/generic.py", line 2757, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:46249)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/generic.py", line 448, in _set_axis
self._data.set_axis(axis, labels)
File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/internals.py", line 2802, in set_axis
(old_len, new_len))
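The final frames of the traceback show `_rename` assigning a list of column names to the computed partition. The following pandas-only sketch reproduces the same kind of length mismatch. It assumes, without confirmation from the dask source, that the expected column list was derived from metadata with a flat (single-level) index, so the name list `expected_columns` below is a hypothetical stand-in and not dask's actual metadata.

```python
import pandas as pd

# Actual partition result: grouping by two keys yields a two-level MultiIndex.
df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4], 'C': [0.5, 0.25, 1.0]})
actual = df.groupby(['A', 'B']).agg({'C': 'sum'}).reset_index()

# reset_index expands both index levels into columns, giving three columns.
print(list(actual.columns))  # ['A', 'B', 'C']

# Hypothetical expected names (an assumption): what a flat, single-level index
# would produce after reset_index, i.e. one index column plus 'C'.
expected_columns = ['index', 'C']

message = None
try:
    # Mirrors the `df.columns = columns` assignment shown in the traceback.
    actual.columns = expected_columns
except ValueError as exc:
    message = str(exc)
    print(message)
```

If that assumption holds, the two-element name list cannot be assigned to the three-column result, which matches the "Expected axis has 3 elements, new values have 2 elements" wording in the error above.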
Env (conda create --name test_dask python=2.7.13 dask=0.14.1 pandas=0.19.2 numpy=1.12.1):
backports 1.0 py27_0
backports_abc 0.5 py27_0
bokeh 0.12.4 py27_0
chest 0.2.3 py27_0
cloudpickle 0.2.2 py27_0
dask 0.14.1 py27_0
futures 3.0.5 py27_0
heapdict 1.0.0 py27_1
jinja2 2.9.6 py27_0
locket 0.2.0 py27_1
markupsafe 0.23 py27_2
mkl 2017.0.1 0
numpy 1.12.1 py27_0
openssl 1.0.2k 1
pandas 0.19.2 np112py27_1
partd 0.3.7 py27_0
pip 9.0.1 py27_1
python 2.7.13 0
python-dateutil 2.6.0 py27_0
pytz 2017.2 py27_0
pyyaml 3.12 py27_0
readline 6.2 2
requests 2.13.0 py27_0
setuptools 27.2.0 py27_0
singledispatch 3.4.0.3 py27_0
six 1.10.0 py27_0
sqlite 3.13.0 0
ssl_match_hostname 3.4.0.2 py27_1
tk 8.5.18 0
toolz 0.8.2 py27_0
tornado 4.5.1 py27_0
wheel 0.29.0 py27_0
yaml 0.1.6 0
zlib 1.2.8 3
Issue Analytics
- Created 6 years ago
- Comments: 13 (7 by maintainers)
Top GitHub Comments
Thanks for the bug report and excellent test case. I’ve found the issue, fix coming soon.
I see – that might be tricky, but I’ll dig and ask around to see if that’s something we can put in the queue