Cannot reset index of dask.dataframe when underlying pandas DataFrames have a MultiIndex

There appears to be an issue when resetting the index of a dask.dataframe after a groupby on multiple columns, where the resulting dask dataframe has a MultiIndex:

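import pandas as pd
import numpy as np
import dask.dataframe as dd

# Two rows of random data in three columns
df = pd.DataFrame(np.random.rand(2,3), columns=['A', 'B', 'C'])
ddf = dd.from_pandas(df, npartitions=1)
# Grouping on two columns gives a result indexed by a MultiIndex (A, B)
grouped = ddf.groupby(['A', 'B']).agg({'C': 'sum'})
grouped = grouped.reset_index()
# The ValueError is raised here, when the graph is executed
grouped.compute()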
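
For comparison, here is a minimal pandas-only sketch of the same steps. It shows the result one would presumably expect from the dask version: both group keys come back as ordinary columns. The sample values are arbitrary and chosen only for illustration; they are not from the report.

```python
import pandas as pd

# Small illustrative frame (values are made up, not taken from the report)
df = pd.DataFrame({"A": [1, 1, 2], "B": [10, 10, 20], "C": [0.5, 0.25, 1.0]})

# Grouping on two keys yields a result indexed by a two-level MultiIndex
grouped = df.groupby(["A", "B"]).agg({"C": "sum"})

# reset_index moves both index levels back into regular columns
flat = grouped.reset_index()
print(list(flat.columns))
```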

Traceback:

Traceback (most recent call last):
  File "gist.py", line 9, in <module>
    grouped.compute()
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/base.py", line 95, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/base.py", line 202, in compute
    results = get(dsk, keys, **kwargs)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/threaded.py", line 76, in get
    **kwargs)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 500, in get_async
    raise(remote_exception(res, tb))
dask.async.ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Traceback
---------
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 266, in execute_task
    result = _execute_task(task, data)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/async.py", line 247, in _execute_task
    return func(*args2)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/dataframe/core.py", line 3111, in apply_and_enforce
    return _rename(c, df)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/dask/dataframe/core.py", line 3148, in _rename
    df.columns = columns
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/generic.py", line 2757, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:46249)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/generic.py", line 448, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/rwest/anaconda/envs/test_dask/lib/python2.7/site-packages/pandas/core/internals.py", line 2802, in set_axis
    (old_len, new_len))
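
The last frames suggest where things go wrong: `_rename` assigns a list of column names to the computed frame, and the two differ in length (three columns after `reset_index`, two names supplied). The sketch below reproduces that same pandas error in isolation. The two-name list is a hypothetical stand-in for whatever names dask 0.14.1 inferred, which the traceback does not show.

```python
import pandas as pd

# Illustrative data; the values do not matter for the error
df = pd.DataFrame({"A": [1, 1], "B": [2, 3], "C": [0.5, 0.25]})

# After reset_index the frame has three columns: A, B and C
result = df.groupby(["A", "B"]).agg({"C": "sum"}).reset_index()

# Hypothetical column names of the wrong length (an assumption, not dask's actual metadata)
assumed_names = ["index", "C"]

message = None
try:
    # Assigning two names to a three-column frame raises ValueError
    result.columns = assumed_names
except ValueError as exc:
    message = str(exc)

print(message)
```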

Env: (conda create --name test_dask python=2.7.13 dask=0.14.1 pandas=0.19.2 numpy=1.12.1)

backports                 1.0                      py27_0  
backports_abc             0.5                      py27_0  
bokeh                     0.12.4                   py27_0  
chest                     0.2.3                    py27_0  
cloudpickle               0.2.2                    py27_0  
dask                      0.14.1                   py27_0  
futures                   3.0.5                    py27_0  
heapdict                  1.0.0                    py27_1  
jinja2                    2.9.6                    py27_0  
locket                    0.2.0                    py27_1  
markupsafe                0.23                     py27_2  
mkl                       2017.0.1                      0  
numpy                     1.12.1                   py27_0  
openssl                   1.0.2k                        1  
pandas                    0.19.2              np112py27_1  
partd                     0.3.7                    py27_0  
pip                       9.0.1                    py27_1  
python                    2.7.13                        0  
python-dateutil           2.6.0                    py27_0  
pytz                      2017.2                   py27_0  
pyyaml                    3.12                     py27_0  
readline                  6.2                           2  
requests                  2.13.0                   py27_0  
setuptools                27.2.0                   py27_0  
singledispatch            3.4.0.3                  py27_0  
six                       1.10.0                   py27_0  
sqlite                    3.13.0                        0  
ssl_match_hostname        3.4.0.2                  py27_1  
tk                        8.5.18                        0  
toolz                     0.8.2                    py27_0  
tornado                   4.5.1                    py27_0  
wheel                     0.29.0                   py27_0  
yaml                      0.1.6                         0  
zlib                      1.2.8                         3  

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

jcrist commented, May 1, 2017 (1 reaction)

Thanks for the bug report and excellent test case. I’ve found the issue, fix coming soon.

phobson commented, Sep 8, 2022 (0 reactions)

I see – that might be tricky, but I’ll dig and ask around to see if that’s something we can put in the queue
