
Index dtype changes when assigning column from index

  • Creating a new ddf column from the ddf’s existing index erroneously changes the dtype of the index
  • Impact: causes downstream ddf operations to fail or behave in unintended ways, e.g. dd.concat
>>> import pandas as pd
>>> import dask.dataframe as dd
>>> ddf = dd.from_pandas(pd.DataFrame(dict(i=['a','b'], a=[1,2])), npartitions=1)
>>> ddf = ddf.set_index('i')

>>> ddf.index

Dask Index Structure:
npartitions=1
a    object
b       ...
Name: i, dtype: object
Dask Name: sort_index, 4 tasks

>>> ddf.assign(b=ddf.index).index

Dask Index Structure:
npartitions=1
a    int64
b      ...
dtype: int64
Dask Name: assign, 6 tasks
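For comparison, here is a minimal sketch of the same steps in plain pandas. It assumes pandas is installed, and the variable names `df` and `out` are illustrative. Dask tracks each collection's schema as an empty pandas object, so plain pandas is a reasonable reference for what `ddf.index` should report. In pandas, copying the index into a column leaves the index dtype unchanged.

```python
import pandas as pd

# Same data as in the report: string labels in 'i', integers in 'a'.
df = pd.DataFrame(dict(i=['a', 'b'], a=[1, 2])).set_index('i')

# Copy the index into a new column 'b', mirroring ddf.assign(b=ddf.index).
out = df.assign(b=df.index)

# The index dtype is the same before and after the assignment
# (object on older pandas releases; newer releases may infer a string dtype).
print(df.index.dtype, out.index.dtype)
print(out['b'].tolist())
```

Comparing against the dtype of the original index, rather than hard-coding `object`, keeps the check valid across pandas versions.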

Versions

$ python --version
Python 3.6.0

$ cat requirements.txt | egrep 'dask/dask' | grep -v '^#'
git+https://github.com/dask/dask.git@00ed56e
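The shell command above only reports the pinned dask commit. As an alternative, a small Python helper can collect the interpreter version and the installed package versions for a bug report. This is a sketch assuming Python 3.8+ (for `importlib.metadata`). The helper name `report_versions` and its default package list are hypothetical, not part of dask.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("dask", "pandas", "numpy")):
    """Hypothetical helper: map each package name to its installed version, or None."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```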

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
jcrist commented, Apr 11, 2019

This is fixed by #4695:

In [1]: %load test.py

In [2]: # %load test.py
   ...: import pandas as pd
   ...: import dask.dataframe as dd
   ...:
   ...: df = pd.DataFrame(dict(i=['a','b'], a=[1,2]))
   ...: ddf = dd.from_pandas(df, npartitions=1).set_index('i')
   ...: ddf2 = ddf.assign(b=ddf.index)
   ...: assert ddf2.index.dtype == object
   ...:

In [3]: %load test2.py

In [4]: # %load test2.py
   ...: import pandas as pd
   ...: import dask.dataframe as dd
   ...:
   ...: df = pd.DataFrame([dict(x='c', y='a'), dict(x='d', y='b')])
   ...: ddf = dd.from_pandas(df, 2)
   ...: ddf2 = ddf.rename(columns=dict(x='index'))
   ...: ddf3 = ddf2.groupby('index')['y'].apply('-'.join, meta=str)
   ...: ddf4 = ddf3.reset_index()
   ...: assert ddf4['index'].dtype == 'int64'
   ...:
0 reactions
jdanbrown commented, May 6, 2019

@jcrist Thanks!


