
BUG: "cannot reindex from a duplicate axis" raised with a unique index, duplicated column names and specific NumPy array values

See original GitHub issue

Code Sample

import pandas
import numpy as np

a = np.array([[1,2],[3,4]])

# Does NOT work:
b = np.array([[0.5,6],[7,8]])
# b = np.array([[.5,6],[7,8]])  # Same problem

# This one works fine:
# b = np.array([[5,6],[7,8]])

dfA = pandas.DataFrame(a)
# This works fine, EVEN with 0.5, because the column names are then distinct:
# dfA = pandas.DataFrame(a, columns=['a','b'])
dfB = pandas.DataFrame(b)

df_new = pandas.concat([dfA, dfB], axis=1)

print(df_new[df_new > 5])

Problem description

There is a bug that shows up when a DataFrame has duplicated column names, holds particular NumPy values, and is filtered with a boolean selection such as df[df > 5]. An exception is raised saying “cannot reindex from a duplicate axis”, but it should not be, because:

  • The DataFrame has no duplicated index labels (df.index.is_unique is True)
  • The DataFrame has duplicated column names, but that should not be a problem for a selection such as df_new[df_new > 5]
  • The DataFrame holds float or int NumPy values, so the specific values should not change the behavior of the code
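The first two points can be checked directly on the frame from the code sample. A minimal sketch (same arrays as in the sample) showing that the row index is unique while the column labels are not, which suggests the "duplicate axis" in the message refers to the columns:

```python
import numpy as np
import pandas as pd

# Same inputs as the code sample in the report.
dfA = pd.DataFrame(np.array([[1, 2], [3, 4]]))
dfB = pd.DataFrame(np.array([[0.5, 6], [7, 8]]))
df_new = pd.concat([dfA, dfB], axis=1)

# Rows: both frames share the default RangeIndex(2), so the labels are unique.
print(df_new.index.is_unique)      # True

# Columns: 0 and 1 from dfA, followed by 0 and 1 again from dfB.
print(df_new.columns.tolist())     # [0, 1, 0, 1]
print(df_new.columns.is_unique)    # False
```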

However, the values in the NumPy array DO change the behavior of the selection when the DataFrame has duplicated column names.
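One plausible reading of why the values matter (an observation about dtypes, not something stated in the report): `0.5` makes NumPy infer a float64 array for `b`, while `a` stays an integer array, so the failing `df_new` mixes integer and float columns; with `5`, every column is an integer. A minimal sketch to inspect this, where `dtype_kinds` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 4]])
b_fails = np.array([[0.5, 6], [7, 8]])  # 0.5 makes NumPy infer float64 for the whole array
b_works = np.array([[5, 6], [7, 8]])    # all integers, so an integer array


def dtype_kinds(left, right):
    """Hypothetical helper: sorted dtype kinds ('f' = float, 'i' = integer) after concatenation."""
    df = pd.concat([pd.DataFrame(left), pd.DataFrame(right)], axis=1)
    return sorted({dt.kind for dt in df.dtypes})


print(b_fails.dtype)                 # float64
print(dtype_kinds(a, b_fails))       # ['f', 'i'] -> integer and float columns mixed
print(dtype_kinds(a, b_works))       # ['i']      -> integer columns only
```

On this reading, any non-integer value in `b` should behave like `0.5`, which fits the report's note that `.5` fails the same way.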

Expected Output

    0   1    0    1
0 NaN NaN  NaN  6.0
1 NaN NaN  7.0  8.0
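On an affected version such as the pandas 1.0.1 from the report, one possible workaround (not taken from the issue thread) is to build the masked result with NumPy instead of going through boolean DataFrame indexing, then reattach the original labels. A sketch, assuming every column is numeric:

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(np.array([[1, 2], [3, 4]]))
dfB = pd.DataFrame(np.array([[0.5, 6], [7, 8]]))
df_new = pd.concat([dfA, dfB], axis=1)

# Work on the raw values, so label-based alignment never sees the duplicated columns.
values = df_new.to_numpy(dtype=float)
masked = np.where(values > 5, values, np.nan)

# Reattach the original (duplicated) column labels and the row index.
result = pd.DataFrame(masked, index=df_new.index, columns=df_new.columns)
print(result)
```

Casting everything to float mirrors what the selection would produce anyway, since any column that receives `NaN` has to become float.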

Current Output

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
   3097         # trying to reindex on an axis with duplicates
   3098         if not self.is_unique and len(indexer):
-> 3099             raise ValueError("cannot reindex from a duplicate axis")
   3100 
   3101     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

ValueError: cannot reindex from a duplicate axis
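The traceback ends in `_can_reindex`, which refuses to reindex an axis that is not unique. The same class of error can be triggered without any boolean selection, which may help when narrowing down similar reports. A minimal sketch with a `Series` whose index repeats a label; `reindex_raises` is a hypothetical helper, and since the message wording differs between pandas versions, only the exception type is relied on:

```python
import pandas as pd


def reindex_raises(series, labels):
    """Return True if reindexing `series` onto `labels` raises ValueError."""
    try:
        series.reindex(labels)
    except ValueError as err:
        # Wording varies by version; 1.0.x says "cannot reindex from a duplicate axis".
        print(f"ValueError: {err}")
        return True
    return False


dup = pd.Series([1, 2], index=["a", "a"])     # label "a" repeated: non-unique axis
uniq = pd.Series([1, 2], index=["a", "b"])    # unique axis

print(reindex_raises(dup, ["a", "b"]))        # True
print(reindex_raises(uniq, ["a", "b", "c"]))  # False (the missing "c" is filled with NaN)
```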

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-28-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : pt_BR.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 32 (17 by maintainers)

Top GitHub Comments

1 reaction
MarcoGorelli commented, Oct 12, 2020

Yup 😄 @GabrielSimonetto if you wanted to submit a test to make sure this doesn’t break again in the future, that would be welcome!
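A regression test along the lines suggested here could look roughly like the sketch below. The test name is hypothetical, the expected values assume the fixed behavior (columns that receive `NaN` are upcast to float64), and it is only an illustration, not necessarily the test that was added to pandas:

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_boolean_mask_mixed_dtypes_duplicate_columns():
    # Hypothetical test name. dfA contributes integer columns, dfB float columns,
    # and concatenation leaves duplicated column labels [0, 1, 0, 1].
    dfA = pd.DataFrame(np.array([[1, 2], [3, 4]]))
    dfB = pd.DataFrame(np.array([[0.5, 6], [7, 8]]))
    df = pd.concat([dfA, dfB], axis=1)

    result = df[df > 5]

    expected = pd.DataFrame(
        [[np.nan, np.nan, np.nan, 6.0], [np.nan, np.nan, 7.0, 8.0]],
        columns=[0, 1, 0, 1],
    )
    tm.assert_frame_equal(result, expected)


# Under pytest this call is unnecessary; it lets the sketch run as a plain script.
test_boolean_mask_mixed_dtypes_duplicate_columns()
```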

1 reaction
dsaxton commented, Sep 6, 2020

@MarcoGorelli I found that building instead with the command CFLAGS='-Wno-error=deprecated-declarations' python setup.py build_ext -i generally fixes things, although I’m not sure if it’ll work in this case. There’s a thread about these problems here: https://github.com/pandas-dev/pandas/issues/33315


Top Results From Across the Web

What does `ValueError: cannot reindex from a duplicate axis ...
This error usually arises when you join / assign to a column when the index has duplicate values. Since you are assigning to...
Solve Pandas "ValueError: cannot reindex from a duplicate axis"
Apparently, the python error is the result of doing operations on a DataFrame that has duplicate index values. Operations that require unique ......
Indexing and selecting data — pandas 1.0.1 documentation
The correct way to swap column values is by using raw values: ... In [17]: s.reindex(labels) ValueError: cannot reindex from a duplicate axis....
ValueError: cannot reindex from a duplicate axis
When you get this error, first you have to just check if there is any duplication in your DataFrame column names using the...
Python for Data Analysis, 3E - 5 Getting Started with pandas
Compared with NumPy arrays, you can use labels in the index when selecting ... for holding the axis labels (including a DataFrame's column...
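Several of these results point to checking the column names for duplicates first. One way to do that, plus one way to keep both frames' columns while giving them unique labels (the `keys` argument of `pd.concat`, which adds an outer column level), is sketched below; it is not necessarily what the linked pages recommend:

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame(np.array([[1, 2], [3, 4]]))
dfB = pd.DataFrame(np.array([[0.5, 6], [7, 8]]))

df_new = pd.concat([dfA, dfB], axis=1)
# duplicated() flags every repeat after the first occurrence of a label.
dupes = df_new.columns[df_new.columns.duplicated()].tolist()
print(dupes)  # [0, 1]

# keys= adds an outer column level, so ("A", 0) and ("B", 0) are distinct labels.
df_keyed = pd.concat([dfA, dfB], axis=1, keys=["A", "B"])
print(df_keyed.columns.is_unique)  # True

masked = df_keyed[df_keyed > 5]
print(masked)
```

This is the same idea as the commented-out `columns=['a','b']` line in the code sample: once the labels are unique, the selection no longer has a duplicate axis to reindex.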
