Duplicated column name causes inconsistent ValueError during assignment to a unique column

Code Sample, a copy-pastable example if possible

import pandas as pd
df1 = pd.DataFrame([[1, 2, 3, 4]], columns=['C', 'D', 'D', 'a'])
df1['a'] = df1['a']

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

However, the following works without an error message:

df1 = pd.DataFrame([[1,2,3,4]], columns=['C','B','B','a'])
df1['a'] = df1['a']

Problem description

I believe ValueError should not be thrown in the first example. The issue appears to only occur when there is a duplicated column. Interestingly, there appears to be some link to the alphabetical order of the column names.
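As a possible workaround on affected versions, duplicated labels can be detected and dropped before assigning to the unique column. The following is only a minimal sketch, not the fix that went into pandas: the helper names `duplicated_labels` and `keep_first_columns` are hypothetical, and it assumes that keeping the first column for each repeated label is acceptable for your data.

```python
import pandas as pd


def duplicated_labels(df):
    """Return the column labels that occur more than once (hypothetical helper)."""
    cols = df.columns
    # Index.duplicated() marks every occurrence after the first as True.
    return list(cols[cols.duplicated()].unique())


def keep_first_columns(df):
    """Return a copy with only the first column kept for each label (hypothetical helper)."""
    return df.loc[:, ~df.columns.duplicated()].copy()


df1 = pd.DataFrame([[1, 2, 3, 4]], columns=['C', 'D', 'D', 'a'])

dupes = duplicated_labels(df1)      # labels that appear more than once
deduped = keep_first_columns(df1)   # columns are now unique

# With unique column labels, the self-assignment from the report is unambiguous.
deduped['a'] = deduped['a']
```

Note that this discards the second `D` column. If both columns carry data you need, renaming the duplicates instead of dropping them is likely the safer choice.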

This appears similar to the issue reported in #21668.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.76-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.4.3
Cython: None
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.8.0
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.7
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.5
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Top GitHub Comments

ashammad commented, Nov 14, 2020 (1 reaction)

I just built pandas-dev on my end, and it seems the issue no longer exists. pandas version: 1.2.0.dev0+1174.g4cfa97a18

phofl commented, Nov 10, 2020 (0 reactions)

The fix is on master; I did not test other versions.
