Duplicated column name causes inconsistent ValueError during assignment to a unique column

Code Sample, a copy-pastable example if possible

import pandas as pd

df1 = pd.DataFrame([[1,2,3,4]], columns=['C','D','D','a'])
df1['a'] = df1['a']

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

However, the following works without an error message:

df1 = pd.DataFrame([[1,2,3,4]], columns=['C','B','B','a'])
df1['a'] = df1['a']

Problem description

I believe a ValueError should not be raised in the first example. The issue appears to occur only when a column name is duplicated. Interestingly, it also seems to depend on the alphabetical order of the column names: the duplicated 'D' columns trigger the error, while the duplicated 'B' columns do not.

This appears similar to the issue reported in #21668.
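
As a rough aid for anyone hitting this, the sketch below checks whether a frame has duplicated column labels before assigning to it. The helper name `duplicated_labels` is made up for illustration and is not part of pandas; it only relies on `Index.duplicated()` and `Index.is_unique`.

```python
import pandas as pd


def duplicated_labels(df):
    """Return the column labels that occur more than once (hypothetical helper)."""
    cols = df.columns
    # Index.duplicated() marks every repeat after the first occurrence.
    return list(cols[cols.duplicated()].unique())


df1 = pd.DataFrame([[1, 2, 3, 4]], columns=['C', 'D', 'D', 'a'])

print(df1.columns.is_unique)    # False
print(duplicated_labels(df1))   # ['D']

# Selecting a duplicated label yields a DataFrame, a unique label a Series.
print(type(df1['D']).__name__)  # DataFrame
print(type(df1['a']).__name__)  # Series
```

If the duplicates are not intentional, renaming them before any assignment is probably the simplest way to sidestep the error on affected versions.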

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.76-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.4.3
Cython: None
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.8.0
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.7
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.5
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

ashammad commented, Nov 14, 2020

I just built pandas-dev locally, and the issue no longer seems to occur. pandas version: 1.2.0.dev0+1174.g4cfa97a18

phofl commented, Nov 10, 2020

The fix is on master; I did not test other versions.
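
To check whether a given installation is affected, something like the following could be run. It is only a sketch: `try_self_assign` is a hypothetical helper, and it assumes the failure, when present, surfaces as the `ValueError` reported above.

```python
import pandas as pd


def try_self_assign(columns):
    """Assign column 'a' to itself; return None on success, else the error text."""
    df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)
    try:
        df['a'] = df['a']
    except ValueError as exc:
        return str(exc)
    return None


print("pandas", pd.__version__)
# The two column layouts from the report: the first failed on 0.23.4, the second did not.
for cols in (['C', 'D', 'D', 'a'], ['C', 'B', 'B', 'a']):
    print(cols, "->", try_self_assign(cols) or "ok")
```

On a build that includes the fix, both lines should print "ok"; on 0.23.4 the first layout would be expected to print the "Buffer has wrong number of dimensions" message.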
