
Unable to change dtype of a column after concat and duplicate column names


Code Sample, a copy-pastable example if possible

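import pandas

# One-row frame whose two columns both hold strings.
a = pandas.DataFrame({'a': ['0'], 'b': ['str']})
print(a.dtypes)

# Concatenating along axis=1 adds a second column that is also named 'b'.
b = pandas.concat([a, pandas.DataFrame({'b': ['str2']})], axis=1)
print(b.dtypes)

# Overwrite the first column (selected by position) with integers.
b.iloc[:, 0] = [int(v) for v in a.iloc[:, 0]]
print(b.dtypes)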

Problem description

The issue seems to be that when a DataFrame has duplicate column names, assigning integer values to an (unrelated) column does not convert that column's dtype from object to int.
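To check whether a frame is in the situation described above, the column index can be inspected for repeated labels. The following is a minimal sketch that rebuilds the frame from the code sample; the variable names `has_duplicates` and `duplicate_labels` are illustrative only.

```python
import pandas as pd

# Rebuild the frame from the code sample: two columns end up named 'b'.
a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)

# Index.is_unique is False as soon as any column label repeats.
has_duplicates = not b.columns.is_unique

# Index.duplicated() flags every repeat after the first occurrence,
# so indexing with it yields the labels that appear more than once.
duplicate_labels = list(b.columns[b.columns.duplicated()])

print(has_duplicates)
print(duplicate_labels)
```

Here only the label `b` is repeated; the column being modified, `a`, is unique.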

Expected Output

The final print should be:

a     int64
b    object
b    object
dtype: object

And not:

a    object
b    object
b    object
dtype: object

Interestingly, if I change the concat line to the following (note that column b is renamed to c):

b = pandas.concat([a, pandas.DataFrame({'c': ['str2']})], axis=1)

Then the output is correct:

a     int64
b    object
c    object
dtype: object

So it seems to have something to do with the fact that the other two columns have duplicate names, not the column being changed.
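If the duplicate labels are indeed the trigger, one possible workaround is to give the columns unique positional labels for the duration of the assignment and restore the original names afterwards. This is only a sketch under that assumption: `set_column_by_position` is a hypothetical helper, not a pandas function, and it has not been checked against pandas 0.23.3.

```python
import pandas as pd


def set_column_by_position(df, pos, values):
    """Hypothetical helper: replace the column at position `pos`.

    Works on a copy, so the caller's frame is left untouched.
    """
    original_columns = df.columns
    out = df.copy()
    # Temporarily label the columns 0..n-1 so that every label is unique.
    out.columns = range(out.shape[1])
    # Plain column assignment replaces the whole column, dtype included.
    out[pos] = values
    # Put the original (possibly duplicated) names back.
    out.columns = original_columns
    return out


a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)

fixed = set_column_by_position(b, 0, [int(v) for v in a.iloc[:, 0]])
print(fixed.dtypes)
```

The helper returns a new frame instead of modifying `b` in place, which keeps the original data available for comparison.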

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None