Unable to change dtype of a column after concat and duplicate column names
Code Sample, a copy-pastable example if possible
import pandas
a = pandas.DataFrame({'a': ['0'], 'b': ['str']})
print(a.dtypes)
b = pandas.concat([a, pandas.DataFrame({'b': ['str2']})], axis=1)
print(b.dtypes)
b.iloc[:, 0] = [int(v) for v in a.iloc[:, 0]]
print(b.dtypes)
Problem description
The issue seems to be that if there is a DataFrame with duplicate column names, I cannot modify an (unrelated) column and convert its dtype from object to int.
Expected Output
The final print should be:
a int64
b object
b object
dtype: object
And not:
a object
b object
b object
dtype: object
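When reading these dtype listings, note that the label b appears twice, so looking up a dtype by label is ambiguous. A minimal sketch of checking dtypes by position instead, rebuilding the same frame as in the code sample (the variable names dup_dtypes and first_dtype are only illustrative):

```python
import pandas as pd

# Same frame as in the code sample: columns 'a', 'b', 'b'.
a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)

# A duplicated label selects every matching column, so this lookup
# yields a two-entry Series rather than a single dtype.
dup_dtypes = b.dtypes['b']
print(len(dup_dtypes))

# Lookup by position is unambiguous: one dtype per column, in column order.
first_dtype = b.dtypes.iloc[0]
print(first_dtype)
```

Which dtype the first column reports after the assignment from the code sample depends on the pandas version, so no expected value is claimed here.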
It is interesting that if I change the concat line to (see renaming of column b to c):
b = pandas.concat([a, pandas.DataFrame({'c': ['str2']})], axis=1)
Then the output is correctly:
a int64
b object
c object
dtype: object
So it seems it has something to do with the fact that there are duplicate column names on the other two columns, not on the one being changed.
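A possible workaround, offered only as a sketch and not confirmed by the maintainers: since the behaviour seems tied to duplicate labels, the columns can be given temporary unique labels, the conversion done with a plain column assignment, and the original labels restored afterwards. The name original_columns is just illustrative.

```python
import pandas as pd

a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)

# Remember the real labels, including the duplicated 'b'.
original_columns = b.columns

# Temporarily label the columns 0, 1, 2 so every label is unique.
b.columns = range(b.shape[1])

# Assigning a whole column by a unique label replaces it, dtype included.
b[0] = b[0].astype('int64')

# Put the original (duplicated) labels back.
b.columns = original_columns
print(b.dtypes)
```

This sidesteps the positional assignment through iloc rather than fixing it, so it is only a stopgap until the underlying behaviour is addressed.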
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Comments: 7 (6 by maintainers)
Top GitHub Comments
Hm, not really. I think we would like something that casts from object to numeric dtypes.
It would be great if you could provide a PR :)
Hey,
yes, for sure.