Unable to change dtype of a column after concat and duplicate column names


Code Sample, a copy-pastable example if possible

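import pandas

a = pandas.DataFrame({'a': ['0'], 'b': ['str']})
print(a.dtypes)

# Concatenating along axis=1 yields columns 'a', 'b', 'b' (a duplicated name).
b = pandas.concat([a, pandas.DataFrame({'b': ['str2']})], axis=1)
print(b.dtypes)

# Overwrite column 0 ('a') with ints; its dtype is expected to become int64.
b.iloc[:, 0] = [int(v) for v in a.iloc[:, 0]]
print(b.dtypes)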

Problem description

The issue seems to be that when a DataFrame has duplicate column names, I cannot assign new values to a different (unrelated) column and have its dtype change from object to int.
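To check whether a frame is in the state described above, one option is to inspect the column index for repeated labels. The following is a minimal sketch that rebuilds the frame from the code sample. The variable names (`has_duplicates`, `duplicate_mask`, `duplicated_labels`, `selection_ndim`) are illustrative and not part of the original report.

```python
import pandas as pd

# Rebuild the frame from the report: columns are 'a', 'b', 'b'.
a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)

# Index.is_unique is False as soon as any column label repeats.
has_duplicates = not b.columns.is_unique

# Index.duplicated(keep=False) marks every occurrence of a repeated label.
duplicate_mask = b.columns.duplicated(keep=False)
duplicated_labels = sorted(set(b.columns[duplicate_mask]))

# Selecting a repeated label returns a DataFrame (2-D), not a Series.
selection_ndim = b['b'].ndim

print(has_duplicates, duplicated_labels, selection_ndim)
```

A check like this only detects the duplicated labels; it does not change how assignment behaves.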

Expected Output

The final print should be:

a     int64
b    object
b    object
dtype: object

And not:

a    object
b    object
b    object
dtype: object

Interestingly, if I change the concat line to the following (note that column b is renamed to c):

b = pandas.concat([a, pandas.DataFrame({'c': ['str2']})], axis=1)

Then the output is correctly:

a     int64
b    object
c    object
dtype: object

So it seems to have something to do with the fact that the other two columns share a name, even though neither of them is the column being changed.
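Since the conversion works when all column names are unique, one possible workaround is to give the columns temporary unique labels, cast the target column, and then restore the original labels. This is only a sketch under that assumption. The helper `convert_column_by_position` is hypothetical, not a pandas API and not the fix adopted upstream, and the behavior of the original `iloc` assignment may differ between pandas versions.

```python
import pandas as pd

# Rebuild the frame from the report: columns are 'a', 'b', 'b'.
a = pd.DataFrame({'a': ['0'], 'b': ['str']})
b = pd.concat([a, pd.DataFrame({'b': ['str2']})], axis=1)


def convert_column_by_position(frame, position, dtype):
    """Return a copy of `frame` with the column at `position` cast to `dtype`.

    Hypothetical helper for illustration only.
    """
    original_labels = frame.columns
    result = frame.copy()
    # Temporarily use unique positional labels (0, 1, 2, ...) so the
    # column assignment below is unambiguous.
    result.columns = range(result.shape[1])
    result[position] = result[position].astype(dtype)
    # Restore the original, possibly duplicated, labels.
    result.columns = original_labels
    return result


fixed = convert_column_by_position(b, 0, 'int64')
first_dtype = str(fixed.dtypes.iloc[0])
print(fixed.dtypes)
```

The helper works on a copy, so the original frame `b` keeps its string values. The trade-off is an extra copy of the data in exchange for not depending on how positional assignment treats duplicated labels.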

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:7 (6 by maintainers)

Top GitHub Comments

1 reaction
phofl commented, Apr 25, 2021

Hm, not really. I think we would like something that casts from object to numeric dtypes.

It would be great if you could provide a PR :)

1 reaction
phofl commented, Dec 2, 2020

Hey,

yes, for sure.


