
pd.concat does not work correctly


Code Sample, a copy-pastable example if possible

df1.shape    # (21141, 59)
df2.shape    # (21141, 6)
result = pd.concat([df1, df2], axis=1, ignore_index=True)
result.shape    # (42282, 65)

Problem description

I have two dataframes that I am trying to concatenate horizontally. concat doesn't work: it returns a dataframe with the wrong shape. Moreover, all column names are replaced by numbers from 0 to 64…

The dataframes are created from a fairly large dataset, so I cannot include the creation code here, but I can provide more details by e-mail.

Expected Output

The shape should be (21141, 65), and the resulting columns should simply be df1's columns followed by df2's columns.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 18.0
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

12 reactions
TomAugspurger commented, Feb 17, 2019

It’s hard to say without a minimal example, but it appears that you’re getting confused by the alignment. See http://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.concat.html?highlight=concat#pandas.concat

Specifically, see the ignore_index parameter:

Note the index values on the other axes are still respected in the join.

Since you’re using axis=1,

  1. the column labels will be reset to [0, n)
  2. the row labels will be preserved (and aligned).
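The two points above can be reproduced on a tiny scale. The frames below are hypothetical stand-ins, not the reporter's data; the sketch assumes (as the doubled row count 42282 = 2 × 21141 suggests, though the thread does not confirm it) that df1 and df2 have row indexes that do not overlap at all.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2: same number of rows, but the row
# labels do not overlap (df2's index starts where df1's ends).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 2])
df2 = pd.DataFrame({"c": [7, 8, 9]}, index=[3, 4, 5])

result = pd.concat([df1, df2], axis=1, ignore_index=True)

# Rows are aligned on the union of the two indexes. No label matches, so
# every row shows up once per frame and the missing cells are filled with NaN.
print(result.shape)          # (6, 3)

# ignore_index=True relabels the concatenation axis, which is the columns here.
print(list(result.columns))  # [0, 1, 2]
```

With 21141 rows on each side and no shared labels, the same mechanism would give 42282 rows and columns numbered 0 to 64.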

If you really don’t care about your row labels, then you’ll want to drop them before concatenating: pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], ...)
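A minimal sketch of the reset_index(drop=True) approach, again on hypothetical stand-in frames. One assumption on our part: the "..." in the suggestion stands for the remaining arguments, and here ignore_index is left at its default (False) so that the original column names survive, which is what the reporter's expected output asks for.

```python
import pandas as pd

# Hypothetical stand-ins whose row labels do not line up.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 2])
df2 = pd.DataFrame({"c": [7, 8, 9]}, index=[3, 4, 5])

# Drop the row labels so both frames share a fresh 0..n-1 index, then
# concatenate side by side. ignore_index is NOT passed, so the column
# names are kept instead of being renumbered.
result = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(result.shape)          # (3, 3)
print(list(result.columns))  # ['a', 'b', 'c']
```

This pairs rows purely by position, so it is only correct if row i of df1 really belongs with row i of df2.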

4 reactions
mfouesneau commented, Feb 21, 2020

I agree with @Mark531 that there should be an intuitive way to merge dataframes horizontally. The documentation on ignore_index=True is unclear; I also spent time on this.
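For the "paste side by side" behavior discussed here, one possible pattern is a small wrapper. `hstack` is a hypothetical helper, not part of pandas; it assumes both frames have the same number of rows in matching order, and unlike the reset_index approach it keeps df1's row labels rather than replacing them with 0..n-1.

```python
import pandas as pd


def hstack(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Paste `right` next to `left` by position, keeping left's row labels.

    Hypothetical helper: assumes both frames have the same number of rows
    and that row i of `left` belongs with row i of `right`.
    """
    if len(left) != len(right):
        raise ValueError("frames must have the same number of rows")
    if not left.index.equals(right.index):
        # Relabel a copy of `right` by position so concat has nothing to align.
        right = right.copy()
        right.index = left.index
    # Column names are kept because ignore_index is left at its default.
    return pd.concat([left, right], axis=1)


# Usage with hypothetical frames whose indexes differ.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
df2 = pd.DataFrame({"c": [7, 8, 9]}, index=[0, 1, 2])

combined = hstack(df1, df2)
print(list(combined.index))    # [10, 20, 30]
print(list(combined.columns))  # ['a', 'c']
```

Checking `left.index.equals(right.index)` first also works on its own as a quick diagnostic before any axis=1 concat: if it returns False, expect alignment rather than a positional paste.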


