BUG: concat unwantedly sorts DataFrame column names if they differ

See original GitHub issue

When concat’ing DataFrames, the column names get alphanumerically sorted if there are any differences between them. If they’re identical across DataFrames, they don’t get sorted. This sort is undocumented and unwanted. Certainly the default behavior should be no-sort. EDIT: the standard order as in SQL would be: columns from df1 (same order as in df1), columns (uniquely) from df2 (less the common columns) (same order as in df2). Example:

import numpy as np
from pandas import DataFrame, concat

df4a = DataFrame(columns=['C','B','D','A'], data=np.random.randn(3,4))
df4b = DataFrame(columns=['C','B','D','A'], data=np.random.randn(3,4))
df5  = DataFrame(columns=['C','B','E','D','A'], data=np.random.randn(3,5))

print("Cols unsorted:", concat([df4a,df4b]))
# Cols unsorted:           C         B         D         A

print("Cols sorted", concat([df4a,df5]))
# Cols sorted           A         B         C         D         E
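For readers on a newer pandas: later releases (0.23 onward, as far as I know) added a `sort` keyword to `concat`. The sketch below assumes such a version is installed and reuses the frame names from the report. It is only an illustration of the SQL-like order requested above, not the fix as it was discussed in the issue.

```python
import numpy as np
import pandas as pd

df4a = pd.DataFrame(np.random.randn(3, 4), columns=["C", "B", "D", "A"])
df5 = pd.DataFrame(np.random.randn(3, 5), columns=["C", "B", "E", "D", "A"])

# sort=False asks concat to leave the non-concatenation axis (here: the columns) unsorted
result = pd.concat([df4a, df5], sort=False)
print(list(result.columns))
```

With `sort=False` the columns should come out in order of first appearance: those of `df4a` first, followed by the column that only `df5` has.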

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Reactions: 8
  • Comments:36 (11 by maintainers)

Top GitHub Comments

21 reactions
asteppke commented, May 28, 2014

This behavior is indeed quite unexpected and I also stumbled over it.

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['b'] = [1,2,3]
>>> df['c'] = [1,2,3]
>>> df['a'] = [1,2,3]
>>> print(df)
   b  c  a
0  1  1  1
1  2  2  2
2  3  3  3

[3 rows x 3 columns]
>>> df2 = pd.DataFrame({'a':[4,5]})
>>> df3 = pd.concat([df, df2])

Naively one would expect that the order of columns is preserved. Instead the columns are sorted:

>>> print(df3)
   a   b   c
0  1   1   1
1  2   2   2
2  3   3   3
0  4 NaN NaN
1  5 NaN NaN

[5 rows x 3 columns]

This can be corrected by reindexing with the original columns as follows:

>>> df4 = df3.reindex_axis(df.columns, axis=1)
>>> print(df4)
    b   c  a
0   1   1  1
1   2   2  2
2   3   3  3
0 NaN NaN  4
1 NaN NaN  5

[5 rows x 3 columns]
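Note for newer pandas versions: `reindex_axis` was deprecated and, as far as I know, removed in pandas 1.0. A minimal sketch of the same workaround using `DataFrame.reindex(columns=...)`, assuming a recent pandas and rebuilding the frames from the comment above:

```python
import pandas as pd

# Same frames as in the comment; dict keys keep their insertion order (b, c, a)
df = pd.DataFrame({"b": [1, 2, 3], "c": [1, 2, 3], "a": [1, 2, 3]})
df2 = pd.DataFrame({"a": [4, 5]})

df3 = pd.concat([df, df2])

# Put the columns back in the order of the first frame
df4 = df3.reindex(columns=df.columns)
print(df4)
```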

Still, it seems counter-intuitive that this automatic sorting takes place and, as far as I know, cannot be disabled.

13 reactions
rasbt commented, Jan 13, 2015

Just stumbled upon this same issue when I was concatenating DataFrames. It’s a little bit annoying if you don’t know about this issue, but actually there is a quick remedy:

Say dfs is a list of DataFrames you want to concatenate; you can just take the original column order and feed it back in:

df = pd.concat(dfs, axis=0)
df = df[dfs[0].columns]
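
One caveat with the two-liner above: selecting `dfs[0].columns` keeps only the columns of the first frame, so columns that appear only in later frames are dropped. A rough sketch of a variant that keeps every column in order of first appearance follows. The helper name `concat_keep_order` and the sample frames are made up for illustration and are not part of pandas or of the original comment.

```python
import pandas as pd


def concat_keep_order(dfs):
    """Concatenate row-wise, ordering columns by first appearance across dfs."""
    # dict.fromkeys de-duplicates while keeping insertion order
    ordered = list(dict.fromkeys(col for d in dfs for col in d.columns))
    return pd.concat(dfs, axis=0)[ordered]


# Hypothetical sample data: the second frame adds a column "d"
dfs = [
    pd.DataFrame({"b": [1], "c": [2], "a": [3]}),
    pd.DataFrame({"a": [4], "d": [5]}),
]
out = concat_keep_order(dfs)
print(list(out.columns))
```

The final column selection runs after the concatenation, so it does not depend on whether `concat` itself sorted the columns.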