BUG: concat unwantedly sorts DataFrame column names if they differ
When concatenating DataFrames with `concat`, the column names get sorted alphanumerically if they differ in any way between the frames. If they are identical across all frames, they are not sorted. This sort is undocumented and unwanted; the default behavior should certainly be no sort.

EDIT: the standard order, as in SQL, would be: the columns of df1 (in df1's order), followed by the columns unique to df2, i.e. excluding those it shares with df1 (in df2's order).

Example:
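```python
import numpy as np
from pandas import DataFrame, concat

df4a = DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df4b = DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df5 = DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5))

# Identical columns: the original order is kept
print("Cols unsorted:", " ".join(concat([df4a, df4b]).columns))
# Cols unsorted: C B D A
# Differing columns: the union was sorted (pandas at the time of this report)
print("Cols sorted:", " ".join(concat([df4a, df5]).columns))
# Cols sorted: A B C D E
```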
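For readers on current pandas: `concat` gained a `sort` keyword in pandas 0.23.0, and since pandas 1.0.0 its default is `sort=False`, so the column axis is no longer sorted when the frames' columns differ. The resulting order is the one requested in the EDIT above: df1's columns first, then the columns new in df2. A minimal sketch, assuming pandas >= 1.0 and reusing the example's frames:

```python
import numpy as np
import pandas as pd

df4a = pd.DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df5 = pd.DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5))

# sort=False keeps df4a's column order and appends df5's extra column 'E'
unsorted = pd.concat([df4a, df5], sort=False)
# sort=True reproduces the alphabetical order described in the report
alphabetical = pd.concat([df4a, df5], sort=True)

print(list(unsorted.columns))      # ['C', 'B', 'D', 'A', 'E']
print(list(alphabetical.columns))  # ['A', 'B', 'C', 'D', 'E']
```

Passing `sort` explicitly also makes the intent clear to readers who may be running older versions.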
Issue Analytics
- Created 10 years ago
- Reactions: 8
- Comments: 36 (11 by maintainers)
This behavior is indeed quite unexpected, and I also stumbled over it.
Naively, one would expect the order of the columns to be preserved; instead, the columns are sorted.
This can be corrected by reindexing the result with the original columns.
Still, it seems counter-intuitive that this automatic sorting takes place and, as far as I know, cannot be disabled.
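A minimal sketch of the reindexing workaround, assuming the desired order is the column order of one frame that already contains every column (here `df5`, reusing the example's names). It works whether or not `concat` sorted the columns:

```python
import numpy as np
import pandas as pd

df4a = pd.DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4))
df5 = pd.DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5))

# Whatever column order concat produced, reindexing the columns with df5's
# labels restores df5's order; df4a's rows get NaN in column 'E'.
combined = pd.concat([df4a, df5]).reindex(columns=df5.columns)

print(list(combined.columns))  # ['C', 'B', 'E', 'D', 'A']
```

Only the column axis is reindexed, so the duplicated row labels (0, 1, 2 from each frame) are left untouched.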
Just stumbled upon this same issue when I was concatenating DataFrames. It's a little bit annoying if you don't know about this issue, but there is actually a quick remedy: say `dfs` is a list of DataFrames you want to concatenate; you can take the original column order and feed it back in by reindexing the concatenated result.
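A sketch of that remedy for an arbitrary list `dfs`, building the column order from the frames themselves in first-seen order (the SQL-style order from the EDIT in the report). The helper name `concat_preserving_order` is hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd


def concat_preserving_order(dfs):
    """Concatenate DataFrames, ordering columns by first appearance.

    Hypothetical helper: the columns of dfs[0] come first (in its order),
    then each later frame's new columns in the order they appear there.
    """
    # dict.fromkeys keeps insertion order and drops duplicate labels
    columns = list(dict.fromkeys(col for df in dfs for col in df.columns))
    return pd.concat(dfs).reindex(columns=columns)


dfs = [
    pd.DataFrame(columns=['C', 'B', 'D', 'A'], data=np.random.randn(3, 4)),
    pd.DataFrame(columns=['C', 'B', 'E', 'D', 'A'], data=np.random.randn(3, 5)),
    pd.DataFrame(columns=['F', 'A'], data=np.random.randn(2, 2)),
]
result = concat_preserving_order(dfs)
print(list(result.columns))  # ['C', 'B', 'D', 'A', 'E', 'F']
```

Computing the order up front means the result does not depend on whether the installed pandas version sorts the column axis.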