Please add force_suffixes to pandas.merge()

Code Sample, a copy-pastable example if possible

import pandas as pd

A = pd.DataFrame({'colname': [1, 2]})
B = pd.DataFrame({'colname': [1, 2]})
C = pd.DataFrame({'colname': [1, 2]})

D = pd.merge(A, B, right_index=True, left_index=True, suffixes=('_A', '_B'))
print(pd.merge(D, C, right_index=True, left_index=True, suffixes=('', '_C')))

>   colname_A  colname_B  colname
> 0          1          1        1
> 1          2          2        2

Problem description

With pandas.merge(), suffixes are added only to column names that appear in both data frames. With more than two data frames, that leads to the situation above: the first merge adds suffixes, but when the result is merged with a third table, the column name no longer appears twice, so no suffix is added and it has to be added manually. This is overhead, especially when many data frames are merged. A “force_suffixes” option that ensures the suffix is always added would be appreciated.

Expected Output

>   colname_A  colname_B  colname_C
> 0          1          1        1
> 1          2          2        2
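
Until such an option exists, one workaround for the index-based merge in this issue is to rename the columns before merging, so that no names collide in the first place. A minimal sketch, assuming the same three single-column frames as in the code sample; DataFrame.add_suffix is an existing pandas method, and the variable name result is only illustrative:

```python
import pandas as pd

A = pd.DataFrame({'colname': [1, 2]})
B = pd.DataFrame({'colname': [1, 2]})
C = pd.DataFrame({'colname': [1, 2]})

# Rename every column up front so no name collides during the merges.
D = pd.merge(A.add_suffix('_A'), B.add_suffix('_B'),
             left_index=True, right_index=True)
result = pd.merge(D, C.add_suffix('_C'),
                  left_index=True, right_index=True)

print(list(result.columns))  # ['colname_A', 'colname_B', 'colname_C']
```

Note that add_suffix renames all columns, so this only fits merges on the index; a key column used with on= would be renamed as well.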

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 25
  • Comments: 21 (6 by maintainers)

Top GitHub Comments

11 reactions
TheRealPJC commented, Mar 14, 2019

I agree - this would be very useful.

1 reaction
ThomasProctor commented, Oct 31, 2017

Here’s the kludge I’ve been using to do this. Maybe it will help with a fix. I make no claims to it being remotely well-engineered.

import pandas as pd

def merge_force_suffix(left, right, **kwargs):
    # Name of the single join column; it keeps its original name.
    on_col = kwargs['on']
    # Take 'suffixes' out of kwargs; the columns are renamed below instead.
    suffix_tuple = kwargs.pop('suffixes')

    def suffix_col(col, suffix):
        # Append the suffix to every column except the join key.
        if col != on_col:
            return str(col) + suffix
        else:
            return col

    left_suffixed = left.rename(columns=lambda x: suffix_col(x, suffix_tuple[0]))
    right_suffixed = right.rename(columns=lambda x: suffix_col(x, suffix_tuple[1]))
    return pd.merge(left_suffixed, right_suffixed, **kwargs)
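
The helper above handles one pair of frames joined on a single column. For a larger number of frames, the same renaming idea can be folded over a collection. This is a rough sketch, not part of pandas: merge_all_with_suffixes, the frames dict, and the key column are hypothetical names introduced here for illustration.

```python
from functools import reduce

import pandas as pd


def merge_all_with_suffixes(frames, on, how='inner'):
    """Hypothetical helper: frames maps a suffix string to a DataFrame."""
    # Suffix every column except the join key, frame by frame.
    renamed = [
        df.rename(columns=lambda c, s=suffix: c if c == on else f"{c}{s}")
        for suffix, df in frames.items()
    ]
    # Merge the renamed frames pairwise from left to right.
    return reduce(lambda l, r: pd.merge(l, r, on=on, how=how), renamed)


frames = {
    '_A': pd.DataFrame({'key': [1, 2], 'colname': [10, 20]}),
    '_B': pd.DataFrame({'key': [1, 2], 'colname': [30, 40]}),
    '_C': pd.DataFrame({'key': [1, 2], 'colname': [50, 60]}),
}
merged = merge_all_with_suffixes(frames, on='key')
print(list(merged.columns))  # ['key', 'colname_A', 'colname_B', 'colname_C']
```

Because every non-key column is renamed before any merge happens, the result does not depend on which names happen to collide at each step.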