ENH: When chaining multiple .merge() functions, only the second "suffixes" param produces results

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example


import pandas as pd

df = pd.DataFrame({'year': [2005, 2006], 'cusip': ['111', '222']})
df_comp = pd.DataFrame({'year': [2005, 2006], 'cusip': ['111', '222'], 'test': [5, 10]})

# First merge: join on both keys; 'test' exists only in df_comp.
df_comp['year'] = df_comp['year'].astype(int)
df = df.merge(df_comp, how='left', on=['cusip', 'year'], indicator=True, suffixes=[None, '_wyear'])
print(df['_merge'].value_counts())
df = df.drop(columns='_merge')

# Second merge: join on 'cusip' only; 'test' now exists on both sides.
df_comp2 = df_comp.drop(columns='year').drop_duplicates()
df = df.merge(df_comp2, how='left', on=['cusip'], indicator=True, suffixes=[None, '_woyear'])
print(df['_merge'].value_counts())
df = df.drop(columns='_merge')

print(df.columns)

Index(['year', 'cusip', 'test', 'test_woyear'], dtype='object')

Problem description

The suffixes parameter of .merge() only produces a suffix in the second merge, not the first. This behavior persists if I switch the order of the two merges.

Expected Output

Index(['year', 'cusip', 'test_wyear', 'test_woyear'], dtype='object')

Output of `pd.show_versions()`

C:\Users\Hiwi_Tower\AppData\Local\Programs\Python\Python38\python.exe "W:/Lehrstuhlverzeichnis/300_Forschung/310_Aktuelle_Projekte/Paper Hauke/Python/Digital/getdigitalanalysts.py"

INSTALLED VERSIONS

commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.UTF-8

pandas : 1.2.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.4
setuptools : 41.2.0
Cython : 0.29.14
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fsspec : 2021.04.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

Issue Analytics

Created 2 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

mzeitlin11 commented, May 17, 2021 (1 reaction)

Will change the label, though -1 on the enhancement proposal. I’d imagine the more common use case for suffix is when you might have a small number of duplicate columns, but many distinct ones, so adding a suffix to those non-duplicate columns would change meaning unnecessarily. If a schema is well designed, the meaning of a column should not change when merging two data frames (https://en.wikipedia.org/wiki/Entity–relationship_model). What use case do you have where you’d like to always add suffixes?

(If you just want a convenient way of adding suffixes, there’s already https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html)
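As a rough sketch of that suggestion, the snippet below applies add_suffix to the right-hand frame before merging, so the suffix is present whether or not the column names overlap. Moving the join keys into the index first is an assumption made here so that 'cusip' and 'year' keep their names; the variable names keys, right and merged are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"year": [2005, 2006], "cusip": ["111", "222"]})
df_comp = pd.DataFrame(
    {"year": [2005, 2006], "cusip": ["111", "222"], "test": [5, 10]}
)

keys = ["cusip", "year"]

# Move the join keys into the index so add_suffix only touches the
# remaining data columns, then restore the keys as ordinary columns.
right = df_comp.set_index(keys).add_suffix("_wyear").reset_index()

merged = df.merge(right, how="left", on=keys)
print(list(merged.columns))
```

With this approach the suffixes argument is not needed at all, because the names are already distinct before the merge.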

pratishv commented, May 17, 2021 (1 reaction)

suffixes only applies when the two frames have overlapping column names. In the first merge, df has no "test" column, so there are no overlapping column names and no suffix is added.

You can use df.rename if you actually want a different column name after both merges.
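A minimal sketch of that workaround, reusing the frames from the issue: the first merge leaves "test" unsuffixed because it exists only on the right side, and explicit rename calls then produce the column names the reporter expected. The intermediate names first, cols_after_first and second are illustrative, not from the issue.

```python
import pandas as pd

df = pd.DataFrame({"year": [2005, 2006], "cusip": ["111", "222"]})
df_comp = pd.DataFrame(
    {"year": [2005, 2006], "cusip": ["111", "222"], "test": [5, 10]}
)

# First merge: "test" is not in df, so there is no overlap and the
# "_wyear" suffix is never applied.
first = df.merge(df_comp, how="left", on=["cusip", "year"], suffixes=(None, "_wyear"))
cols_after_first = list(first.columns)

# Rename explicitly instead of relying on suffixes.
first = first.rename(columns={"test": "test_wyear"})

# Second merge: rename the right-hand column up front as well.
df_comp2 = df_comp.drop(columns="year").drop_duplicates()
second = first.merge(
    df_comp2.rename(columns={"test": "test_woyear"}), how="left", on="cusip"
)
print(list(second.columns))
```

Renaming before or after each merge keeps the result independent of the merge order, which the suffixes parameter alone cannot guarantee.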
