ENH: When chaining multiple .merge() functions, only the second "suffixes" param produces results

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example


import pandas as pd

df = pd.DataFrame({'year':[2005, 2006], 'cusip':['111', '222']})
df_comp = pd.DataFrame({'year':[2005, 2006], 'cusip':['111', '222'], 'test': [5, 10]})

df_comp['year'] = df_comp['year'].astype(int)
df = df.merge(df_comp, how='left', on=['cusip', 'year'], indicator=True, suffixes=[None, '_wyear'])
print(df['_merge'].value_counts())
df = df.drop(columns='_merge')

df_comp2 = df_comp.drop(columns='year').drop_duplicates()
df = df.merge(df_comp2, how='left', on=['cusip'], indicator=True, suffixes=[None, '_woyear'])
print(df['_merge'].value_counts())
df = df.drop(columns='_merge')

print(df.columns)

Index(['year', 'cusip', 'test', 'test_woyear'], dtype='object')

Problem description

The suffixes parameter of .merge() only produces a suffix for the second merge, not the first. This behavior persists if I switch the order of the two merges.

Expected Output

Index(['year', 'cusip', 'test_wyear', 'test_woyear'], dtype='object')

Output of pd.show_versions()

C:\Users\Hiwi_Tower\AppData\Local\Programs\Python\Python38\python.exe "W:/Lehrstuhlverzeichnis/300_Forschung/310_Aktuelle_Projekte/Paper Hauke/Python/Digital/getdigitalanalysts.py"

INSTALLED VERSIONS

commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.UTF-8

pandas : 1.2.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.4
setuptools : 41.2.0
Cython : 0.29.14
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : 4.8.2
bottleneck : None
fsspec : 2021.04.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
mzeitlin11 commented, May 17, 2021

Will change the label, though -1 on the enhancement proposal. I’d imagine the more common use case for suffix is when you might have a small number of duplicate columns, but many distinct ones, so adding a suffix to those non-duplicate columns would change meaning unnecessarily. If a schema is well designed, the meaning of a column should not change when merging two data frames (https://en.wikipedia.org/wiki/Entity–relationship_model). What use case do you have where you’d like to always add suffixes?

(If you just want a convenient way of adding suffixes, there’s already https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html)
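A minimal sketch of the add_suffix route mentioned above, reusing the df_comp frame from the issue. Note that add_suffix renames every column, join keys included, so the sketch restores the key names afterwards; the keys list and the dictionary comprehension are illustrative choices, not something taken from the thread.

```python
import pandas as pd

df_comp = pd.DataFrame({"year": [2005, 2006], "cusip": ["111", "222"], "test": [5, 10]})

# add_suffix appends the suffix to every column label, including the join keys.
suffixed = df_comp.add_suffix("_wyear")
all_suffixed_cols = list(suffixed.columns)

# Restore the key names so the frame can still be merged on them.
keys = ["year", "cusip"]  # illustrative: the join keys used in the issue
suffixed = suffixed.rename(columns={f"{k}_wyear": k for k in keys})
restored_cols = list(suffixed.columns)

print(all_suffixed_cols)
print(restored_cols)
```

After this step only the non-key column carries the suffix, so a later merge on the keys needs no suffixes argument for it.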

1 reaction
pratishv commented, May 17, 2021

suffixes only applies when the two frames have overlapping column names. In the first merge, df has no “test” column, so there are no overlapping column names and no suffix is added.

You can use df.rename if you actually want different column names after both merges.
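The behavior described in this comment can be sketched with the frames from the issue. The first merge has no overlapping non-key column, so the suffix is never used; renaming the columns explicitly is one way to arrive at the column names the reporter expected. The variable names first and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"year": [2005, 2006], "cusip": ["111", "222"]})
df_comp = pd.DataFrame({"year": [2005, 2006], "cusip": ["111", "222"], "test": [5, 10]})

# First merge: "test" exists only in the right frame, so nothing overlaps
# and the "_wyear" suffix is not applied.
first = df.merge(df_comp, how="left", on=["cusip", "year"], suffixes=(None, "_wyear"))
first_cols = list(first.columns)

# Rename explicitly instead of relying on suffixes.
first = first.rename(columns={"test": "test_wyear"})
df_comp2 = df_comp.drop(columns="year").drop_duplicates()
df_comp2 = df_comp2.rename(columns={"test": "test_woyear"})

# Second merge: the non-key columns are already distinct, so no suffixes are needed.
result = first.merge(df_comp2, how="left", on="cusip")
result_cols = list(result.columns)

print(first_cols)
print(result_cols)
```

With explicit renames, the final column names do not depend on the order of the merges.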
