str.cat does not align on index?

The implicit index-matching that pandas performs for operations between different DataFrames/Series is great, and most of the time it just works. It does so consistently enough that my expectation is that different Series will be aligned before an operation is performed.

For some reason, str.cat does not seem to do so.

import pandas as pd # 0.21.0
import numpy as np # 1.13.3
col = pd.Series(['a','b','c','d','e','f','g','h','i','j'])

# choose random subsets
ss1 = [8, 1, 2, 0, 6] # list(col.sample(5).index) 
ss2 = [4, 0, 9, 2, 6] # list(col.sample(5).index)

# perform str.cat
col.loc[ss1].str.cat(col.loc[ss2], sep = '').sort_index()
# 0    ac <-- UNMATCHED!
# 1    ba <-- UNMATCHED!
# 2    cj <-- UNMATCHED!
# 6    gg <-- correct by sheer luck
# 8    ie <-- UNMATCHED!

# compared for example with Boolean operations on unmatched series
# (matching indices and returning Series with union of both indices),
# this is inconsistent!
b = col.loc[ss1].astype(bool) & col.loc[ss2].astype(bool)
b
# 0     True
# 1    False
# 2     True
# 4    False
# 6     True
# 8    False
# 9    False

# if we manually align the Series
# (easy here by masking from the Series we just subsampled, hard in practice),
# then the NaNs are handled as expected:
m = col.where(np.isin(col.index, ss1)).str.cat(col.where(np.isin(col.index, ss2)), sep = '')
m
# 0     aa
# 1    NaN
# 2     cc
# 3    NaN
# 4    NaN
# 5    NaN
# 6     gg
# 7    NaN
# 8    NaN
# 9    NaN

# based on the normal pandas-behaviour for unmatched Series
# (for example as for Boolean "and" above), the following would be
# the expected result of col.loc[ss1].str.cat(col.loc[ss2], sep = '').sort_index() !
m.loc[b.index]
# 0     aa <-- MATCHED!
# 1    NaN
# 2     cc <-- MATCHED!
# 4    NaN
# 6     gg <-- MATCHED!
# 8    NaN
# 9    NaN
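
A version-independent way to get the expected result is to reindex both Series to the union of their indices before calling str.cat, so that positional and index-based concatenation coincide. The following is a minimal sketch reusing col, ss1 and ss2 from the example above; the names left, right and union are introduced here for illustration only.

```python
import pandas as pd

col = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
ss1 = [8, 1, 2, 0, 6]
ss2 = [4, 0, 9, 2, 6]

left = col.loc[ss1]
right = col.loc[ss2]

# Build the union of both indices and reindex each side to it.
# Labels missing on one side become NaN on that side.
union = left.index.union(right.index)
aligned = left.reindex(union).str.cat(right.reindex(union), sep='').sort_index()

print(aligned)
# 0     aa
# 1    NaN
# 2     cc
# 4    NaN
# 6     gg
# 8    NaN
# 9    NaN
```

Because both operands share the same index after reindexing, the result does not depend on whether str.cat itself aligns.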

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

TomAugspurger commented, Dec 6, 2017

align=None will let us detect whether or not the user explicitly wanted alignment. So it’s

if align is None:
    warnings.warn("A future version of pandas will perform alignment when others is a series. To disable alignment (the previous behavior) and silence this warning, pass 'align=False'. To enable alignment (the future behavior) and silence this warning, pass 'align=True'.")
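
To make the sentinel idea concrete, here is a rough sketch of how a None default lets a function tell "user did not choose" apart from an explicit True or False. The helper cat_series is hypothetical and is not pandas API; the align keyword follows the proposal in this comment, and using FutureWarning as the warning category is an assumption.

```python
import warnings

import pandas as pd


def cat_series(left, right, sep='', align=None):
    """Hypothetical wrapper illustrating the align=None sentinel (not pandas API)."""
    if align is None:
        # The caller did not choose: warn, then keep the old positional behavior.
        warnings.warn(
            "A future version will align on the index. Pass align=False to keep "
            "positional concatenation or align=True to opt in to alignment.",
            FutureWarning,
            stacklevel=2,
        )
        align = False
    if align:
        # Outer-align both sides; labels missing on one side become NaN.
        left, right = left.align(right, join='outer')
        return left.str.cat(right, sep=sep)
    # Positional: passing a plain array carries no index, so nothing is aligned.
    return left.str.cat(right.to_numpy(), sep=sep)


col = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
left, right = col.loc[[8, 1, 2, 0, 6]], col.loc[[4, 0, 9, 2, 6]]

positional = cat_series(left, right, align=False)  # no warning, old behavior
matched = cat_series(left, right, align=True)      # no warning, aligned
```

Calling cat_series(left, right) without the keyword emits the warning and then falls back to the positional result, which is the transition path described above.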

On Wed, Dec 6, 2017 at 2:35 PM, h-vetinari notifications@github.com wrote:

Cool! I think your proposal for dealing with the API-breaking sounds good, except maybe calling it align = False by default first, and changing to True at some point in the future (not unlike the expand-keyword that .str.extract gained in 0.18)

Looking at https://pandas-docs.github.io/pandas-docs-travis/text.html#method-summary, it seems that .str.cat is the only method that deals with two different Series. Maybe that’s why the alignment for it fell through the cracks?

h-vetinari commented, May 3, 2018

@TomAugspurger @jreback This can now be closed.
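
For readers on a later pandas: as far as I know, the change that closed this issue shipped as a join keyword on str.cat (pandas 0.23 and later) rather than the align keyword discussed above. A small sketch, assuming such a version is installed and reusing the Series from the original report:

```python
import pandas as pd

col = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
ss1 = [8, 1, 2, 0, 6]
ss2 = [4, 0, 9, 2, 6]

# join='outer' aligns on the union of both indices, like the Boolean "and" example.
outer = col.loc[ss1].str.cat(col.loc[ss2], sep='', join='outer').sort_index()

# join='inner' keeps only the labels present in both Series.
inner = col.loc[ss1].str.cat(col.loc[ss2], sep='', join='inner').sort_index()

print(outer)
print(inner)
```

Passing join explicitly keeps the result the same regardless of which default the installed version uses.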
