str.cat does not align on index?
The implicit index matching that pandas performs for operations between different DataFrames/Series is great, and most of the time it just works. It does so consistently enough that the expectation (for me) is that different Series will be aligned before an operation is performed.
For some reason, str.cat does not seem to do so.
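For readers less familiar with the alignment being referred to, here is a minimal sketch (the Series s1 and s2 are made up for illustration). Arithmetic between two Series first matches labels, so the result is indexed by the union of both indexes, with NaN wherever a label exists on only one side. Converting to NumPy arrays drops the index, and the same operation falls back to pairing elements by position.

```python
import pandas as pd

# Two illustrative Series whose indexes only partly overlap
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Label-aligned: "b" is matched with "b" and "c" with "c";
# "a" and "d" exist on one side only and become NaN.
aligned = s1 + s2
print(aligned)

# Positional: raw arrays carry no index, so elements are paired by position.
positional = s1.to_numpy() + s2.to_numpy()
print(positional)
```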
import pandas as pd # 0.21.0
import numpy as np # 1.13.3
col = pd.Series(['a','b','c','d','e','f','g','h','i','j'])
# choose random subsets
ss1 = [8, 1, 2, 0, 6] # list(col.sample(5).index)
ss2 = [4, 0, 9, 2, 6] # list(col.sample(5).index)
# perform str.cat
col.loc[ss1].str.cat(col.loc[ss2], sep='').sort_index()
# 0 ac <-- UNMATCHED!
# 1 ba <-- UNMATCHED!
# 2 cj <-- UNMATCHED!
# 6 gg <-- correct by sheer luck
# 8 ie <-- UNMATCHED!
# compare this with, for example, Boolean operations on unaligned Series,
# which match indices and return a Series with the union of both indices:
# str.cat is inconsistent with that!
b = col.loc[ss1].astype(bool) & col.loc[ss2].astype(bool)
b
# 0 True
# 1 False
# 2 True
# 4 False
# 6 True
# 8 False
# 9 False
# if we manually align the Series
# (easy here by masking from the Series we just subsampled, hard in practice),
# then the NaNs are handled as expected:
m = col.where(np.isin(col.index, ss1)).str.cat(col.where(np.isin(col.index, ss2)), sep='')
m
# 0 aa
# 1 NaN
# 2 cc
# 3 NaN
# 4 NaN
# 5 NaN
# 6 gg
# 7 NaN
# 8 NaN
# 9 NaN
# based on the usual pandas behaviour for unaligned Series
# (e.g. the Boolean "and" above), the following would be the
# expected result of col.loc[ss1].str.cat(col.loc[ss2], sep='').sort_index():
m.loc[b.index]
# 0 aa <-- MATCHED!
# 1 NaN
# 2 cc <-- MATCHED!
# 4 NaN
# 6 gg <-- MATCHED!
# 8 NaN
# 9 NaN
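Two ways to get the label-matched result shown above without masking by hand are sketched below. The first uses Series.align, a standard pandas method that reindexes both Series onto a common index; this is a swapped-in technique, not something the issue itself proposes. The second assumes a newer pandas release than the 0.21.0 used here: later versions added a join parameter to str.cat, which in current versions defaults to 'left', and join='outer' reproduces the union-of-indices result that the Boolean example produces.

```python
import pandas as pd

col = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
ss1 = [8, 1, 2, 0, 6]
ss2 = [4, 0, 9, 2, 6]
left, right = col.loc[ss1], col.loc[ss2]

# Option 1: align explicitly. Both Series are reindexed to the union of
# their labels (missing labels become NaN), so str.cat then pairs
# elements that share a label.
l_al, r_al = left.align(right, join='outer')
via_align = l_al.str.cat(r_al, sep='').sort_index()

# Option 2 (newer pandas only): let str.cat align via its join parameter.
via_join = left.str.cat(right, sep='', join='outer').sort_index()

# With the current default join='left', only the caller's labels are kept.
via_left = left.str.cat(right, sep='').sort_index()

print(via_align)
print(via_left)
```

Because na_rep is left unset, any row where either side is missing comes out as NaN, matching the manually aligned m above.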
Issue Analytics
- Created 6 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
align=None will let us detect whether or not the user explicitly wanted alignment. So it's:
if align is None:
    warnings.warn("A future version of pandas will perform alignment when others is a series. To disable alignment (the previous behavior) and silence this warning, pass 'align=False'. To enable alignment (the future behavior) and silence this warning, pass 'align=True'.")
On Wed, Dec 6, 2017 at 2:35 PM, h-vetinari notifications@github.com wrote:
@TomAugspurger @jreback This can now be closed.
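The align=None sentinel pattern from the comment above can be sketched as a small standalone helper. cat_with_align is hypothetical (it is not a pandas API), and the FutureWarning category is an assumption, since the comment does not name one. The "no alignment" branch is emulated by passing the other Series' raw values, which str.cat pairs with the caller by position.

```python
import warnings

import pandas as pd


def cat_with_align(left, right, sep='', align=None):
    """Hypothetical helper illustrating the align=None sentinel pattern.

    align=None means the caller did not choose: warn, then keep the old
    positional behavior. Passing align=True or align=False silences the warning.
    """
    if align is None:
        warnings.warn(
            "A future version of pandas will perform alignment when others is a series. "
            "To disable alignment (the previous behavior) and silence this warning, "
            "pass 'align=False'. To enable alignment (the future behavior) and "
            "silence this warning, pass 'align=True'.",
            FutureWarning,  # assumed category; the comment does not name one
            stacklevel=2,
        )
        align = False
    if align:
        # Reindex both Series onto the union of their labels first.
        left, right = left.align(right, join='outer')
    else:
        # Raw values carry no index, so elements are paired by position.
        right = right.to_numpy()
    return left.str.cat(right, sep=sep)


col = pd.Series(list('abcdefghij'))
left, right = col.loc[[8, 1, 2, 0, 6]], col.loc[[4, 0, 9, 2, 6]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    old = cat_with_align(left, right)          # warns, positional pairing
new = cat_with_align(left, right, align=True)  # no warning, label-aligned
```

The point of None as the default, rather than False, is that it is the only value a user cannot pass by accident while meaning something specific, so the warning fires exactly for callers who never expressed a preference.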