When adding a Series to a DataFrame with a different index, the Series gets turned into all NaNs

Case in point:


>>> df
               RP/Rsum  P.value
ID                             
A_23_P42353    17.8     0      
A_23_P369994   15.91    0      
A_33_P3262440  436.7    0.0005 
A_32_P199429   18.97    0      
A_23_P256724   22.24    0      
A_33_P3394689  24.24    0      
A_33_P3403117  27.14    0      
A_24_P252364   28.56    0      
A_23_P99515    31.82    0      
A_24_P261750   31.46    0 

>>> df.dtypes
RP/Rsum    float64
P.value    float64

>>> ids = pandas.Series(['51513', '9201', np.nan, np.nan, '8794', '6530', '7025', '4897', '84935', '11081'])
>>> df["test"] = ids
>>> df
               RP/Rsum  P.value  test
ID                                   
A_23_P42353    17.8     0        NaN 
A_23_P369994   15.91    0        NaN 
A_33_P3262440  436.7    0.0005   NaN 
A_32_P199429   18.97    0        NaN 
A_23_P256724   22.24    0        NaN 
A_33_P3394689  24.24    0        NaN 
A_33_P3403117  27.14    0        NaN 
A_24_P252364   28.56    0        NaN 
A_23_P99515    31.82    0        NaN 
A_24_P261750   31.46    0        NaN 
>>> df.dtypes
RP/Rsum    float64
P.value    float64
test       object

This also happens with float objects and the like. I am not sure what the trigger is.

Issue Analytics

Created 12 years ago
Comments: 9 (9 by maintainers)

Top GitHub Comments

wesm commented, Dec 6, 2011

The Series is given an implicit 0, …, N-1 index when you don’t supply one, so this is exactly the behavior I would expect. If the data were a raw ndarray or a list, this would not occur. So the fact that when you do:

df[col] = series

and it conforms the series exactly to the index of df, that’s a feature and not a bug 😃 so

df['test'] = ids.values

would work fine in your example
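
To make the alignment behavior described above concrete, here is a minimal sketch. The three-row frame is a cut-down stand-in for the one in the report, and the column names `aligned`, `by_position`, and `relabelled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the report (three rows, for brevity).
df = pd.DataFrame(
    {"RP/Rsum": [17.8, 15.91, 436.7], "P.value": [0.0, 0.0, 0.0005]},
    index=pd.Index(["A_23_P42353", "A_23_P369994", "A_33_P3262440"], name="ID"),
)

# No index supplied, so the Series gets the default 0..N-1 index.
ids = pd.Series(["51513", "9201", np.nan])

# Assignment aligns on index labels; none of 0, 1, 2 match the ID strings,
# so every row of the new column is missing.
df["aligned"] = ids

# Passing the underlying array skips alignment and assigns by position.
df["by_position"] = ids.values

# Alternatively, build a Series that carries the frame's index explicitly.
df["relabelled"] = pd.Series(ids.values, index=df.index)

print(df)
```

Both workarounds assume the Series is already in the same row order as the frame; if it is not, assigning by position would silently attach values to the wrong rows.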

wesm commented, Dec 8, 2011

Well, I think the basic idea is that DataFrame is a “fixed length dict-like container of Series”. When you construct a DataFrame with a dict of Series without an explicit index, there is no obvious index other than the union of them all.

I can see the argument for implicitly extending the index, but there are tradeoffs either way.
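
The constructor behavior mentioned in this comment can be sketched as follows. The Series `a` and `b` and their labels are invented for illustration; the point is only to contrast a frame built without an explicit index against one built with it.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (hypothetical labels).
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# No explicit index: the frame's index is the union of both Series indexes,
# and positions a Series does not cover are filled with NaN.
union_frame = pd.DataFrame({"a": a, "b": b})

# Explicit index: each Series is conformed to it, and labels outside it
# (here "z") are dropped.
fixed_frame = pd.DataFrame({"a": a, "b": b}, index=["x", "y"])

print(union_frame)
print(fixed_frame)
```

Once the frame exists its index is fixed, which is why a later `df[col] = series` conforms the Series to the frame rather than extending the frame.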
