dataframe.fillna with df fails in a specific case
The following code replaces NaN values in a dataframe with values from another dataframe, and it works as expected:
import pandas as pd
import numpy as np
df = pd.DataFrame({'key': ['01', '01', '01', '03', '04', '05'], 'A': [np.nan, 'A1', 'A2', 'A3', np.nan, np.nan], 'B': [1, 2, 3, np.nan, 5, 6]})
df2 = pd.DataFrame({'key': ['01', '03', '04', '05', '08', '99'], 'A': ['OK1', 'KO3', 'OK4', 'OK5', 'KO8', 'K99'], 'B': [91, 92, 93, 94, 95, 12]})
res = df.set_index('key').fillna(df2.set_index('key')).reset_index()
We obtain:
df
key A B
0 01 NaN 1.0
1 01 A1 2.0
2 01 A2 3.0
3 03 A3 NaN
4 04 NaN 5.0
5 05 NaN 6.0
df2
key A B
0 01 OK1 91
1 03 KO3 92
2 04 OK4 93
3 05 OK5 94
4 08 KO8 95
5 99 K99 12
res
key A B
0 01 OK1 1.0
1 01 A1 2.0
2 01 A2 3.0
3 03 A3 92.0
4 04 OK4 5.0
5 05 OK5 6.0
However, the following minor change breaks it for no apparent reason. Computing df_res raises an InvalidIndexError:
df.at[3, 'key'] = '99'
df_res = df.set_index('key').fillna(df2.set_index('key')).reset_index()
Here is the updated dataframe.
df
key A B
0 01 NaN 1.0
1 01 A1 2.0
2 01 A2 3.0
3 99 A3 NaN
4 04 NaN 5.0
5 05 NaN 6.0
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (4 by maintainers)
Hi everyone,
The concerns raised by @andreapiso are important, but they don't relate to the issue pointed out by @remidomingues.
This issue arises because the dataframe with missing values needs a monotonically increasing index if that index contains repeated values. See the following example:
Raising
So a quick fix is to sort the index before calling .fillna, but it would be nice to get a more informative error message.
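The sort-first workaround can be sketched as follows, reusing the frames from the original report with the '99' key already applied. This is a minimal sketch rather than the maintainers' recommended fix. It assumes a stable sort (kind='mergesort') is acceptable so that rows sharing a key keep their relative order.

```python
import numpy as np
import pandas as pd

# Frames from the report, with row 3 already changed to key '99'.
df = pd.DataFrame({
    'key': ['01', '01', '01', '99', '04', '05'],
    'A': [np.nan, 'A1', 'A2', 'A3', np.nan, np.nan],
    'B': [1, 2, 3, np.nan, 5, 6],
})
df2 = pd.DataFrame({
    'key': ['01', '03', '04', '05', '08', '99'],
    'A': ['OK1', 'KO3', 'OK4', 'OK5', 'KO8', 'K99'],
    'B': [91, 92, 93, 94, 95, 12],
})

# Sort the duplicated index so it is monotonically increasing.
# A stable sort keeps the three '01' rows in their original order.
left = df.set_index('key').sort_index(kind='mergesort')
assert left.index.is_monotonic_increasing

# Fill the gaps from df2, matching rows on 'key'.
df_res = left.fillna(df2.set_index('key')).reset_index()
print(df_res)
```

Note that the result comes back in sorted key order, so the '99' row moves to the end instead of staying at position 3.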
Thanks a lot guys!
I have a similar case with multi-index dataframes, and I think the points made by @TomAugspurger and @mbataillou do not apply to it. I also asked about it here.
Working:
will give
So, the indexes are not monotonic and there are entries that cannot be matched; nevertheless it works fine.
A very similar case, however, fails:
raises an InvalidIndexError.
To me, both cases look identical, so I have no idea how to check whether I work with a valid or invalid index. Any ideas?
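One way to at least inspect an index before calling .fillna is to look at the properties discussed in this thread: uniqueness and monotonicity. The sketch below is only a diagnostic. The helper name index_report and the sample multi-index frame are made up for illustration, and, as the working case above shows, a non-monotonic index does not always fail, so a False value here is a hint and not a verdict.

```python
import pandas as pd

def index_report(frame):
    """Summarise index properties relevant to alignment (hypothetical helper)."""
    idx = frame.index
    return {
        'is_unique': bool(idx.is_unique),
        'is_monotonic_increasing': bool(idx.is_monotonic_increasing),
        'n_duplicated': int(idx.duplicated().sum()),
    }

# Illustrative multi-index frame: unsorted, with one repeated entry.
mi = pd.MultiIndex.from_tuples(
    [('b', 1), ('a', 2), ('a', 2), ('a', 1)], names=['k1', 'k2']
)
frame = pd.DataFrame({'A': [1.0, None, 3.0, None]}, index=mi)

before = index_report(frame)              # unsorted index
after = index_report(frame.sort_index())  # sorted index, duplicates remain

print(before)
print(after)
```

If the report shows duplicates on an index that is not monotonically increasing, sorting the index first is the workaround suggested earlier in the thread.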