Handle NaN values in DataFrame.update when overwrite=False


Code Sample

from pandas import DataFrame, date_range
df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
df2 = DataFrame({'A': [None, 2, 3]})
df1.update(df2, overwrite=False)
df1

Problem description

I got a TypeError: invalid type promotion error when updating a DataFrame that has a datetime column. The second DataFrame doesn’t have this column. The full error message is in the details below.
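The failure can probably be isolated to the np.where call at the bottom of the traceback. The following is a minimal sketch, not the pandas code path itself: it assumes that the missing column is filled with float NaN values after reindexing, and the array names simply mirror the this/that/mask variables in DataFrame.update.

```python
import numpy as np

# Values of df1's datetime column 'B'.
this = np.array(["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[ns]")
# Assumption: df2 has no column 'B', so after reindexing it contributes float NaNs.
that = np.full(3, np.nan)
# With overwrite=False the mask is notnull(this), which is all True here.
mask = np.ones(3, dtype=bool)

try:
    # np.where must find a common dtype for `this` and `that` even though
    # the mask would never select a value from `that`.
    np.where(mask, this, that)
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The exact exception message depends on the NumPy version, but datetime64 and float64 have no common dtype, so the call fails before any value is selected.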

IMHO, the culprit is in DataFrame.update. The block checking mask.all should be outside the if block so that it applies to the overwrite=False case as well.

                if overwrite:
                    mask = isnull(that)

                    # don't overwrite columns unnecessarily
                    if mask.all():
                        continue
                else:
                    mask = notnull(this)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-70-23beb565ef70> in <module>()
      1 df1 = DataFrame({'A': [1,None,3], 'B': date_range('2000', periods=3)})
      2 df2 = DataFrame({'A': [None, 2, 3]})
----> 3 df1.update(df2, overwrite=False)
      4 df1
      5

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\core\frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
   3845
   3846             self[col] = expressions.where(mask, this, that,
-> 3847                                           raise_on_error=True)
   3848
   3849     # ----------------------------------------------------------------------

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
    228
    229     if use_numexpr:
--> 230         return _where(cond, a, b, raise_on_error=raise_on_error)
    231     return _where_standard(cond, a, b, raise_on_error=raise_on_error)
    232

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_numexpr(cond, a, b, raise_on_error)
    151
    152     if result is None:
--> 153         result = _where_standard(cond, a, b, raise_on_error)
    154
    155     return result

C:\Users\pcluo\Anaconda3\lib\site-packages\pandas\computation\expressions.py in _where_standard(cond, a, b, raise_on_error)
    126 def _where_standard(cond, a, b, raise_on_error=True):
    127     return np.where(_values_from_object(cond), _values_from_object(a),
--> 128                     _values_from_object(b))
    129
    130

TypeError: invalid type promotion
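The change proposed in the problem description could be sketched roughly as follows. This is not the pandas implementation: update_sketch is a hypothetical stand-in that uses only public pandas calls (reindex_like, isna, notna, Series.where) and ignores the join, filter_func and raise_conflict parameters. The only point it illustrates is running the mask.all() check for both values of overwrite.

```python
import pandas as pd


def update_sketch(df, other, overwrite=True):
    """Hypothetical, simplified stand-in for DataFrame.update (in place)."""
    # Align `other` to df's index and columns; missing columns become NaN.
    other = other.reindex_like(df)
    for col in df.columns:
        this = df[col]
        that = other[col]
        if overwrite:
            # Keep `this` only where `other` has nothing to offer.
            mask = that.isna()
        else:
            # Keep every value of `this` that is already present.
            mask = this.notna()
        # Proposed change: run this check for both branches, so columns
        # that would not change are never passed through where().
        if mask.all():
            continue
        # Keep `this` where mask is True, otherwise take `that`.
        df[col] = this.where(mask, that)
    return df


df1 = pd.DataFrame({"A": [1, None, 3], "B": pd.date_range("2000", periods=3)})
df2 = pd.DataFrame({"A": [None, 2, 3]})
update_sketch(df1, df2, overwrite=False)
print(df1)
```

In this sketch column B is skipped because it has no missing values, so its datetime dtype is never mixed with the NaN placeholder column, while the missing entry in column A is filled from df2.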

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
pcluo commented, Mar 8, 2017

@mayukh18 just created a pull request. thx tho.

1 reaction
jreback commented, Mar 6, 2017

yeah this looks like a bug. .update() has not gotten much TLC. in fact this should be completely changed, see #3025 to generally fix this method.

So a short-term fix is ok if you’d want to push that.

