DataFrame.update crashes with overwrite=False when NaT present
Code Sample
from pandas import DataFrame, to_datetime

df1 = DataFrame({'A': [1, None], 'B': [to_datetime('abc', errors='coerce'), to_datetime('2016-01-01')]})
df2 = DataFrame({'A': [2, 3]})
df1.update(df2, overwrite=False)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-a766b5317aac> in <module>()
1 df1 = DataFrame({'A': [1,None], 'B':[to_datetime('abc', errors='coerce'),to_datetime('2016-01-01')]})
2 df2 = DataFrame({'A': [2,3]})
----> 3 df1.update(df2, overwrite=False)
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
3897
3898 self[col] = expressions.where(mask, this, that,
-> 3899 raise_on_error=True)
3900
3901 # ----------------------------------------------------------------------
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
229
230 if use_numexpr:
--> 231 return _where(cond, a, b, raise_on_error=raise_on_error)
232 return _where_standard(cond, a, b, raise_on_error=raise_on_error)
233
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_numexpr(cond, a, b, raise_on_error)
152
153 if result is None:
--> 154 result = _where_standard(cond, a, b, raise_on_error)
155
156 return result
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_standard(cond, a, b, raise_on_error)
127 def _where_standard(cond, a, b, raise_on_error=True):
128 return np.where(_values_from_object(cond), _values_from_object(a),
--> 129 _values_from_object(b))
130
131
TypeError: invalid type promotion
Problem description
This is similar to issue #15593, which was fixed in pandas version 0.20.2. However, NaT values anywhere in the DataFrame still cause update with overwrite=False to raise the following exception: TypeError: invalid type promotion
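The traceback ends in np.where inside _where_standard, so the failure can probably be reproduced with NumPy alone. The sketch below assumes that the column missing from df2 ends up as an all-NaN float column once df2 is aligned to df1; the names this, that and mask mirror the traceback, but the arrays are built by hand for illustration.

```python
import numpy as np

# Column B of df1: datetime64 values with NaT in the first row.
this = np.array(["NaT", "2016-01-01"], dtype="datetime64[ns]")

# Assumption: df2 has no column B, so after alignment it is all float NaN.
that = np.array([np.nan, np.nan], dtype="float64")

# With overwrite=False, values that are already present are kept.
mask = ~np.isnat(this)

try:
    # datetime64 and float64 have no common dtype, so this raises.
    np.where(mask, this, that)
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The exact exception message depends on the NumPy version; older releases report "invalid type promotion", as in the traceback above.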
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.24
numpy: 1.13.0
scipy: 0.17.1
xarray: None
IPython: 6.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.4.2
numexpr: 2.6.2
feather: 0.3.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 6 years ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
Hi, new contributor here so please correct me if I’m wrong!
This seems to be caused by situations where the DataFrame to be updated has a datetime column with NaT values and the input DataFrame has either
Since in the second case the created column contains only NA values, would it be reasonable to solve this by adding a check to the function so that a column containing only NA values is skipped during the update?
I created a PR with an implementation of this, along with a couple of new test cases, including the one introduced above.
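The idea of skipping all-NA columns can be sketched in user code as a possible workaround. This is not the code from the PR: update_skip_all_na is a hypothetical helper, and it only approximates update(..., overwrite=False) for frames that share an index.

```python
import pandas as pd


def update_skip_all_na(target, other):
    """Hypothetical helper: fill NA values in target from other, in place.

    Columns of other that contain only NA after alignment are skipped,
    so they are never combined with target's datetime columns.
    """
    other = other.reindex_like(target)
    for col in target.columns:
        that = other[col]
        if that.isna().all():
            continue  # nothing to take from this column
        # Keep existing values; use other's values only where target is NA.
        target[col] = target[col].where(target[col].notna(), that)


df1 = pd.DataFrame({"A": [1, None],
                    "B": [pd.NaT, pd.Timestamp("2016-01-01")]})
df2 = pd.DataFrame({"A": [2, 3]})

update_skip_all_na(df1, df2)
print(df1)
```

Here column A keeps its existing 1 and fills the missing second row from df2, while column B is left untouched because df2 has nothing to contribute to it.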
take