DataFrame.update crashes with overwrite=False when NaT present


Code Sample

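from pandas import DataFrame, to_datetime

# Column B holds a NaT (from the coerced parse) next to a valid timestamp.
df1 = DataFrame({'A': [1, None], 'B': [to_datetime('abc', errors='coerce'), to_datetime('2016-01-01')]})
df2 = DataFrame({'A': [2, 3]})
df1.update(df2, overwrite=False)  # raises the TypeError shown below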
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a766b5317aac> in <module>()
      1 df1 = DataFrame({'A': [1,None], 'B':[to_datetime('abc', errors='coerce'),to_datetime('2016-01-01')]})
      2 df2 = DataFrame({'A': [2,3]})
----> 3 df1.update(df2, overwrite=False)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
   3897
   3898             self[col] = expressions.where(mask, this, that,
-> 3899                                           raise_on_error=True)
   3900
   3901     # ----------------------------------------------------------------------

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
    229
    230     if use_numexpr:
--> 231         return _where(cond, a, b, raise_on_error=raise_on_error)
    232     return _where_standard(cond, a, b, raise_on_error=raise_on_error)
    233

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_numexpr(cond, a, b, raise_on_error)
    152
    153     if result is None:
--> 154         result = _where_standard(cond, a, b, raise_on_error)
    155
    156     return result

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_standard(cond, a, b, raise_on_error)
    127 def _where_standard(cond, a, b, raise_on_error=True):
    128     return np.where(_values_from_object(cond), _values_from_object(a),
--> 129                     _values_from_object(b))
    130
    131

TypeError: invalid type promotion

Problem description

This is similar to issue #15593, which was fixed in pandas version 0.20.2. However, when NaT values are present anywhere in the DataFrame, calling update with overwrite=False still raises the following exception: TypeError: invalid type promotion

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.24
numpy: 1.13.0
scipy: 0.17.1
xarray: None
IPython: 6.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.4.2
numexpr: 2.6.2
feather: 0.3.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

wxing11 commented, Oct 30, 2022 (1 reaction)

Hi, new contributor here so please correct me if I’m wrong!

This seems to happen when the DataFrame being updated has a datetime column containing NaT values and the input DataFrame has either:

  1. A matching column (by index) whose dtype isn’t datetime or object. I assume an error is expected here.
  2. No matching column (by index), so the call to reindex_like in the update function creates a column whose dtype isn’t datetime or object. (This is the example case above.)
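The second case can be checked directly. The snippet below is a minimal sketch, not code from the issue or the PR: it calls reindex_like on the example frames and then asks NumPy whether a datetime column and a float column have a common dtype. The variable names (aligned, promotion_failed, and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, None], "B": [pd.NaT, pd.Timestamp("2016-01-01")]})
df2 = pd.DataFrame({"A": [2, 3]})

# df2 has no "B" column, so aligning it to df1 with reindex_like adds a "B"
# column that contains only missing values and is not a datetime column.
aligned = df2.reindex_like(df1)
b_is_datetime = pd.api.types.is_datetime64_any_dtype(df1["B"])
aligned_b_all_na = bool(aligned["B"].isna().all())
dtypes_differ = aligned["B"].dtype != df1["B"].dtype

# NumPy has no common dtype for datetime64 and float64. Combining two such
# arrays element-wise requires one, which is where the promotion error arises.
try:
    np.promote_types("datetime64[ns]", "float64")
    promotion_failed = False
except TypeError:
    promotion_failed = True

print(df1["B"].dtype, aligned["B"].dtype, aligned_b_all_na, promotion_failed)
```

Newer NumPy releases may word the exception differently from "invalid type promotion", but it is still a TypeError.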

Since in the second case the created column contains only NA values, would it be reasonable to fix this by adding a check to the function that skips updating any column that is entirely NA?

I created a PR with an implementation of this as well as a couple new test cases including the one introduced above.
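The idea of skipping all-NA columns can be sketched outside pandas internals. The helper below, update_missing_only, is hypothetical and is not the code from the PR. It only approximates update(..., overwrite=False): it returns a new frame rather than modifying the caller in place, and it ignores the other parameters of update.

```python
import pandas as pd


def update_missing_only(target, other):
    """Hypothetical helper: fill NA cells of `target` from `other`.

    Columns of the aligned `other` that are entirely NA are skipped, so they
    are never combined with a column of an incompatible dtype.
    """
    aligned = other.reindex_like(target)
    result = target.copy()
    for col in result.columns:
        that = aligned[col]
        if that.isna().all():
            # Nothing to contribute for this column; leave it untouched.
            continue
        # Keep existing values and take `that` only where the value is NA.
        result[col] = result[col].where(result[col].notna(), that)
    return result


df1 = pd.DataFrame({"A": [1, None], "B": [pd.NaT, pd.Timestamp("2016-01-01")]})
df2 = pd.DataFrame({"A": [2, 3]})
out = update_missing_only(df1, df2)
print(out)
```

In this sketch the missing value in A is filled from df2, while B is left as it was because the aligned B column has no data to offer.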

wxing11 commented, Oct 26, 2022 (1 reaction)

take


