
BUG: Summation of NaT in a DataFrame with axis=1 does not return NaT


Code Sample, a copy-pastable example if possible

import pandas as pd
import io

dat = """s1;s2
1000;2000
3000;1500
500"""

df = pd.read_csv(io.StringIO(dat), sep=";")
df.index = df.index + 1
df = df.apply(lambda x: pd.to_timedelta(x, unit='ms'))

df['laptime'] = df.sum(axis=1, skipna=False)

print(df)

Problem description

This code displays

               s1              s2                       laptime
1        00:00:01        00:00:02               0 days 00:00:03
2        00:00:03 00:00:01.500000        0 days 00:00:04.500000
3 00:00:00.500000             NaT -106752 days +00:12:43.645223
In [81]: df['laptime'][3]
Out[81]: Timedelta('-106752 days +00:12:43.645223')

Expected Output

               s1              s2                       laptime
1        00:00:01        00:00:02               0 days 00:00:03
2        00:00:03 00:00:01.500000        0 days 00:00:04.500000
3 00:00:00.500000             NaT                           NaT
In [81]: df['laptime'][3]
Out[81]: NaT

That’s very strange, because pd.to_timedelta(500, unit='ms') + pd.NaT returns NaT (as expected), but summing the same values across a row with df.sum(axis=1, skipna=False) does not.
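One plausible explanation for the odd value, offered as a sketch rather than a confirmed diagnosis: pandas stores NaT as the smallest 64-bit integer, so a reduction that adds the raw nanosecond values without masking NaT would land just above that minimum. The snippet below redoes that arithmetic by hand using pd.NaT.value; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# NaT is represented internally by the minimum int64 value.
nat_as_int = pd.NaT.value
assert nat_as_int == np.iinfo(np.int64).min

# 500 ms expressed in nanoseconds, the unit behind timedelta64[ns].
half_second_ns = pd.Timedelta(500, unit="ms").value

# Plain Python integer addition, mimicking an unmasked sum of the raw values.
raw_sum = nat_as_int + half_second_ns
bogus = pd.Timedelta(raw_sum, unit="ns")
print(bogus)
```

The printed value falls on -106752 days with a time component of about 00:12:43.645, which lines up with the laptime reported for row 3.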

Output of pd.show_versions()

pd.show_versions()
/Users/scls/anaconda/lib/python3.6/site-packages/xarray/core/formatting.py:16: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import OutOfBoundsDatetime

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.20.1
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.5
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Nov 3, 2019

Ah good catch @ainsleyto. Yes looks like axis=1 case still needs a fix.

1 reaction
ainsleyto commented, Nov 2, 2019

Looks to be fixed on master now. Could use a regression test.

In [20]: df['s2'].sum(skipna=False)
Out[20]: NaT

In [22]: pd.__version__
Out[22]: '0.26.0.dev0+519.gdf2e0813e'

df["s2"].sum(skipna=False)
NaT

df["s1"] + df["s2"]
1          00:00:03
2   00:00:04.500000
3               NaT

df[["s1", "s2"]].sum(axis=1, skipna=False)
1                 0 days 00:00:03
2          0 days 00:00:04.500000
3   -106752 days +00:12:43.645223

pd.__version__
'0.26.0.dev0+734.g0de99558b'

The sum of s2 gives NaT, but sum(axis=1, skipna=False) still does not. I believe this still needs a fix?
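On a pandas version where the axis=1 case is still affected, the expected laptime column can be obtained without relying on skipna=False. The following is a minimal sketch of two possible workarounds, not an official recommendation: it rebuilds the frame from the original report, and the names laptime_add and laptime_masked are made up for illustration.

```python
import io
import pandas as pd

dat = """s1;s2
1000;2000
3000;1500
500"""

df = pd.read_csv(io.StringIO(dat), sep=";")
df.index = df.index + 1
df = df.apply(lambda x: pd.to_timedelta(x, unit="ms"))

# Workaround 1: elementwise addition propagates NaT row by row.
laptime_add = df["s1"] + df["s2"]

# Workaround 2: sum while skipping NaT, then restore NaT on every row
# that had at least one missing value.
cols = ["s1", "s2"]
complete_rows = df[cols].notna().all(axis=1)
laptime_masked = df[cols].sum(axis=1).where(complete_rows)

print(laptime_add)
print(laptime_masked)
```

The first form is the simplest for two columns; the second generalizes to any number of columns, at the cost of an extra pass to build the mask.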
