ValueError when saving datetime64[ns] value that contains the NaT value

I’m getting a ValueError when trying to save a pandas DataFrame with ‘pandavro’, which wraps ‘fastavro’. The error occurs when a datetime64[ns] column contains a None/NaT value.

The following snippet reproduces the error:

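import datetime as dt

import pandas as pd
import pandavro as pdx

# One valid timestamp and one missing value; the None is stored as NaT
df = pd.DataFrame({'col': [dt.datetime.utcnow(), None]})
print(df[df['col'].isna()])  # rows where the value is NaT
print(df['col'].dtypes)
pdx.to_avro('temp.avro', df)  # triggers the error below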

And raises the error:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/nattype.pyx in pandas._libs.tslibs.nattype._make_error_func.f()

ValueError: NaTType does not support timestamp

Exception ignored in: 'fastavro._write.prepare_timestamp_micros'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support timestamp
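
The message in the traceback can be reproduced with pandas alone, which may help narrow down where the failure starts. The sketch below only shows that calling timestamp() on NaT raises this ValueError. That fastavro's prepare_timestamp_micros makes such a call internally is an inference from the traceback, not something confirmed here.

```python
import pandas as pd

message = None
try:
    # NaT has no point in time to convert, so pandas refuses the call
    pd.NaT.timestamp()
except ValueError as exc:
    message = str(exc)

print(message)
```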

Is this behavior expected? Is it not possible to save NaT values in datetime columns to Avro files?

Thanks

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

simonejsing commented, Jan 30, 2019

Yes, but writing them as Null/None is my expected behavior, so this works for me.

I did find an issue, though: the replace changes the column's data type, so pandavro fails to infer the schema.

The correct workaround is to infer the schema before replacing the values:

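import numpy as np  # pdx and df come from the snippet in the question
# Infer the schema first, while the column is still datetime64[ns]
schema = pdx.__schema_infer(df)
# Replace missing values with None so they are written as nulls
df = df.replace({np.nan: None})
pdx.to_avro('temp.avro', df, schema=schema)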
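
To see why the order matters, here is a small sketch of the dtype change, using pandas only. It builds the None-filled column explicitly as an object Series rather than calling replace, so the result does not depend on how a given pandas version handles replace on datetime columns. The sample timestamp and the variable names are illustrative assumptions.

```python
import pandas as pd

# A datetime column with one missing value (stored as NaT)
df = pd.DataFrame({"col": pd.to_datetime(["2019-01-30 12:00:00", None])})
original_dtype = df["col"].dtype

# A datetime64 column cannot hold None, so the None-filled version
# has to be an object column
as_objects = pd.Series(
    [None if pd.isna(value) else value for value in df["col"]],
    index=df.index,
    dtype=object,
)

print(original_dtype, "->", as_objects.dtype)
```

Once the column is object-typed, its dtype no longer says that it holds timestamps, which is consistent with schema inference failing when it runs after the replacement.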

scottbelden commented, Jan 30, 2019

Yes, that should work. However, when reading the file back again you will get Nones, not NaNs or NaTs. If you need those when reading the file you’ll need to perform some logic to do a similar transformation.
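
One possible way to do that transformation is pd.to_datetime, which turns None back into NaT. This is a minimal sketch: the read_back frame is a hand-built, hypothetical stand-in for whatever comes back from the Avro file, and the exact types a real read returns may differ.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for a frame read back from the file:
# an object column holding datetimes and None
read_back = pd.DataFrame(
    {"col": pd.Series([dt.datetime(2019, 1, 30, 12, 0), None], dtype=object)}
)

restored = read_back.copy()
# to_datetime converts None to NaT and gives the column a datetime dtype
restored["col"] = pd.to_datetime(restored["col"])

print(restored["col"].dtype)
print(restored["col"].isna().tolist())
```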
