ValueError when saving datetime64[ns] value that contains the NaT value
I'm getting a ValueError when trying to save a pandas DataFrame using pandavro, which wraps fastavro. The error occurs when a datetime64[ns] column contains a None/NaT value.
The following snippet reproduces the error:
import pandavro as pdx
import pandas as pd
import datetime as dt
df = pd.DataFrame({ 'col': [dt.datetime.utcnow(), None] })
print(df[df['col'].isna() == True])
print(df['col'].dtypes)
pdx.to_avro('temp.avro', df)
And raises the error:
ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/nattype.pyx in pandas._libs.tslibs.nattype._make_error_func.f()

ValueError: NaTType does not support timestamp
Exception ignored in: 'fastavro._write.prepare_timestamp_micros'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support timestamp
Is this behavior expected? Is there no way to save NaT values in datetime columns to Avro files?
Thanks
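The traceback suggests that fastavro's prepare_timestamp_micros ends up calling timestamp() on each value, which NaT does not support. A minimal check of that pandas behavior, independent of pandavro and fastavro (the variable name `message` is only for illustration):

```python
import pandas as pd

# NaT is pandas' missing-value marker for datetimes. Unlike a real
# Timestamp, it cannot be converted to a POSIX timestamp.
try:
    pd.NaT.timestamp()
    message = None
except ValueError as exc:
    # Keep the error text so it can be compared with the traceback.
    message = str(exc)

print(message)
```

Any writer that calls timestamp() on every cell of a datetime column would hit this error as soon as it reaches a missing value.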
Issue Analytics
- Created 5 years ago
- Comments:8 (1 by maintainers)
Yes, but writing them as Null/None is the behavior I expect, so this works for me.
I found an issue, though: the replace changes the column's data type, so pandavro fails to infer the schema.
The correct work-around is:
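A minimal sketch of one such workaround, not necessarily the commenter's exact code: instead of replacing values inside the DataFrame (which changes the column dtype), it converts the frame to records and swaps NaT for None there. The helper name `records_with_none` is hypothetical, and the sketch assumes every cell holds a scalar.

```python
import datetime as dt

import pandas as pd


def records_with_none(df: pd.DataFrame) -> list:
    """Return the rows of df as dicts, with missing cells (NaT/NaN) set to None.

    Hypothetical helper for illustration; assumes scalar cell values.
    """
    records = df.to_dict("records")
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in records
    ]


# Same shape as the frame in the report, with a fixed date for repeatability.
df = pd.DataFrame({"col": [dt.datetime(2020, 1, 1), None]})
records = records_with_none(df)
```

Records in this form could then be passed to fastavro.writer together with an explicit schema in which the field type is the union ["null", {"type": "long", "logicalType": "timestamp-micros"}]. That last step is an assumption and is not exercised here.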
Yes, that should work. However, when reading the file back again you will get None values, not NaN or NaT values. If you need those when reading the file, you'll need to perform some logic to do a similar transformation.
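A minimal sketch of that reverse transformation, assuming the frame read back from the file holds datetime objects and None in an object-dtype column. The helper name `restore_nat` and the sample frame are hypothetical; `pd.to_datetime` converts None to NaT.

```python
import datetime as dt

import pandas as pd


def restore_nat(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df in which the given columns are datetime64 again.

    Hypothetical helper for illustration; None entries become NaT.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col])
    return out


# Stand-in for a frame read back from an Avro file (assumed shape).
loaded = pd.DataFrame({"col": [dt.datetime(2020, 1, 1), None]}, dtype=object)
restored = restore_nat(loaded, ["col"])
```

Numeric columns would need a separate rule if NaN is wanted there; this sketch covers datetime columns only.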