
Unclear error is printed when wrong event_timestamp column type is used

See original GitHub issue

When running feast materialize-incremental 2022-01-01T00:00:00 on a parquet source that contains a string-based event_timestamp column, the following exception is printed:

Materializing 1 feature views to 2022-01-01 00:00:00-08:00 into the sqlite online store.

fake_data_fv from 2021-05-21 02:11:51-07:00 to 2022-01-01 00:00:00-08:00:
Traceback (most recent call last):
  File "/home/willem/.pyenv/versions/3.7.7/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/cli.py", line 270, in materialize_incremental_command
    end_date=datetime.fromisoformat(end_ts),
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/telemetry.py", line 151, in exception_logging_wrapper
    result = func(*args, **kwargs)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/feature_store.py", line 379, in materialize_incremental
    tqdm_builder,
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/infra/local.py", line 193, in materialize_single_feature_view
    end_date=end_date,
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/infra/offline_stores/file.py", line 208, in pull_latest_from_table_or_query
    lambda x: x if x.tzinfo is not None else x.replace(tzinfo=pytz.utc)
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "/home/willem/.pyenv/versions/3.7.7/lib/python3.7/site-packages/feast/infra/offline_stores/file.py", line 208, in <lambda>
    lambda x: x if x.tzinfo is not None else x.replace(tzinfo=pytz.utc)
AttributeError: 'str' object has no attribute 'tzinfo'

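The failure is easy to reproduce outside Feast: the lambda in feast/infra/offline_stores/file.py assumes every value in the event_timestamp column is a datetime, so accessing .tzinfo on a string raises the same AttributeError. A minimal sketch, assuming pandas and pytz are installed:

```python
import pandas as pd
import pytz

# A string-typed event_timestamp column, as read from the parquet source
ts = pd.Series(["2021-05-21T02:11:51", "2021-05-22T09:30:00"])

try:
    # Same expression as in feast's file offline store
    ts.apply(lambda x: x if x.tzinfo is not None else x.replace(tzinfo=pytz.utc))
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'tzinfo'
```

The condition `x.tzinfo is not None` fails before the `replace` branch is ever reached, which is why the error mentions tzinfo rather than anything timestamp-related.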
Instead, we should validate types during materialize and print a clearer error message.
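One possible shape for such a validation — a hypothetical helper, not Feast's actual API — checks the dtype of the timestamp column up front and raises a descriptive error instead of letting the lambda fail:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def validate_event_timestamp(df: pd.DataFrame, column: str = "event_timestamp") -> None:
    """Raise a clear error if the timestamp column is missing or not datetime-typed."""
    if column not in df.columns:
        raise ValueError(f"Column '{column}' not found in source data.")
    if not is_datetime64_any_dtype(df[column]):
        raise TypeError(
            f"Column '{column}' has dtype '{df[column].dtype}', but a datetime64 "
            f"type is required. If the source stores timestamps as strings, "
            f"convert them first, e.g. pd.to_datetime(df['{column}'], utc=True)."
        )
```

Called at the start of materialization, this would turn the opaque AttributeError into a message that names the offending column and suggests the fix.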

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 8
  • Comments: 6

Top GitHub Comments

5 reactions
fcas commented, Apr 14, 2022

@sgvarsh the workaround that I found:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import to_timestamp

conf = SparkConf().setMaster(SPARK_MASTER)
# Feast does not work with INT96 (the default type when using pyspark
# to write parquet files containing timestamp fields;
# another option is to use string-based timestamps, but...)
# https://issues.apache.org/jira/browse/PARQUET-323
# https://stackoverflow.com/questions/56582539/how-to-save-spark-dataframe-to-parquet-without-using-int96-format-for-timestamp
# Feast works with TIMESTAMP_MICROS (I did not try TIMESTAMP_MILLIS)
conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
spark_context = SparkContext(conf=conf)
sql_context = SQLContext(spark_context)
df = sql_context.read.csv(path)
df = df.withColumn("event_timestamp", to_timestamp(df.event_timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"))
# Feast cannot read a directory containing multiple .parquet files,
# so coalesce to a single output file
df.coalesce(1).write.mode("overwrite").parquet('output.parquet')

Inspecting the file output.parquet:

############ Column(event_timestamp) ############
name: event_timestamp
path: event_timestamp
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=true, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): TIMESTAMP_MICROS

Reading the feature view:

training_df = fs.get_historical_features(
        entity_df=entity_df,
        features=[
            "feature_view:***",
            "feature_view:***",
            "feature_view:***",
        ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head(8))
----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
     Column                 Non-Null Count        Dtype              
---  ------                 --------------        -----              
 0   feast_id                     5 non-null      object             
 1   event_timestamp              0 non-null      datetime64[ns, UTC]
 2   ***                          5 non-null      object             
 3   ***                          5 non-null      object             
 4   ***                          5 non-null      object             
dtypes: datetime64[ns, UTC](1), object(4)
memory usage: 240.0+ bytes

----- Example features -----

   feast_id                              ...      ***
0  12f8cbcf-286a-44f6-a84d-e6d9a8fe902a  ...      ***
1  c47e2260-87eb-4748-b63f-cfda3c7fd258  ...      ***
2  7e835362-4ed8-41ed-b81d-7591b38c151d  ...      ***
3  24fa1717-5e92-4a57-bd19-0b3e851ea357  ...      ***
4  8ce9e852-3a4d-4e96-95dc-fa809481c08a  ...      ***

[5 rows x 5 columns]
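For sources that are not produced by Spark, the same conversion can be done with pandas alone before writing the parquet file — a minimal sketch, assuming ISO-8601 string timestamps (the sample values here are made up):

```python
import pandas as pd

# Source data with a string-based event_timestamp (the failing case)
df = pd.DataFrame({
    "feast_id": ["12f8cbcf-286a-44f6-a84d-e6d9a8fe902a"],
    "event_timestamp": ["2021-05-21T02:11:51"],
})

# Convert the string column to a timezone-aware datetime64 column,
# which Feast's file offline store can handle
df["event_timestamp"] = pd.to_datetime(df["event_timestamp"], utc=True)
# df.to_parquet("output.parquet")  # then write as usual
```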
1 reaction
fcas commented, Apr 12, 2022

@woop do you know a workaround for this issue? It’s a stale issue, but the same problem exists even in version 0.19.4 =/

Read more comments on GitHub >

Top Results From Across the Web

insert datetime from csv to postgres error - Stack Overflow
It includes a date, time, and a timezone offset. Apparently, your table's event_time column is timestamp format with date and time only…
Read more >
Detect and Fix Data Quality Problems - Fluxicon
The very first check is to make sure that there are no error messages when you import your data set. Error messages can...
Read more >
Database Engine events and errors - SQL Server
Consult this MSSQL error code list to find explanations for error messages for SQL Server database engine events.
Read more >
How to Get SQL Server Dates and Times Horribly Wrong
One of the problems is that most SQL Server date/time data types are fairly ambiguous. For example, suppose we have a table in...
Read more >
How to Effectively Use Dates and Timestamps in Spark 3.0
Spark SQL defines the timestamp type as TIMESTAMP WITH SESSION TIME ZONE , which is a combination of the fields ( YEAR ,...
Read more >
