_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
Expected Behavior
Feast works with a Spark offline store source.
Current Behavior
get_historical_features fails with a PicklingError (traceback below).
Steps to reproduce
- Create a feature view that uses a Spark source
- Run feast apply
- Call get_historical_features on the FeatureStore, as shown in the following code
- The call fails with the error below
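The steps above can be sketched as follows, assuming a repo where feast apply has already registered a SparkSource-backed feature view; the entity and feature names (driver_id, driver_hourly_stats:conv_rate) are illustrative, not taken from the original report:

```python
import pandas as pd

# Entity dataframe for the point-in-time join (names are illustrative).
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2022-05-01", "2022-05-02"]),
    }
)
feature_refs = ["driver_hourly_stats:conv_rate"]

# With the Spark offline store configured, the following call is what
# fails on Python 3.7 (kept as comments since it needs a live Feast repo):
#   from feast import FeatureStore
#   fs = FeatureStore(repo_path=".")
#   job = fs.get_historical_features(entity_df=entity_df, features=feature_refs)
# -> _pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
```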
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "/usr/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/dill/_dill.py", line 1226, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/test_feast.py", line 23, in <module>
    hist_feats_job = fs.get_historical_features(entity_df=entity_df, features=feature_refs)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/feature_store.py", line 920, in get_historical_features
    full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/passthrough_provider.py", line 196, in get_historical_features
    full_feature_names=full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 280, in wrapper
    raise exc.with_traceback(traceback)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 145, in get_historical_features
    event_timestamp_col=event_timestamp_col,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 366, in _upload_entity_df
    spark_session.createDataFrame(entity_df).createOrReplaceTempView(table_name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 674, in createDataFrame
    data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/pandas/conversion.py", line 340, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 701, in _create_dataframe
    jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2620, in _to_java_object_rdd
    return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2952, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2830, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2816, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 447, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
Specifications
- Version: feast 0.20.1
- Platform: ubuntu 18.04
- Subsystem:
Possible Solution
Issue Analytics
- Created a year ago
- Comments: 6 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This looks like the same issue I saw in #2608. @kfiring what version of Python are you using? My issue was with Python 3.7, but upgrading to Python 3.8 solved it.
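The underlying error can be reproduced without Feast or Spark. The innermost frame in the traceback is dill's save_cell reading obj.cell_contents, which raises ValueError when a closure cell was created but never populated. A minimal sketch of such an empty cell (a standalone illustration, not Feast's actual code path):

```python
def make_fn():
    # fn closes over `helper`, but the assignment below never executes,
    # so the closure cell is created and left empty.
    def fn():
        return helper()
    if False:
        helper = None  # dead code: keeps `helper` a closure variable
    return fn

empty_cell = make_fn().__closure__[0]

# dill's save_cell (the `f = obj.cell_contents` frame above) does this:
try:
    empty_cell.cell_contents
except ValueError as exc:
    print(exc)  # prints "Cell is empty"
```

This is consistent with the version observation in the comment: the error surfaces on Python 3.7 and goes away on Python 3.8, where serializers can reconstruct empty cells.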