_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
Expected Behavior
Feast works with a Spark offline store source.
Current Behavior
get_historical_features fails with a PicklingError (traceback below).
Steps to reproduce
- Create a feature view that uses a Spark source
- Run feast apply
- Call get_historical_features on the FeatureStore, as shown in the following code
- The call fails with the error below
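The steps above can be sketched as follows, assuming a repo where feast apply has already registered a SparkSource-backed feature view; the entity and feature names (driver_id, driver_hourly_stats:conv_rate) are illustrative, not taken from the original report:

```python
import pandas as pd

# Entity dataframe for the point-in-time join (names are illustrative).
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2022-05-01", "2022-05-02"]),
    }
)
feature_refs = ["driver_hourly_stats:conv_rate"]

# With the Spark offline store configured, the following call is what
# fails on Python 3.7 (kept as comments since it needs a live Feast repo):
#   from feast import FeatureStore
#   fs = FeatureStore(repo_path=".")
#   job = fs.get_historical_features(entity_df=entity_df, features=feature_refs)
# -> _pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
```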
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "/usr/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/dill/_dill.py", line 1226, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/test_feast.py", line 23, in <module>
    hist_feats_job = fs.get_historical_features(entity_df=entity_df, features=feature_refs)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/feature_store.py", line 920, in get_historical_features
    full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/passthrough_provider.py", line 196, in get_historical_features
    full_feature_names=full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 280, in wrapper
    raise exc.with_traceback(traceback)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 145, in get_historical_features
    event_timestamp_col=event_timestamp_col,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 366, in _upload_entity_df
    spark_session.createDataFrame(entity_df).createOrReplaceTempView(table_name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 674, in createDataFrame
    data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/pandas/conversion.py", line 340, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 701, in _create_dataframe
    jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2620, in _to_java_object_rdd
    return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2952, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2830, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2816, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 447, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
Specifications
- Version: feast 0.20.1
- Platform: ubuntu 18.04
- Subsystem:
Possible Solution
Issue Analytics
- Created a year ago
- Comments: 6 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This looks like the same issue I saw in #2608. @kfiring what version of Python are you using? My issue was with Python 3.7, but upgrading to Python 3.8 solved it.
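The underlying error can be reproduced without Feast or Spark. The innermost frame in the traceback is dill's save_cell reading obj.cell_contents, which raises ValueError when a closure cell was created but never populated. A minimal sketch of such an empty cell (a standalone illustration, not Feast's actual code path):

```python
def make_fn():
    # fn closes over `helper`, but the assignment below never executes,
    # so the closure cell is created and left empty.
    def fn():
        return helper()
    if False:
        helper = None  # dead code: keeps `helper` a closure variable
    return fn

empty_cell = make_fn().__closure__[0]

# dill's save_cell (the `f = obj.cell_contents` frame above) does this:
try:
    empty_cell.cell_contents
except ValueError as exc:
    print(exc)  # prints "Cell is empty"
```

This is consistent with the version observation in the comment: the error surfaces on Python 3.7 and goes away on Python 3.8, where serializers can reconstruct empty cells.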