
_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty

Expected Behavior

Feast works with a Spark source; get_historical_features returns results.

Current Behavior

Not working; the call fails with _pickle.PicklingError (see the traceback below).

Steps to reproduce

  1. Create a feature view that uses a Spark source.
  2. Run feast apply.
  3. Call get_historical_features on the FeatureStore. (The original post showed the call in a screenshot that has not survived; a reconstructed sketch follows the traceback below.)
  4. The call fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/usr/local/lib/python3.7/dist-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "/usr/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/usr/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/dist-packages/dill/_dill.py", line 1226, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/test_feast.py", line 23, in <module>
    hist_feats_job = fs.get_historical_features(entity_df=entity_df, features=feature_refs)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/feature_store.py", line 920, in get_historical_features
    full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/passthrough_provider.py", line 196, in get_historical_features
    full_feature_names=full_feature_names,
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 280, in wrapper
    raise exc.with_traceback(traceback)
  File "/usr/local/lib/python3.7/dist-packages/feast/usage.py", line 269, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 145, in get_historical_features
    event_timestamp_col=event_timestamp_col,
  File "/usr/local/lib/python3.7/dist-packages/feast/infra/offline_stores/contrib/spark_offline_store/spark.py", line 366, in _upload_entity_df
    spark_session.createDataFrame(entity_df).createOrReplaceTempView(table_name)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 674, in createDataFrame
    data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/pandas/conversion.py", line 340, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/sql/session.py", line 701, in _create_dataframe
    jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2620, in _to_java_object_rdd
    return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2952, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2830, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/rdd.py", line 2816, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/serializers.py", line 447, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: ValueError: Cell is empty
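
Since the screenshot is gone, here is a rough sketch of what steps 1-3 can look like against Feast 0.20's contrib Spark offline store. Everything in it is hypothetical: the entity, table, and feature names are invented for illustration, not taken from the reporter's code.

# Hypothetical repro sketch (feast 0.20.1 on Python 3.7); all names are made up.
from datetime import datetime, timedelta

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, ValueType
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

driver = Entity(name="driver_id", value_type=ValueType.INT64)

driver_stats_source = SparkSource(
    name="driver_stats",
    table="driver_stats",  # a table registered in the Spark catalog
    event_timestamp_column="event_timestamp",
)

driver_stats_fv = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    batch_source=driver_stats_source,
)

# After `feast apply`, the retrieval below is where the traceback above
# originates: _upload_entity_df calls spark_session.createDataFrame(entity_df),
# and Spark fails to pickle a closure while preparing the RDD.
fs = FeatureStore(repo_path=".")
entity_df = pd.DataFrame(
    {"driver_id": [1001], "event_timestamp": [datetime.utcnow()]}
)
job = fs.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
)
print(job.to_df().head())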

Specifications

  • Version: feast 0.20.1
  • Platform: Ubuntu 18.04
  • Subsystem: Python 3.7 (per the dist-packages paths in the traceback)

Possible Solution

Upgrade to Python 3.8. Per the comments below, the error reproduces on Python 3.7 and goes away after upgrading (see also #2608).
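If a pipeline might run on mixed interpreters, a cheap guard fails fast instead of surfacing the opaque pickling error deep inside Spark. This is a hypothetical snippet for your own code, not part of Feast:

import sys

# Feast 0.20's Spark offline store hits "ValueError: Cell is empty" under
# Python 3.7; refuse to run rather than fail inside Spark's serializer.
if sys.version_info < (3, 8):
    raise RuntimeError(
        "Python >= 3.8 required: the Spark offline store fails to pickle "
        "closures with empty cells on 3.7"
    )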
Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
kfiring commented, Apr 28, 2022

> This looks like the same issue I saw in #2608. @kfiring what version of Python are you using? My issue was with Python 3.7, but upgrading to Python 3.8 solved it.

Upgrading Python to 3.8 also solved it for me, thanks a lot!
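
For context on why the interpreter version matters: the innermost frame of the traceback is dill's save_cell reading obj.cell_contents, which raises ValueError when a closure cell is empty. Python 3.8 added types.CellType, which gives serializers a clean way to construct empty cells; 3.7 has no such hook. A minimal standalone sketch of how such a cell arises (hypothetical code, not from Feast):

def make_closure():
    if False:
        captured = None  # dead code, so the cell for `captured` is never filled
    def inner():
        return captured  # free variable -> closure cell on `inner`
    return inner

fn = make_closure()
cell = fn.__closure__[0]
print(cell)  # <cell at 0x...: empty>
try:
    cell.cell_contents
except ValueError as e:
    print(e)  # "Cell is empty" -- the same error dill's save_cell trips over

Whether pickling such a function then succeeds depends on the dill/cloudpickle versions in play; per the comments here, upgrading the interpreter to 3.8 was enough.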

1 reaction
felixwang9817 commented, Apr 28, 2022

This looks like the same issue I saw in #2608. @kfiring what version of Python are you using? My issue was with Python 3.7, but upgrading to Python 3.8 solved it.

Top Results From Across the Web

  • Pyspark: PicklingError: Could not serialize object
    "You are passing a pyspark dataframe, df_whitelist, to a UDF; pyspark dataframes cannot be pickled. You are also doing computations on a …"
  • Pyspark error "Could not serialize object" (Clare S. Y. Huang)
    "The issue is that, as self._mapping appears in the function addition, when applying addition_udf to the pyspark dataframe, the object self …"
  • Source code for pyspark.serializers (Apache Spark)
    "By default, PySpark uses PickleSerializer to serialize objects using … if serialized is None: raise ValueError('serialized value should not be …'"
  • [Pyspark.Pandas] PicklingError: Could not serialize object
    "Context: I am using pyspark.pandas in a Databricks jupyter notebook and doing some text manipulation within the dataframe."
  • Pyspark: Picklingerror: Could Not Serialize Object (ADocLib)
    "By default PySpark uses PickleSerializer to serialize objects using Python's … PicklingError: Could not serialize object: ValueError: Cell is empty …"
