Using MLeap with PySpark: getting a strange error
I've been looking at the various places where the MLeap/PySpark integration is documented, and I'm finding contradictory information.
The getting-started page at http://mleap-docs.combust.ml/getting-started/py-spark.html says to clone the repo, change the working directory to the `python` folder, and then `import mleap.pyspark`; however, there is no folder named `pyspark` inside `mleap/python`.
I was able to run the following successfully:

```python
import sys
sys.path.append('/databricks/driver/mleap/python')
import mleap.spark
from mleap.spark.spark_support import SimpleSparkSerializer
```

i.e. importing `mleap.spark` instead.
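(For context, here is a minimal sketch of the kind of model being serialized below; the column names and training data are hypothetical, and an active `spark` session is assumed:)

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor

# Hypothetical training data; assumes an active `spark` session
train_df = spark.createDataFrame(
    [(1.0, 2.0, 3.0), (4.0, 5.0, 9.0)], ["f1", "f2", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
rf = RandomForestRegressor(featuresCol="features", labelCol="label")
rfModel = Pipeline(stages=[assembler, rf]).fit(train_df)
```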
But when I try to serialize the `RandomForestRegressor` model I have built, I get this error:
```python
rfModel.serializeToBundle("jar:file:/tmp/pyspark.example.zip")
```

```
TypeError                                 Traceback (most recent call last)
<ipython-input-94-4a2b69639611> in <module>()
----> 1 pipelineModel.serializeToBundle("jar:file:/tmp/pyspark.example.zip")

/databricks/driver/mleap/python/mleap/spark/spark_support.py in serializeToBundle(self, path)
     21
     22     def serializeToBundle(self, path):
---> 23         serializer = SimpleSparkSerializer()
     24         serializer.serializeToBundle(self, path)
     25

/databricks/driver/mleap/python/mleap/spark/spark_support.py in __init__(self)
     34     def __init__(self):
     35         super(SimpleSparkSerializer, self).__init__()
---> 36         self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
     37
     38     def serializeToBundle(self, transformer, path):

TypeError: 'JavaPackage' object is not callable
```
Can you correct the documentation on the “getting started with pyspark” page? And do you have thoughts on this error?
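(Note: `TypeError: 'JavaPackage' object is not callable` generally means the MLeap Spark JARs are not on the JVM classpath, so `_jvm().ml.combust.mleap.spark` resolves to an empty py4j `JavaPackage` instead of a class. A minimal sketch of pulling the JARs in via `spark.jars.packages`; the artifact coordinates and version here are assumptions and must match your Spark/Scala build:)

```python
from pyspark.sql import SparkSession

# Artifact coordinates/version are assumptions; pick the mleap-spark
# artifact that matches your Spark and Scala versions.
spark = (SparkSession.builder
         .appName("mleap-pyspark")
         .config("spark.jars.packages",
                 "ml.combust.mleap:mleap-spark_2.11:0.8.1")
         .getOrCreate())
```

On Databricks, the equivalent is attaching the MLeap library to the cluster rather than configuring the session in code.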
Issue Analytics
- Created: 6 years ago
- Comments: 27 (7 by maintainers)
Top GitHub Comments
I have the following setup […] and I have 2 problems.

Problem 1: […] `SimpleSparkSerializer` […] the `spark_support.py` file @ line 33, and modifying `Transformer.__class__` into `simpleTransformer` (eg: […]); the import works fine.

Problem 2: […]

Am I missing something?
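(For context, here is a sketch of the relevant part of `spark_support.py`, reconstructed from the traceback in the issue body; the import and anything not shown in that traceback are assumptions. This is roughly the code around the line-33 edit mentioned above:)

```python
# Assumed import: the traceback only shows _jvm() being called.
from pyspark.ml.util import _jvm


def serializeToBundle(self, path):
    # Attached to PySpark transformers; `self` is the fitted model/pipeline.
    serializer = SimpleSparkSerializer()
    serializer.serializeToBundle(self, path)


class SimpleSparkSerializer(object):
    # Defined around lines 33-38 of spark_support.py per the traceback.
    def __init__(self):
        super(SimpleSparkSerializer, self).__init__()
        # The failing line: when the ml.combust.mleap JARs are missing from
        # the driver's classpath, _jvm().ml.combust.mleap.spark is a bare
        # py4j JavaPackage, hence "'JavaPackage' object is not callable".
        self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()

    def serializeToBundle(self, transformer, path):
        ...  # body not visible in the traceback
```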
@rgeos I was also seeing the `resource/package$` error, with a setup similar to yours except with everything at 0.8.1. You can bypass it by building a jar-with-dependencies off a Scala example that does model serialization (like the MNIST example), then passing that jar with your pyspark job. `jar tf` confirms `resource/package$` etc. are in there, but I haven't figured out what the ultimate dependency is. Failing to prefix the model path with `jar:file:` also results in an obscure error.
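(To illustrate that workaround, a sketch of attaching the assembly jar to a PySpark session; the jar path is a placeholder, and the `jar:file:` prefix point is repeated in the comments:)

```python
from pyspark.sql import SparkSession

# Placeholder path to the jar-with-dependencies built from the Scala example
spark = (SparkSession.builder
         .appName("mleap-export")
         .config("spark.jars", "/path/to/mleap-examples-assembly.jar")
         .getOrCreate())

# ... build and fit the pipeline as usual, then:
# correct: local output paths need the "jar:file:" prefix
#   pipelineModel.serializeToBundle("jar:file:/tmp/pyspark.example.zip")
# incorrect: a bare path fails with an obscure error
#   pipelineModel.serializeToBundle("/tmp/pyspark.example.zip")
```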