
Using MLeap with PySpark, getting a strange error

See original GitHub issue

I’ve been looking at the various places where the MLeap/PySpark integration is documented, and I’m finding contradictory information.

The getting-started guide at http://mleap-docs.combust.ml/getting-started/py-spark.html says to clone the repo, change into the python folder, and then import mleap.pyspark; however, there is no folder named pyspark in the mleap/python folder.

I was able to run the following successfully:

import sys
sys.path.append('/databricks/driver/mleap/python')

import mleap.spark
from mleap.spark.spark_support import SimpleSparkSerializer

where I’m importing mleap.spark instead.

But when I try to serialize the RandomForestRegressor model I have built I get this error:

rfModel.serializeToBundle("jar:file:/tmp/pyspark.example.zip")

TypeError                                 Traceback (most recent call last)
<ipython-input-94-4a2b69639611> in <module>()
----> 1 pipelineModel.serializeToBundle("jar:file:/tmp/pyspark.example.zip")

/databricks/driver/mleap/python/mleap/spark/spark_support.py in serializeToBundle(self, path)
     21 
     22 def serializeToBundle(self, path):
---> 23     serializer = SimpleSparkSerializer()
     24     serializer.serializeToBundle(self, path)
     25 

/databricks/driver/mleap/python/mleap/spark/spark_support.py in __init__(self)
     34     def __init__(self):
     35         super(SimpleSparkSerializer, self).__init__()
---> 36         self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
     37 
     38     def serializeToBundle(self, transformer, path):

TypeError: 'JavaPackage' object is not callable

Can you correct the documentation on the “getting started with pyspark” page? And do you have thoughts on this error?
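For context on the traceback above: py4j hands back a JavaPackage placeholder when a Java class cannot be found on the JVM classpath, so “'JavaPackage' object is not callable” usually means the MLeap Spark JARs were not visible to the driver. A minimal, Spark-free sketch of the mechanism (the JavaPackage class below is an illustrative stand-in, not py4j’s real implementation):

```python
# Illustration (no Spark needed) of why py4j raises
# "'JavaPackage' object is not callable": when a class is missing
# from the JVM classpath, py4j resolves the dotted path to a
# JavaPackage placeholder instead of a constructor, and the
# placeholder cannot be instantiated.

class JavaPackage:
    """Stand-in for py4j.java_gateway.JavaPackage: attribute access
    always succeeds, but the resulting object is not callable."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Any dotted lookup just produces a deeper "package".
        return JavaPackage(self._name + "." + attr)


jvm = JavaPackage("root")
cls = jvm.ml.combust.mleap.spark.SimpleSparkSerializer
try:
    cls()  # mirrors _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
except TypeError as exc:
    print(exc)  # 'JavaPackage' object is not callable
```

The usual remedy is to put the MLeap Spark JARs on the classpath when starting PySpark, e.g. `pyspark --packages ml.combust.mleap:mleap-spark_2.11:0.7.0` (the coordinates and version here are an assumption; match them to your MLeap release and Scala version).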

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 27 (7 by maintainers)

Top GitHub Comments

6 reactions · rgeos commented, Aug 15, 2017

I have the following setup

  • Python 3.5.2 Anaconda custom (64-bit)
  • Spark 2.2.0
  • added the following jar files inside $SPARK_HOME/jars
    • mleap-spark-base_2.11-0.7.0.jar
    • mleap-core_2.11-0.7.0.jar
    • mleap-runtime_2.11-0.7.0.jar
    • mleap-spark_2.11-0.7.0.jar
    • bundle-ml_2.11-0.7.0.jar
    • config-0.3.0.jar
    • scalapb-runtime_2.11-0.6.1.jar
    • mleap-tensor_2.11-0.7.0.jar
  • installed mleap (0.7.0), the MLeap Python API, via pip

And I have two problems.

Problem 1:

  • cannot import SimpleSparkSerializer
TypeError                                 Traceback (most recent call last)
<ipython-input-2-f8a44b40601b> in <module>()
----> 1 from mleap.pyspark.spark_support import SimpleSparkSerializer

/Apps/anaconda3/lib/python3.5/site-packages/mleap/pyspark/spark_support.py in <module>()
     31 
     32 setattr(Transformer, 'serializeToBundle', serializeToBundle)
---> 33 setattr(Transformer.__class__, 'deserializeFromBundle', deserializeFromBundle)
     34 #setattr(Transformer, 'deserializeFromBundle', deserializeFromBundle)
     35 

TypeError: can't set attributes of built-in/extension type 'type'
  • after checking spark_support.py at line 33 and changing Transformer.__class__ to plain Transformer, i.e.
setattr(Transformer, 'deserializeFromBundle', deserializeFromBundle)

the import works fine.
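The failure on line 33 can be reproduced without PySpark at all: for an ordinary class, Transformer.__class__ is the built-in type metaclass, and CPython refuses to set attributes on built-in types. A small sketch (the Transformer and deserializeFromBundle below are stand-ins, not pyspark’s):

```python
# Why spark_support.py line 33 fails: for a plain class,
# SomeClass.__class__ is the built-in `type` metaclass, and CPython
# forbids adding attributes to built-in/extension types.

class Transformer:  # stand-in for pyspark.ml.Transformer
    pass


def deserializeFromBundle(path):  # placeholder body for illustration
    raise NotImplementedError


# Setting the attribute on the class itself works fine...
setattr(Transformer, 'deserializeFromBundle', deserializeFromBundle)

# ...but setting it on the metaclass (`type`) raises TypeError.
try:
    setattr(Transformer.__class__, 'deserializeFromBundle', deserializeFromBundle)
except TypeError as exc:
    print(exc)  # raises TypeError (exact wording varies by Python version)
```

This is consistent with the one-line fix above: patching Transformer directly attaches the method where callers will find it, while patching Transformer.__class__ tries to modify type itself.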

Problem 2:

  • cannot serialize model.
Py4JJavaError: An error occurred while calling o126.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
	at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:20)

Am I missing something?
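A hedged aside on the error above: resource/package$ is a Scala package object, most likely provided by a transitive dependency of MLeap (the scala-arm library ships a `resource` package), and the NoClassDefFoundError suggests no jar on the classpath contains it. A small helper (hypothetical, not part of MLeap) to check which jars, if any, carry a given class entry:

```python
# Scan a set of jars for a class entry such as "resource/package$".
# Useful for confirming whether the class behind a
# NoClassDefFoundError is present anywhere on the classpath.
import glob
import zipfile


def jars_containing(entry_prefix, jar_glob):
    """Return the jars (from jar_glob) that contain an entry
    starting with entry_prefix, e.g. 'resource/package'."""
    hits = []
    for jar in sorted(glob.glob(jar_glob)):
        with zipfile.ZipFile(jar) as zf:
            if any(name.startswith(entry_prefix) for name in zf.namelist()):
                hits.append(jar)
    return hits


# e.g. jars_containing("resource/package", "/usr/local/spark/jars/*.jar")
```

If the list comes back empty, the fix is to add the jar that supplies the class rather than to change the Python code.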

3 reactions · ibnopcit commented, Oct 17, 2017

@rgeos I was also seeing the resource/package$ error, with a setup similar to yours except with version 0.8.1 of everything.

You can bypass it by building a jar-with-dependencies from a Scala example that does model serialization (like the MNIST example) and passing that jar with your PySpark job. jar tf confirms that resource/package$ etc. are in there, but I haven’t figured out what the underlying dependency is.

Failing to prefix the model path with jar:file: also results in an obscure error.
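The jar:file: requirement can be captured in a tiny helper (hypothetical, not part of MLeap’s API). As I read the docs, serializeToBundle takes a URI: zip bundles need the jar:file: scheme, while directory bundles use plain file:.

```python
# Build the URI form MLeap expects for a local bundle path, so the
# "obscure error" from a bare filesystem path is avoided.
def bundle_uri(path, fmt="zip"):
    """Return a bundle URI for a local path.

    fmt="zip" -> "jar:file:<path>" (zip bundle)
    anything else -> "file:<path>" (directory bundle)
    """
    if fmt == "zip":
        return "jar:file:" + path  # zip bundles require the jar:file: scheme
    return "file:" + path          # directory bundles use plain file:


print(bundle_uri("/tmp/pyspark.example.zip"))  # jar:file:/tmp/pyspark.example.zip
```

Usage would be e.g. `model.serializeToBundle(bundle_uri("/tmp/pyspark.example.zip"))`, matching the call shown earlier in the issue.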
