
Cannot load PipelineModel

See original GitHub issue

Hi,

I am trying to port my ML pipeline so I can use LightGBM instead of the PySpark GBT. I have been able to design a Pipeline with LightGBM as the final estimator. Once trained, I save the PipelineModel object to disk successfully.

The problem is that when I try to load the model again to evaluate it, the following error appears:

2019-07-11 10:44:03 INFO  DAGScheduler:54 - Job 66 finished: runJob at PythonRDD.scala:152, took 0,709961 s
Traceback (most recent call last):
  File "C:/Users/Y0644483/Documents/Workspace/ninabrlong/bin/eval_model.py", line 86, in <module>
    model = ml.PipelineModel.load(args["<path_model>"])
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 311, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 244, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 378, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 535, in loadParamsInstance
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 478, in __get_class
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'

I could not find any reference to this error, and I do not have a clue about what could be happening. Besides, I found some references in your docs to using saveNativeModel(), but I do not know how that fits into a whole-pipeline-saving scenario.

I am using mmlspark 0.17 and pyspark 2.3.2 in standalone mode in my local development environment.

I looked into the saved model file and found the following structure:

{
  "class": "pyspark.ml.pipeline.PipelineModel",
  "timestamp": 1562834309828,
  "sparkVersion": "2.3.2",
  "uid": "PipelineModel_423e9b309dc390188fb9",
  "paramMap": {
    "stageUids": [
      "CategoricalImputerModel_44e1b6199ae304e52301",
      "Imputer_4dd2932c4e613d1a22a7",
      "VectorAssembler_4b84b526562e9c57d94b",
      "StandardScaler_435a845ad25d209ac500",
      "StringIndexer_43adbca01f7d9b98b4a4",
      "StringIndexer_44adb088b5df936619a3",
      "StringIndexer_4f47ae3f303a64b83a33",
      "StringIndexer_466ea94e036991e2b49c",
      "StringIndexer_4e25a7fd976a2cd42a2d",
      "StringIndexer_42a180d928833d6d08ba",
      "StringIndexer_4544901887ec85bf8f93",
      "StringIndexer_410c9fae53c67291e238",
      "StringIndexer_48c5a6c27b7029672329",
      "StringIndexer_4faabb0736b77c4e2e2d",
      "StringIndexer_438795bd74a5ec9f9d8e",
      "StringIndexer_416d809ec7e5c7a7ad58",
      "StringIndexer_4c9b847fc6c2ed13b53a",
      "VectorAssembler_45978399a1e581608699",
      "LightGBMRegressionModel_4c6d84e3292c452f4ce5"
    ],
    "language": "Python"
  }
}
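For context on the traceback: when loading a pipeline, pyspark resolves each stage's class from the dotted path stored in that stage's metadata by importing the module part and fetching the class as an attribute. The error above is that lookup failing for the JVM-style path `com.microsoft.ml.spark.LightGBMRegressionModel`. Below is a simplified sketch of that resolution step (not pyspark's exact code), demonstrated with a stdlib class:

```python
import importlib

def get_class(clazz):
    """Simplified sketch of how pyspark's DefaultParamsReader resolves a
    class from the dotted path stored in the model metadata: import the
    module part, then fetch the class name as an attribute of it."""
    module_name, _, class_name = clazz.rpartition(".")
    module = importlib.import_module(module_name)
    # getattr raises AttributeError if the module has no such class --
    # exactly the failure mode seen in the traceback above
    return getattr(module, class_name)

# Resolving a stdlib class works:
decoder = get_class("json.JSONDecoder")
print(decoder.__name__)  # → JSONDecoder
```

In the traceback, the module `com.microsoft.ml.spark` does import (mmlspark registers that namespace), but the `LightGBMRegressionModel` attribute lookup on the Python side fails, hence the AttributeError.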

Any hint or help would be much appreciated.

Regards, Gus.

Issue Analytics

  • State:open
  • Created 4 years ago
  • Comments:23 (6 by maintainers)

Top GitHub Comments

4 reactions
tkellogg commented, Nov 13, 2019

I’m experiencing the same issue. This code gets me past it for the time being.

from pyspark.ml.tuning import CrossValidatorModel
from pyspark.ml.util import DefaultParamsReader
try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

mangled_name = '_DefaultParamsReader__get_class'
prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

def __get_class(clazz):
    try:
        return prev_get_clazz(clazz)
    except AttributeError as outer:
        try:
            # retry with the Python-side package name instead of the JVM one
            alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
            return prev_get_clazz(alt_clazz)
        except AttributeError:
            raise outer

# replace a private method inside spark to let mmlspark load its own classes
with mock.patch.object(DefaultParamsReader, mangled_name, __get_class):
    # load the model (reg_model_path is the path the model was saved to)
    model = CrossValidatorModel.read().load(reg_model_path)

Here’s another version that’s slightly cleaner and easier to reuse.

First, the reusable part:

from pyspark.ml.util import DefaultParamsReader
try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

class MmlShim(object):
    mangled_name = '_DefaultParamsReader__get_class'
    prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

    @classmethod
    def __get_class(cls, clazz):
        try:
            return cls.prev_get_clazz(clazz)
        except AttributeError as outer:
            try:
                alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
                return cls.prev_get_clazz(alt_clazz)
            except AttributeError:
                raise outer

    def __enter__(self):
        self.mock = mock.patch.object(DefaultParamsReader, self.mangled_name, self.__get_class)
        self.mock.__enter__()
        return self

    def __exit__(self, *exc_info):
        self.mock.__exit__(*exc_info)

Then, to use it:

from pyspark.ml.tuning import CrossValidatorModel

with MmlShim():
    model = CrossValidatorModel.read().load(reg_model_path)
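To see the patching pattern in isolation (no pyspark needed), here is a miniature stand-in for what the shim does: patch a name-mangled private resolver via mock.patch.object so failed lookups are retried with the Python-side package name. The `Reader` class and `remapping_resolver` below are made up purely for this demonstration:

```python
from unittest import mock

class Reader:
    """Stand-in for DefaultParamsReader: has a name-mangled private resolver."""
    @staticmethod
    def __get_class(clazz):  # actually stored as Reader._Reader__get_class
        raise AttributeError(clazz)

    @classmethod
    def load(cls, clazz):
        return cls._Reader__get_class(clazz)

def remapping_resolver(clazz):
    # the shim's core idea: swap the JVM package prefix for the Python one
    return clazz.replace('com.microsoft.ml.spark', 'mmlspark')

# patch the mangled attribute, just as MmlShim does on DefaultParamsReader
with mock.patch.object(Reader, '_Reader__get_class',
                       staticmethod(remapping_resolver)):
    print(Reader.load('com.microsoft.ml.spark.LightGBMRegressionModel'))
    # → mmlspark.LightGBMRegressionModel
```

Outside the `with` block the original (failing) resolver is restored automatically, which is why wrapping only the `load` call is enough.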
3 reactions
Keyeoh commented, Jul 12, 2019

Me again,

I have been able to remote debug my training script in order to stop exactly at the point after training and just before saving the model to disk. I wanted to check if the model was trained properly.

The model is OK: it is able to predict, and I could also extract some metrics using an evaluator. At that point, with a valid model in hand, I could reproduce the error:

model.write().overwrite().save("foomodel")
None
ml.PipelineModel.load("foomodel")
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'

My guess is that something in the PipelineModel.load() method is not able to recognize the mmlspark bindings. Notice that I executed those statements in the same stopped process.

Regards, Gus

