
Cannot load PipelineModel

See original GitHub issue

Hi,

I am trying to port my ML pipeline so I can use LightGBM instead of the PySpark GBT. I have been able to design a Pipeline with LightGBM as the final estimator. Once trained, I save the PipelineModel object to disk successfully.

The problem is that when I try to load the model again to evaluate it, the following error appears:

2019-07-11 10:44:03 INFO  DAGScheduler:54 - Job 66 finished: runJob at PythonRDD.scala:152, took 0,709961 s
Traceback (most recent call last):
  File "C:/Users/Y0644483/Documents/Workspace/ninabrlong/bin/eval_model.py", line 86, in <module>
    model = ml.PipelineModel.load(args["<path_model>"])
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 311, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 244, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 378, in load
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 535, in loadParamsInstance
  File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 478, in __get_class
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'

I could not find any reference to this error, and I do not have a clue about what could be happening. Besides, I found some references in your docs to using saveNativeModel(), but I do not know how that fits into a whole-pipeline-saving scenario.

I am using mmlspark 0.17 and pyspark 2.3.2 in standalone mode in my local development environment.

I looked into the saved model file and found the following structure:

{
  "class": "pyspark.ml.pipeline.PipelineModel",
  "timestamp": 1562834309828,
  "sparkVersion": "2.3.2",
  "uid": "PipelineModel_423e9b309dc390188fb9",
  "paramMap": {
    "stageUids": [
      "CategoricalImputerModel_44e1b6199ae304e52301",
      "Imputer_4dd2932c4e613d1a22a7",
      "VectorAssembler_4b84b526562e9c57d94b",
      "StandardScaler_435a845ad25d209ac500",
      "StringIndexer_43adbca01f7d9b98b4a4",
      "StringIndexer_44adb088b5df936619a3",
      "StringIndexer_4f47ae3f303a64b83a33",
      "StringIndexer_466ea94e036991e2b49c",
      "StringIndexer_4e25a7fd976a2cd42a2d",
      "StringIndexer_42a180d928833d6d08ba",
      "StringIndexer_4544901887ec85bf8f93",
      "StringIndexer_410c9fae53c67291e238",
      "StringIndexer_48c5a6c27b7029672329",
      "StringIndexer_4faabb0736b77c4e2e2d",
      "StringIndexer_438795bd74a5ec9f9d8e",
      "StringIndexer_416d809ec7e5c7a7ad58",
      "StringIndexer_4c9b847fc6c2ed13b53a",
      "VectorAssembler_45978399a1e581608699",
      "LightGBMRegressionModel_4c6d84e3292c452f4ce5"
    ],
    "language": "Python"
  }
}
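For context on the traceback: when loading a pipeline, pyspark resolves each stage's class from the dotted path stored in that stage's metadata by importing the module part and fetching the class as an attribute. The error above is that lookup failing for the JVM-style path `com.microsoft.ml.spark.LightGBMRegressionModel`. Below is a simplified sketch of that resolution step (not pyspark's exact code), demonstrated with a stdlib class:

```python
import importlib

def get_class(clazz):
    """Simplified sketch of how pyspark's DefaultParamsReader resolves a
    class from the dotted path stored in the model metadata: import the
    module part, then fetch the class name as an attribute of it."""
    module_name, _, class_name = clazz.rpartition(".")
    module = importlib.import_module(module_name)
    # getattr raises AttributeError if the module has no such class --
    # exactly the failure mode seen in the traceback above
    return getattr(module, class_name)

# Resolving a stdlib class works:
decoder = get_class("json.JSONDecoder")
print(decoder.__name__)  # → JSONDecoder
```

In the traceback, the module `com.microsoft.ml.spark` does import (mmlspark registers that namespace), but the `LightGBMRegressionModel` attribute lookup on the Python side fails, hence the AttributeError.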

Any hint or help would be much appreciated.

Regards, Gus.

Issue Analytics

  • State:open
  • Created 4 years ago
  • Comments:23 (6 by maintainers)

Top GitHub Comments

4 reactions
tkellogg commented, Nov 13, 2019

I’m experiencing the same issue. This code gets me past it for the time being.

from pyspark.ml.tuning import CrossValidatorModel
from pyspark.ml.util import DefaultParamsReader
try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

mangled_name = '_DefaultParamsReader__get_class'
prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

def __get_class(clazz):
    try:
        return prev_get_clazz(clazz)
    except AttributeError as outer:
        try:
            # retry with the Python-side package name instead of the JVM one
            alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
            return prev_get_clazz(alt_clazz)
        except AttributeError:
            raise outer

# replace a private method inside spark to let mmlspark load its own classes
with mock.patch.object(DefaultParamsReader, mangled_name, __get_class):
    # load the model (reg_model_path is the path the model was saved to)
    model = CrossValidatorModel.read().load(reg_model_path)

Here’s another version that’s slightly cleaner and easier to reuse.

First, the reusable part:

from pyspark.ml.util import DefaultParamsReader
try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

class MmlShim(object):
    mangled_name = '_DefaultParamsReader__get_class'
    prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

    @classmethod
    def __get_class(cls, clazz):
        try:
            return cls.prev_get_clazz(clazz)
        except AttributeError as outer:
            try:
                alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
                return cls.prev_get_clazz(alt_clazz)
            except AttributeError:
                raise outer

    def __enter__(self):
        self.mock = mock.patch.object(DefaultParamsReader, self.mangled_name, self.__get_class)
        self.mock.__enter__()
        return self

    def __exit__(self, *exc_info):
        self.mock.__exit__(*exc_info)

Then, to use it:

from pyspark.ml.tuning import CrossValidatorModel

with MmlShim():
    model = CrossValidatorModel.read().load(reg_model_path)
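To see the patching pattern in isolation (no pyspark needed), here is a miniature stand-in for what the shim does: patch a name-mangled private resolver via mock.patch.object so failed lookups are retried with the Python-side package name. The `Reader` class and `remapping_resolver` below are made up purely for this demonstration:

```python
from unittest import mock

class Reader:
    """Stand-in for DefaultParamsReader: has a name-mangled private resolver."""
    @staticmethod
    def __get_class(clazz):  # actually stored as Reader._Reader__get_class
        raise AttributeError(clazz)

    @classmethod
    def load(cls, clazz):
        return cls._Reader__get_class(clazz)

def remapping_resolver(clazz):
    # the shim's core idea: swap the JVM package prefix for the Python one
    return clazz.replace('com.microsoft.ml.spark', 'mmlspark')

# patch the mangled attribute, just as MmlShim does on DefaultParamsReader
with mock.patch.object(Reader, '_Reader__get_class',
                       staticmethod(remapping_resolver)):
    print(Reader.load('com.microsoft.ml.spark.LightGBMRegressionModel'))
    # → mmlspark.LightGBMRegressionModel
```

Outside the `with` block the original (failing) resolver is restored automatically, which is why wrapping only the `load` call is enough.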
3 reactions
Keyeoh commented, Jul 12, 2019

Me again,

I have been able to remote debug my training script in order to stop exactly at the point after training and just before saving the model to disk. I wanted to check if the model was trained properly.

The model is OK: it is able to predict, and I could also extract some metrics using an evaluator. At that point, with a valid model in hand, I could reproduce the error:

model.write().overwrite().save("foomodel")
None
ml.PipelineModel.load("foomodel")
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'

My guess is that something in the PipelineModel.load() method is not able to recognize the mmlspark bindings. Notice that I executed those statements in the same stopped process.

Regards, Gus

