Cannot load PipelineModel
See original GitHub issueHi,
I am trying to port my ML pipeline so I can use LightGBM instead of the PySpark GBT. I have been able to design a Pipeline with a LightGBM as final estimator. Once trained, I save the PipelineModel object to disk succesfully.
Problem is, when I want to load the model again to evaluate it, the following error appears:
2019-07-11 10:44:03 INFO DAGScheduler:54 - Job 66 finished: runJob at PythonRDD
.scala:152, took 0,709961 s
Traceback (most recent call last):
File "C:/Users/Y0644483/Documents/Workspace/ninabrlong/bin/eval_model.py", lin
e 86, in <module>
model = ml.PipelineModel.load(args["<path_model>"])
File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib
\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 311, in
load
File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib
\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 244,
in load
File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib
\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\pipeline.py", line 378,
in load
File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib
\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 535, in
loadParamsInstance
File "C:\Users\Y0644483\AppData\Local\Continuum\miniconda3\envs\ninabrlong\lib
\site-packages\pyspark\python\lib\pyspark.zip\pyspark\ml\util.py", line 478, in
__get_class
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegres
sionModel'
I could not find any reference to this error, and I do not have a clue on what it could be happening. Besides, I found some references in your docs about using saveNativeModel(), but do not know how that fits in a whole-pipeline-saving scenario.
I am using mmlspark 0.17 and pyspark 2.3.2 in standalone mode in my local development environment.
I looked into the saved model file and found the following structure:
{"class":"pyspark.ml.pipeline.PipelineModel","timestamp":1562834309828,"sparkVersion":"2.3.2","uid":"PipelineModel_423e9b309dc390188fb9","paramMap":{"stageUids":["CategoricalImputerModel_44e1b6199ae304e52301","Imputer_4dd2932c4e613d1a22a7","VectorAssembler_4b84b526562e9c57d94b","StandardScaler_435a845ad25d209ac500","StringIndexer_43adbca01f7d9b98b4a4","StringIndexer_44adb088b5df936619a3","StringIndexer_4f47ae3f303a64b83a33","StringIndexer_466ea94e036991e2b49c","StringIndexer_4e25a7fd976a2cd42a2d","StringIndexer_42a180d928833d6d08ba","StringIndexer_4544901887ec85bf8f93","StringIndexer_410c9fae53c67291e238","StringIndexer_48c5a6c27b7029672329","StringIndexer_4faabb0736b77c4e2e2d","StringIndexer_438795bd74a5ec9f9d8e","StringIndexer_416d809ec7e5c7a7ad58","StringIndexer_4c9b847fc6c2ed13b53a","VectorAssembler_45978399a1e581608699","LightGBMRegressionModel_4c6d84e3292c452f4ce5"],"language":"Python"}}
Any hint or help would be much appreciated.
Regards, Gus.
Issue Analytics
- State:
- Created 4 years ago
- Comments:23 (6 by maintainers)
Top GitHub Comments
I’m experiencing the same issue. This code gets me past it for the time being.
Here’s another version that’s slightly more cleaned up & easier to reuse.
First, the reusable part:
Then, to use it:
Me again,
I have been able to remote debug my training script in order to stop exactly at the point after training and just before saving the model to disk. I wanted to check if the model was trained properly.
The model is ok, able to predict and I could also extract some metrics using an evaluator. At that point, and with a valid model in hand, I could reproduce the error:
My guess is that something in the PipelineModel.load() method is not able to recognize the mmlspark bindings. Notice that I have executed those statements at the same stopped process.
Regards, Gus