MLeap should support skipping rows with StringIndexer
Issue Description
I am getting a ‘key not found’ error when serving with the MLeap 0.7 Docker serving image. (I would use a newer version, but the libraries needed for serialization are only available on Maven up to 0.7.) Does MLeap ignore setHandleInvalid("skip")?
Using Scala 2.11 and Spark 2.1.
Part of my pipeline:
...
stages = stages :+ new StringIndexer()
  .setInputCol(c)
  .setOutputCol(c + "Index")
  .setHandleInvalid("skip")
...
stages = stages :+ new OneHotEncoder() // Using the MLeap OneHotEncoder
  .setInputCol(c)
  .setOutputCol(c + "Vec")
Error when the request contains a string that was not part of the training set:
[ERROR] [07/14/2017 15:52:39.358] [MleapServing-akka.actor.default-dispatcher-15] [akka.actor.ActorSystemImpl(MleapServing)] Error during processing of request: 'key not found: blob.com'. Completing with 500 Internal Server Error response.
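The wording 'key not found: blob.com' matches the message Scala's Map.apply produces for a missing key, so one plausible reading is that the served indexer does a direct map lookup and never consults handleInvalid. The following plain-Scala sketch illustrates the difference between that behaviour and "skip" semantics. It is not MLeap's actual code: the object name, the labels, and both helper functions are hypothetical and exist only for illustration.

```scala
import scala.util.Try

object StringIndexerSketch {
  // Hypothetical label-to-index table, as if learned from the training data.
  val labelToIndex: Map[String, Double] =
    Seq("example.com", "test.org").zipWithIndex
      .map { case (label, i) => label -> i.toDouble }
      .toMap

  // Direct lookup: Map.apply throws NoSuchElementException for an unseen label.
  def indexStrict(label: String): Double = labelToIndex(label)

  // "skip" semantics: rows whose label was not seen in training are dropped.
  def indexSkip(rows: Seq[String]): Seq[(String, Double)] =
    rows.flatMap(label => labelToIndex.get(label).map(idx => label -> idx))

  def main(args: Array[String]): Unit = {
    // The unseen label fails the strict lookup.
    println(Try(indexStrict("blob.com")))
    // With skip semantics, only the rows with known labels survive.
    println(indexSkip(Seq("example.com", "blob.com", "test.org")))
  }
}
```

If the serving path behaves like indexStrict, a single unseen value fails the whole request, which would be consistent with the 500 response above.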
Issue Analytics
- State:
- Created 6 years ago
- Comments:16 (8 by maintainers)
@smaxwellstewart I am looking forward to this fix. Can you let us know the likely release date for 0.9.0?
@smaxwellstewart https://github.com/combust/mleap/pull/289
Should be skipping rows shortly 😃