MLeap should support skipping rows with StringIndexer

See original GitHub issue

I am getting a ‘key not found’ error when serving with the 0.7 MLeap serving Docker image. (I would use a newer version, but the libraries needed for serialization are only available on Maven up to 0.7.) Does MLeap ignore setHandleInvalid("skip")?

Using Scala 2.11 and Spark 2.1.

Part of my pipeline:

...
 stages = stages :+ new StringIndexer()
                    .setInputCol(c)
                    .setOutputCol(c+"Index")
                    .setHandleInvalid("skip")

...

 stages = stages :+ new OneHotEncoder() // Using the MLeap OneHotEncoder
                    .setInputCol(c)
                    .setOutputCol(c+"Vec")

Error when the input contains a string that isn’t part of the training set:

[ERROR] [07/14/2017 15:52:39.358] [MleapServing-akka.actor.default-dispatcher-15] [akka.actor.ActorSystemImpl(MleapServing)] Error during processing of request: 'key not found: blob.com'. Completing with 500 Internal Server Error response.
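To illustrate the behaviour being asked for, here is a minimal plain-Scala sketch with no Spark or MLeap dependency. It is not MLeap's actual implementation: the object `SkipSketch`, the `labelToIndex` map, and both helper functions are hypothetical stand-ins for a fitted indexer. It only assumes that an indexer which looks labels up strictly in a Scala `Map` would fail with a message of the same form as the one in the log, and shows what "skip" handling would do instead.

```scala
// Hypothetical stand-in for a fitted StringIndexer: label -> index.
// None of these names come from MLeap or Spark.
object SkipSketch {
  val labelToIndex: Map[String, Double] =
    Map("foo.com" -> 0.0, "bar.com" -> 1.0)

  // Strict lookup: Map.apply throws NoSuchElementException
  // ("key not found: <label>") for labels unseen during fitting.
  def indexStrict(label: String): Double = labelToIndex(label)

  // "skip"-style handling: rows whose label is unknown are dropped,
  // the remaining rows are paired with their index.
  def indexSkipping(rows: Seq[String]): Seq[(String, Double)] =
    rows.flatMap(label => labelToIndex.get(label).map(idx => (label, idx)))
}
```

With this sketch, `SkipSketch.indexStrict("blob.com")` throws, while `SkipSketch.indexSkipping(Seq("foo.com", "blob.com", "bar.com"))` returns only the two rows with known labels.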

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
rnandi commented, Dec 11, 2017

@smaxwellstewart I am looking forward to this fix. Can you let us know the likely release date for 0.9.0?

1 reaction
hollinwilkins commented, Oct 11, 2017

@smaxwellstewart https://github.com/combust/mleap/pull/289

Should be skipping rows shortly 😃

Top Results From Across the Web

StringIndexer — PySpark 3.3.1 documentation - Apache Spark
A label indexer that maps a string column of labels to an ML column of label indices. If the input column is numeric,...

org.apache.spark.ml.feature.StringIndexer Scala Example
The following examples show how to use org.apache.spark.ml.feature.StringIndexer. You can vote up the ones you like or vote down the ones you don't...

MLeap: Deploy Spark ML Pipelines to Production API Servers
During our presentation, we will show you how to deploy any Spark ML Pipeline, as well as custom transformers, that are trained using...

Spark, ML, StringIndexer: handling unseen labels
I'm afraid that setHandleInvalid("skip") will cause the whole row to be discarded, when you really just want to ignore the previously unseen ...

Learning Spark, Second Edition - Databricks
loads, we will not cover all of the languages that Spark supports. ... with Spark's history and the high-level concepts, you can skip...
