Inconsistency in Probability Values in TabularLIME

Hi, I noticed an inconsistency between the LIME probability and the model probability. Is this a problem with the library, or is LIME unable to approximate the model in this case (so local fidelity doesn’t hold)?

For context, I have a binary classification problem, and I want to explain the model's results using the coefficients found by LIME.

Steps that I followed:

1. I started with the regular Spark operations before model fitting (e.g., string indexing, vector assembling).
2. Then I fitted the GBTClassifier.
3. I used the fitted model to transform the data (to get the probabilities and predictions from the model).
4. I used TabularLIME to explain the model's results; I fitted it on, and transformed, the same data.
5. At the end, I have “probability_Model” (from the model) and “probability” (from LIME).

Here is a screenshot of these two columns, which are not even similar to each other:

[Screenshot from 2021-05-12 11-14-54]

Here is the example code that I used:

indexed = string_indexer.fit(df_cat).transform(df_cat)
output = assembler.transform(indexed)
model = gbt.fit(output)

pred = model.transform(output)
pred = (pred.withColumnRenamed("rawPrediction", "rawPrediction_Model")
        .withColumnRenamed("probability", "probability_Model")
        .withColumnRenamed("prediction", "prediction_Model"))

lime = TabularLIME(model=model, inputCol="features", outputCol="weights", predictionCol="prediction")
result = lime.fit(pred)
r = result.transform(pred)
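To make the "local fidelity" question concrete, here is a minimal, library-independent sketch in plain Python. It is not TabularLIME's implementation: `black_box`, `local_surrogate`, `fidelity_gap`, the kernel shape, and all parameter values are hypothetical stand-ins chosen for illustration. It fits a kernel-weighted linear surrogate around a single point and compares the surrogate's value at that point with the model's own probability.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for a classifier's positive-class probability."""
    return 1.0 / (1.0 + math.exp(-(x * x - 1.0)))


def local_surrogate(f, x0, kernel_width, n_samples=2000, seed=0):
    """Fit a kernel-weighted line y ~ a + b*x around x0 and return (a, b)."""
    rng = random.Random(seed)
    # Perturb the instance, query the model, and weight samples by proximity.
    xs = [x0 + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    ws = [math.exp(-((x - x0) ** 2) / (kernel_width ** 2)) for x in xs]
    # Closed-form weighted least squares for one feature plus an intercept.
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = cov / var
    a = my - b * mx
    return a, b


def fidelity_gap(f, x0, kernel_width):
    """Absolute difference between surrogate and model output at x0."""
    a, b = local_surrogate(f, x0, kernel_width)
    return abs((a + b * x0) - f(x0))


x0 = 1.5
narrow_gap = fidelity_gap(black_box, x0, kernel_width=0.1)
wide_gap = fidelity_gap(black_box, x0, kernel_width=5.0)
print(f"model p(x0)            = {black_box(x0):.4f}")
print(f"gap with narrow kernel = {narrow_gap:.4f}")
print(f"gap with wide kernel   = {wide_gap:.4f}")
```

In this toy setup the surrogate tracks the model closely only when the kernel is narrow relative to the model's curvature. A large gap in a real pipeline could therefore come from the kernel or sampling settings as well as from a bug, so the sketch alone does not tell the two apart.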

AB#1167728

Issue Analytics

State:
Created 2 years ago
Comments: 13 (4 by maintainers)

Top GitHub Comments

memoryz commented, Jun 20, 2021

The PR (#1077) was merged into the master branch.

Maven Coordinates: com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT

Maven Resolver: https://mmlspark.azureedge.net/maven

Documents and notebooks will come in next PR.
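For anyone wanting to try the snapshot, the coordinates and resolver above can be passed to Spark through its standard `spark.jars.packages` and `spark.jars.repositories` settings. The sketch below is only an illustration: the helper names `mmlspark_conf` and `build_session` and the app name are hypothetical, and the thread does not say whether this snapshot already ships Python bindings for the new explainers.

```python
# Values quoted from the comment above.
MMLSPARK_COORDINATES = "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT"
MMLSPARK_RESOLVER = "https://mmlspark.azureedge.net/maven"


def mmlspark_conf(coordinates=MMLSPARK_COORDINATES, resolver=MMLSPARK_RESOLVER):
    """Return Spark settings that pull a package from an extra Maven repository."""
    return {
        "spark.jars.packages": coordinates,
        "spark.jars.repositories": resolver,
    }


def build_session(app_name="lime-check"):
    """Create a SparkSession with the settings applied (requires pyspark)."""
    # Imported lazily so mmlspark_conf() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in mmlspark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    for key, value in mmlspark_conf().items():
        print(f"{key}={value}")
```

These settings only take effect if they are set before the first SparkSession is created in the process.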

memoryz commented, May 31, 2021

Hey @demirbilek95,

> Last question is about the new version of LIME, can you please share the maven coordinates when new version is released? Also rough estimation of release date would be amazing.

Definitely. I’ll update this thread with the PR when it’s ready, so you can find the Maven coordinates once it’s available. I cannot tell when the new version will be released or when the PR will be sent; my best estimate is 3-4 weeks.

If you want to try it out first, head over to my fork at https://github.com/memoryz/mmlspark/tree/jasowang/lime. The new LIME implementation is pretty much in shape. Check out the unit test (com/microsoft/ml/spark/explainers/LIMESuite.scala) to get started. However:

1. The API is still unstable as we add more features.
2. There is no PySpark support yet; only the JVM-based version is available.
3. You’d have to build the code with sbt.

Thanks!
