Inconsistency in Probability Values in TabularLIME


Hi, I noticed an inconsistency between the LIME probability and the model probability. Is this a problem with the library, or is LIME unable to approximate the model in this case (so local fidelity doesn’t hold)?

For context, I have a binary classification problem and I want to explain the model’s results using the coefficients found by LIME.
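As a rough illustration of what "local fidelity" means here, the sketch below fits a proximity-weighted linear surrogate around one instance and compares its prediction with the black-box probability at that point. It is a minimal NumPy sketch of the general LIME idea, not the TabularLIME implementation from the library: `black_box_proba`, the Gaussian perturbation, the kernel form, and every parameter value are assumptions made for illustration.

```python
import numpy as np

def black_box_proba(X):
    """Hypothetical stand-in for a model's positive-class probability."""
    # Deliberately nonlinear in the features (assumption for illustration).
    z = 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2
    return 1.0 / (1.0 + np.exp(-z))

def lime_local_fit(predict, x, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around x (LIME-style sketch)."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise to sample its neighbourhood.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # Exponential kernel on Euclidean distance: closer samples weigh more.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef  # [intercept, weight_1, ..., weight_d]

x = np.array([0.3, -0.2])
coef = lime_local_fit(black_box_proba, x)
surrogate_at_x = coef[0] + coef[1:] @ x
model_at_x = black_box_proba(x[None, :])[0]
gap = abs(surrogate_at_x - model_at_x)
print(f"model={model_at_x:.4f} surrogate={surrogate_at_x:.4f} gap={gap:.4f}")
```

If the surrogate is locally faithful, the gap at the explained instance should be small. A large gap suggests either that the neighbourhood is too wide for a linear fit or that the two numbers being compared are not the same quantity.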

Steps that I followed:

  • I started with the usual Spark preprocessing before model fitting (e.g., StringIndexer, VectorAssembler).
  • Then I fitted the GBTClassifier.
  • I used the fitted model to transform the data (to get the model’s probabilities and predictions).
  • I used TabularLIME to explain the model’s results, fitting and transforming the same data.
  • At the end I have “probability_Model” (from the model) and “probability” (from LIME).

Here is a screenshot of these two columns, which are not even similar to each other:

[Screenshot from 2021-05-12 11-14-54]

Here is the example code that I used:

indexed = string_indexer.fit(df_cat).transform(df_cat)
output = assembler.transform(indexed)
model = gbt.fit(output)

pred = model.transform(output)
pred = (
    pred.withColumnRenamed("rawPrediction", "rawPrediction_Model")
    .withColumnRenamed("probability", "probability_Model")
    .withColumnRenamed("prediction", "prediction_Model")
)

lime = TabularLIME(model=model, inputCol="features", outputCol="weights", predictionCol="prediction")
result = lime.fit(pred)
r = result.transform(pred)

AB#1167728

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

memoryz commented, Jun 20, 2021

The PR (#1077) was merged into the master branch.

Maven Coordinates: com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT

Maven Resolver: https://mmlspark.azureedge.net/maven
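One way to pull these coordinates into a PySpark session is through Spark's `spark.jars.packages` and `spark.jars.repositories` settings, sketched below. The helper names `mmlspark_conf` and `build_session` and the application name are hypothetical, and whether this snapshot build is still hosted at the resolver is not something the thread confirms.

```python
# Coordinates and resolver copied from the comment above.
MMLSPARK_PACKAGE = "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT"
MMLSPARK_REPO = "https://mmlspark.azureedge.net/maven"

def mmlspark_conf(package=MMLSPARK_PACKAGE, repo=MMLSPARK_REPO):
    """Return the Spark settings that tell Spark which package to fetch and from where."""
    return {
        "spark.jars.packages": package,      # Maven coordinates to download
        "spark.jars.repositories": repo,     # extra repository to search
    }

def build_session(app_name="lime-snapshot-test"):
    """Create a SparkSession with the settings applied (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in mmlspark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` requires a working PySpark installation and network access to the resolver; the same two settings can also be passed to `spark-submit` with `--packages` and `--repositories`.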

Documentation and notebooks will come in the next PR.

memoryz commented, May 31, 2021

Hey @demirbilek95,

> Last question is about the new version of LIME, can you please share the maven coordinates when new version is released? Also rough estimation of release date would be amazing.

Definitely. I’ll update this thread with the PR when it’s ready, so you can find the Maven coordinates once they’re available. I cannot say exactly when the new version will be released or when the PR will be sent; my best estimate is 3-4 weeks.

If you want to try it out first, head over to my fork at https://github.com/memoryz/mmlspark/tree/jasowang/lime. The new LIME implementation is pretty much in shape. Check out the unit test (com/microsoft/ml/spark/explainers/LIMESuite.scala) to get started. However:

  1. The API is still unstable as we add more features.
  2. There is no PySpark support yet; only the JVM-based version is available.
  3. You’d have to build the code with sbt.

Thanks!
