Inconsistency in Probability Values in TabularLIME
Hi, I noticed an inconsistency between the probability reported by LIME and the probability reported by the model. Is this a problem in the library, or is LIME unable to approximate the model in this case (so local fidelity doesn’t hold)?
For context: I have a binary classification problem, and I want to explain the model’s results using the coefficients found by LIME.
Steps that I followed:
- I started with the usual Spark operations before model fitting (e.g., string indexing, vector assembling).
- Then I fitted the GBTClassifier.
- I used the fitted model to transform the data (to get the probability and prediction columns from the model).
- I used TabularLIME to explain the model’s results; I fitted it on, and transformed, the same data.
- At the end I have “probability_Model” (from the model) and “probability” (from LIME).
Here is a screenshot of these two columns, which are not even similar to each other:
Here is the example code that I used:
# string_indexer, assembler, gbt, and df_cat are defined earlier (not shown)
indexed = string_indexer.fit(df_cat).transform(df_cat)
output = assembler.transform(indexed)
model = gbt.fit(output)
pred = model.transform(output)
# Rename the model's output columns to tell them apart from the columns produced by LIME
pred = (pred.withColumnRenamed("rawPrediction", "rawPrediction_Model")
            .withColumnRenamed("probability", "probability_Model")
            .withColumnRenamed("prediction", "prediction_Model"))
lime = TabularLIME(model=model, inputCol="features", outputCol="weights", predictionCol="prediction")
result = lime.fit(pred)
r = result.transform(pred)
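The question above hinges on local fidelity: a LIME-style surrogate is a weighted linear model fitted on perturbed samples around one row, and its prediction at that row should be close to the model's own probability there. The following is a minimal, library-independent sketch of that check in plain Python. It is not TabularLIME's implementation; `black_box_proba`, `solve`, `fit_local_surrogate`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def black_box_proba(x):
    # Hypothetical stand-in for a classifier's P(class = 1 | x);
    # this is NOT the GBT model from the issue.
    z = 3.0 * x[0] - 2.0 * x[1] ** 2
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a @ w = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_local_surrogate(predict, instance, n_samples=2000, scale=0.1,
                        kernel_width=0.25, seed=0):
    # Weighted least squares on features centered at `instance`, so the
    # intercept is the surrogate's prediction at the instance itself.
    rng = random.Random(seed)
    d = len(instance)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [xi + di for xi, di in zip(instance, delta)]
        # Samples closer to the instance get a larger weight.
        dist2 = sum(di * di for di in delta)
        weight = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + delta
        y = predict(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    w = solve(xtwx, xtwy)
    return w[0], w[1:]


instance = [0.5, 0.5]
intercept, coefs = fit_local_surrogate(black_box_proba, instance)
print("model probability:   ", black_box_proba(instance))
print("surrogate prediction:", intercept)
print("coefficients:        ", coefs)
```

When the two printed probabilities are close, the surrogate is locally faithful for that row and its coefficients are a reasonable explanation. A large gap suggests that the sampling scale or kernel width is a poor fit for the model, or that the explainer itself is not behaving as expected.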
Top GitHub Comments
The PR (#1077) was merged into the master branch.
Maven Coordinates
com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT
Maven Resolver
https://mmlspark.azureedge.net/maven
Documentation and notebooks will come in the next PR.
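One way to make a snapshot build like the one above available to a Spark session is through Spark's `spark.jars.packages` and `spark.jars.repositories` settings. The sketch below only collects those two settings in a plain dictionary; the name `spark_conf` is illustrative, and the commented lines assume PySpark is installed. Whether Python bindings for the new explainers ship in this snapshot is not stated in the thread.

```python
# Spark settings for resolving the snapshot build from the custom Maven resolver.
spark_conf = {
    "spark.jars.packages": "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-102-c84ab470-SNAPSHOT",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
}

# With PySpark installed, the settings could be applied like this (assumption):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in spark_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()

for key, value in spark_conf.items():
    print(f"{key}={value}")
```

The same key/value pairs can also be passed as `--conf` options to `spark-submit` or `spark-shell`.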
Hey @demirbilek95,
Definitely. I’ll update this thread with the PR when it’s ready, so you can find the Maven coordinates once they are available. I cannot tell when the new version will be released or when the PR will be sent; my best estimate is 3-4 weeks.
If you want to try it out first, head over to my fork at https://github.com/memoryz/mmlspark/tree/jasowang/lime. The new LIME implementation is pretty much in shape. Check out the unit test (com/microsoft/ml/spark/explainers/LIMESuite.scala) to get started. However:
1. The API is still unstable as we add more features.
2. There is no PySpark support yet; only the JVM-based version is available.
3. You’d have to build the code with sbt.
Thanks!