How to use LIME with XGBoost?
Hello!
I love lime and I have used it so far to explain predictions of sklearn-based models. I am now switching to xgboost in order to use xgboost.Booster(), but I see that lime does not work out of the box with it.
Here is where I get the error:
data = xgb.DMatrix(X_row, y_row)
exp = explainer.explain_instance(data, model.predict)
where X_row and y_row are the features and label of the single instance I want to explain. Here is the error I get:
~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
309 explanations.
310 """
--> 311 data, inverse = self.__data_inverse(data_row, num_samples)
312 scaled_data = (data - self.scaler.mean_) / self.scaler.scale_
313
~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in __data_inverse(self, data_row, num_samples)
449 binary, but categorical (as the original data)
450 """
--> 451 data = np.zeros((num_samples, data_row.shape[0]))
452 categorical_features = range(data_row.shape[0])
453 if self.discretizer is None:
AttributeError: 'DMatrix' object has no attribute 'shape'
The main point here is that the xgboost model only takes an xgboost.DMatrix() object as input, while the lime explainer assumes the input to be a numpy array.
Is there any workaround for this? Am I missing anything?
Issue Analytics
- Created: 4 years ago
- Comments: 5 (1 by maintainers)
Top GitHub Comments
I thought there was something built-in already, but in the end I wrote a wrapper as you suggested. Here it is:
Adding this somewhere into lime would make it compatible with xgboost out of the box 😃

Also, let me point out that xgboost.XGBRegressor and xgboost.XGBClassifier work out of the box with lime, as you were suggesting, but they are not the xgboost.Booster() (i.e. what you get as output of xgboost.train()), which is, in my understanding, the whole point of using xgboost.

Hi francescolosterzo,
I tried your wrapper function: I passed a single sample/row (a numpy.ndarray of shape (304,)) for explanation, but data_x becomes shape (5000, 304) inside the wrapper function, and I am not sure why.
Please find the code snippet and details below:
- x_train is a numpy.matrix of shape (29424, 304) (304 features)
- y_train is a numpy.ndarray of shape (29424,)
- x_val is a numpy.matrix of shape (2785, 304)
x_t = np.squeeze(np.asarray(x_train))  # LimeTabularExplainer needs an ndarray
explainer = lime.lime_tabular.LimeTabularExplainer(x_t, feature_names=features, class_names=['0', '1'])
x_v = np.squeeze(np.asarray(x_val))  # explainer needs an ndarray
exp = explainer.explain_instance(x_v[0], wrapped_predict, num_features=304)
It gives the following output:
(5000, 304)
[1 1 1 … 1 1 1]
<xgboost.core.DMatrix object at 0x7f831ba15d60>
[0.04686321 0.01429689 0.013503 … 0.01028677 0.01482881 0.01389816]
[[0. 0.] [0. 0.] [0. 0.] … [0. 0.] [0. 0.] [0. 0.]]
[[0.9531368 0.04686321] [0.98570311 0.01429689] [0.98649698 0.013503 ] … [0.98971325 0.01028677] [0.9851712 0.01482881] [0.98610187 0.01389816]]
Please help me figure out how to fix this.
Thanks, Srikanth
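On the (5000, 304) shape: this is expected behavior, not a bug. explain_instance generates num_samples perturbed copies of the single row (num_samples defaults to 5000) and calls predict_fn once on the whole batch. A rough pure-numpy illustration of that behavior (not lime's actual implementation; perturb_like_lime is a hypothetical name for this sketch):

```python
import numpy as np

def perturb_like_lime(data_row, num_samples=5000, seed=0):
    """Sketch of how lime expands one row into a batch for predict_fn:
    draw num_samples noisy copies, keeping the unperturbed row first."""
    rng = np.random.default_rng(seed)
    samples = data_row + rng.normal(size=(num_samples, data_row.shape[0]))
    samples[0] = data_row  # lime keeps the original row as sample 0
    return samples

batch = perturb_like_lime(np.zeros(304))
# batch is what the wrapped predict function receives: shape (5000, 304)
```

So a (304,) input to explain_instance arriving as a (5000, 304) data_x inside the wrapper is exactly what should happen; pass num_samples explicitly to explain_instance to change the batch size.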