
How to use LIME with XGBoost?


Hello!

I love lime and have used it so far to explain predictions of sklearn-based models. I am now switching to xgboost and its xgboost.Booster() interface, but I see that lime does not work out of the box with it.

Here is where I get the error:

data = xgb.DMatrix(X_row, y_row)
exp = explainer.explain_instance(data, model.predict)

where X_row and y_row are the features and label of the single instance I want to explain. Here is the error I get:

~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    309             explanations.
    310         """
--> 311         data, inverse = self.__data_inverse(data_row, num_samples)
    312         scaled_data = (data - self.scaler.mean_) / self.scaler.scale_
    313 

~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in __data_inverse(self, data_row, num_samples)
    449                 binary, but categorical (as the original data)
    450         """
--> 451         data = np.zeros((num_samples, data_row.shape[0]))
    452         categorical_features = range(data_row.shape[0])
    453         if self.discretizer is None:

AttributeError: 'DMatrix' object has no attribute 'shape'

The main point here is that the xgboost model only accepts an xgboost.DMatrix() object as input, while the lime explainer assumes the input is a numpy array.

Is there any workaround for this? Am I missing anything?
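To be clear about what the explainer expects here: explain_instance indexes data_row.shape[0], so the row has to be a plain 1-D numpy array, and any DMatrix conversion would have to happen inside the prediction function instead. A minimal sketch, where predict_fn stands for a hypothetical numpy-friendly prediction function (one is sketched in the comments below):

import numpy as np

x_row = np.asarray(X_row).ravel()                    # 1-D feature array, has .shape
exp = explainer.explain_instance(x_row, predict_fn)  # predict_fn: ndarray in, probabilities out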

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
francescolosterzo commented, Jun 11, 2019

I thought there was something built-in already, but in the end I wrote a wrapper as you suggested. Here it is:

import numpy as np
import xgboost as xgb

def wrapped_predict(data_x):
    '''
    Wrap the xgboost predict function to make it lime-friendly.
    `model` (a trained xgboost.Booster) and `feature_names` are defined outside.
    '''
    # Booster.predict needs a DMatrix; the labels are just dummies
    dummy_y = np.ones(data_x.shape[0])
    tmp_data = xgb.DMatrix(data_x, dummy_y, feature_names=feature_names)

    tmp_out = model.predict(tmp_data)

    # lime expects predict_proba-style output: one column per class
    out = np.zeros((data_x.shape[0], 2))
    out[:, 0] = 1 - tmp_out
    out[:, 1] = tmp_out

    return out

Adding this somewhere in lime would make it compatible with xgboost out of the box 😃
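For completeness, a rough usage sketch (assuming X_train is the numpy training matrix and feature_names the list of column names used to train the Booster; these names are just placeholders):

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(X_train,
                                 feature_names=feature_names,
                                 class_names=['0', '1'],
                                 mode='classification')

# lime evaluates the prediction function on num_samples perturbed copies of the
# row (5000 by default), so data_x inside wrapped_predict has shape (num_samples, n_features)
exp = explainer.explain_instance(X_train[0], wrapped_predict, num_features=10)
print(exp.as_list())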

Also, let me point out that xgboost.XGBRegressor and xgboost.XGBClassifier work out of the box with lime, as you suggested, but they are not the xgboost.Booster() (i.e. what you get as output of xgboost.train()), which is, in my understanding, the whole point of using xgboost.
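For reference, a rough sketch of that sklearn-style route (again with placeholder X_train / y_train / feature_names), which needs no wrapper because XGBClassifier exposes predict_proba directly:

from xgboost import XGBClassifier
from lime.lime_tabular import LimeTabularExplainer

clf = XGBClassifier()                 # sklearn-compatible wrapper around a Booster
clf.fit(X_train, y_train)

explainer = LimeTabularExplainer(X_train,
                                 feature_names=feature_names,
                                 class_names=['0', '1'])
exp = explainer.explain_instance(X_train[0], clf.predict_proba)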

0 reactions
SrikanthPusarla commented, Nov 27, 2022

Hi francescolosterzo

I tried your wrapper function. I passed only one sample/row (a numpy.ndarray of shape (304,)) for explanation, but data_x becomes shape (5000, 304) inside the wrapper function, and I am not sure why.

Please find the code snippet and details below:

  • x_train is <class 'numpy.matrix'>, shape (29424, 304) (304 features)
  • y_train is <class 'numpy.ndarray'>, shape (29424,)
  • x_val is <class 'numpy.matrix'>, shape (2785, 304)

x_t = np.squeeze(np.asarray(x_train)) ## LimeTabularExplainer needs ndarray

explainer = lime.lime_tabular.LimeTabularExplainer(x_t, feature_names=features, class_names=['0', '1'])

def wrapped_predict(data_x):
    '''
    wrap xgboost predict function in order to make it lime-friendly
    - model and feature_names are defined outside
    '''
    print(data_x.shape)
    dummy_y = np.array([1 for _ in range(data_x.shape[0])])
    print(dummy_y)
    tmp_data = xgb.DMatrix(data_x, dummy_y, feature_names=features)
    print(tmp_data)

    tmp_out = _model.predict(tmp_data)
    print(tmp_out)

    # add the first column to make it like predict_proba
    out = np.zeros((data_x.shape[0], 2))
    print(out)
    out[:, 0] = 1 - tmp_out
    out[:, 1] = tmp_out
    print(out)
    return out

x_v = np.squeeze(np.asarray(x_val)) # explainer needs ndarray

exp = explainer.explain_instance(x_v[0], wrapped_predict, num_features = 304)

It gives the following output:

(5000, 304)
[1 1 1 … 1 1 1]
<xgboost.core.DMatrix object at 0x7f831ba15d60>
[0.04686321 0.01429689 0.013503   … 0.01028677 0.01482881 0.01389816]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 …
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[0.9531368  0.04686321]
 [0.98570311 0.01429689]
 [0.98649698 0.013503  ]
 …
 [0.98971325 0.01028677]
 [0.9851712  0.01482881]
 [0.98610187 0.01389816]]

Please help me figure out how to fix this.

Thanks, Srikanth


