How to use LIME with XGBoost?
Hello!
I love lime and I have used it so far to explain predictions of sklearn-based models. I am now switching to xgboost in order to use xgboost.Booster(), but I see that lime does not work out of the box with it.
Here is where I get the error:
data = xgb.DMatrix(X_row, y_row)
exp = explainer.explain_instance(data, model.predict)
where X_row and y_row are the features and label of the single instance I want to explain. Here is the error I get:
~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
309 explanations.
310 """
--> 311 data, inverse = self.__data_inverse(data_row, num_samples)
312 scaled_data = (data - self.scaler.mean_) / self.scaler.scale_
313
~/anaconda3/envs/testenv/lib/python3.7/site-packages/lime/lime_tabular.py in __data_inverse(self, data_row, num_samples)
449 binary, but categorical (as the original data)
450 """
--> 451 data = np.zeros((num_samples, data_row.shape[0]))
452 categorical_features = range(data_row.shape[0])
453 if self.discretizer is None:
AttributeError: 'DMatrix' object has no attribute 'shape'
The main point here is that the xgboost model only takes an xgboost.DMatrix() object as input, while the lime explainer assumes the input to be a numpy array.
Is there any workaround for this? Am I missing anything?
Issue Analytics
- Created: 4 years ago
- Comments: 5 (1 by maintainers)
Top GitHub Comments
I thought there was something built-in already, but in the end I wrote a wrapper as you suggested. Here it is:
Adding this somewhere into lime would make it compatible with xgboost out of the box 😃

Also, let me point out that xgboost.XGBRegressor and xgboost.XGBClassifier work out of the box with lime, as you were suggesting, but they are not the xgboost.Booster() (i.e. what you get as output of xgboost.train()), which is, in my understanding, the whole point of using xgboost.

Hi francescolosterzo,
I tried your wrapper function: I passed a single sample/row (a numpy.ndarray of shape (304,)) for explanation, but data_x becomes shape (5000, 304) inside the wrapper function, and I am not sure why.
Please find the code snippet and details below:
- x_train is a numpy.matrix of shape (29424, 304) (304 features)
- y_train is a numpy.ndarray of shape (29424,)
- x_val is a numpy.matrix of shape (2785, 304)
x_t = np.squeeze(np.asarray(x_train))  # LimeTabularExplainer needs an ndarray
explainer = lime.lime_tabular.LimeTabularExplainer(x_t, feature_names=features, class_names=['0', '1'])
x_v = np.squeeze(np.asarray(x_val))  # explainer needs an ndarray
exp = explainer.explain_instance(x_v[0], wrapped_predict, num_features=304)
It gives the following output:
(5000, 304)
[1 1 1 … 1 1 1]
<xgboost.core.DMatrix object at 0x7f831ba15d60>
[0.04686321 0.01429689 0.013503 … 0.01028677 0.01482881 0.01389816]
[[0. 0.] [0. 0.] [0. 0.] … [0. 0.] [0. 0.] [0. 0.]]
[[0.9531368 0.04686321] [0.98570311 0.01429689] [0.98649698 0.013503 ] … [0.98971325 0.01028677] [0.9851712 0.01482881] [0.98610187 0.01389816]]
Please help me figure out how to fix this.
Thanks, Srikanth
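On the (5000, 304) shape: this is expected behavior, not a bug. explain_instance generates num_samples perturbed copies of the single row (num_samples defaults to 5000) and calls predict_fn once on the whole batch. A rough pure-numpy illustration of that behavior (not lime's actual implementation; perturb_like_lime is a hypothetical name for this sketch):

```python
import numpy as np

def perturb_like_lime(data_row, num_samples=5000, seed=0):
    """Sketch of how lime expands one row into a batch for predict_fn:
    draw num_samples noisy copies, keeping the unperturbed row first."""
    rng = np.random.default_rng(seed)
    samples = data_row + rng.normal(size=(num_samples, data_row.shape[0]))
    samples[0] = data_row  # lime keeps the original row as sample 0
    return samples

batch = perturb_like_lime(np.zeros(304))
# batch is what the wrapped predict function receives: shape (5000, 304)
```

So a (304,) input to explain_instance arriving as a (5000, 304) data_x inside the wrapper is exactly what should happen; pass num_samples explicitly to explain_instance to change the batch size.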