
Segmentation fault (core dumped) for shap_values

See original GitHub issue

Hi,

I’m trying to apply TreeExplainer to get shap_values from an XGBoost model on a regression problem with a large dataset. During hyperparameter tuning, certain hyperparameter sets make the run fail with a segmentation fault at the explainer.shap_values() step. I used fasttreeshap 0.1.1 and xgboost 1.4.1 (also tested 1.6.0), on a machine with an “Intel Xeon E5-2640 v4 (20) @ 3.400GHz” CPU and 128 GB of memory. The sample code below is a toy script that reproduces the issue using the Superconductor dataset from the example notebook:

# for debugging
import faulthandler
faulthandler.enable()

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import xgboost as xgb
import fasttreeshap

print(f"XGBoost version: {xgb.__version__}")
print(f"fasttreeshap version: {fasttreeshap.__version__}")

# source of data: https://archive.ics.uci.edu/ml/datasets/superconductivty+data
data = pd.read_csv("FastTreeSHAP/data/superconductor_train.csv", engine="python")
train, test = train_test_split(data, test_size=0.5, random_state=0)
label_train = train["critical_temp"]
label_test = test["critical_temp"]
train = train.iloc[:, :-1]
test = test.iloc[:, :-1]

print("train XGBoost model")
xgb_model = xgb.XGBRegressor(
    max_depth=100, n_estimators=200, learning_rate=0.1, n_jobs=-1, alpha=0.12, random_state=0)
xgb_model.fit(train, label_train)

print("run TreeExplainer()")
shap_explainer = fasttreeshap.TreeExplainer(xgb_model)

print("run shap_values()")
shap_values = shap_explainer.shap_values(train)

The time report from the program execution also showed that the “Maximum resident set size” was only about 32 GB, well below the machine’s 128 GB:

~$ /usr/bin/time -v python segfault.py 
XGBoost version: 1.4.1
fasttreeshap version: 0.1.1
train XGBoost model
run TreeExplainer()
run shap_values()
Fatal Python error: Segmentation fault

Thread 0x00007ff2c2793740 (most recent call first):
  File "~/.local/lib/python3.8/site-packages/fasttreeshap/explainers/_tree.py", line 459 in shap_values
  File "segfault.py", line 27 in <module>
Segmentation fault (core dumped)

Command terminated by signal 11
        Command being timed: "python segfault.py"
        User time (seconds): 333.65
        System time (seconds): 27.79
        Percent of CPU this job got: 797%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:45.30
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 33753096
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 8188488
        Voluntary context switches: 3048
        Involuntary context switches: 3089
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
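
For reference, the peak RSS above converts from kilobytes to roughly 32 GB (a quick sanity check, using 1 GB = 1024² KB):

max_rss_kb = 33753096          # "Maximum resident set size" reported by /usr/bin/time
print(max_rss_kb / 1024 ** 2)  # ~32.2 GB, far below the machine's 128 GB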

In some cases (including the example above), forcing TreeExplainer(algorithm="v1") did help, which suggests the issue only occurs with “v2” (or with “auto” when it passes the _memory_check()). However, “v1” sometimes raises a separate check_additivity error, which remains unsolved in the original algorithm as well.
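
A minimal sketch of that workaround, assuming fasttreeshap keeps the same keyword names as the upstream shap package (algorithm on the explainer, check_additivity on shap_values()):

# Force the original TreeSHAP algorithm ("v1") instead of "v2"/"auto"
shap_explainer_v1 = fasttreeshap.TreeExplainer(xgb_model, algorithm="v1")

# If "v1" then trips the additivity check, it can be skipped explicitly,
# at the cost of losing that consistency safeguard
shap_values_v1 = shap_explainer_v1.shap_values(train, check_additivity=False)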

Alternatively, passing approximate=True to explainer.shap_values() does work, but it raises consistency concerns for the reproducibility of our studies…
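
Again assuming the keyword matches the upstream shap interface, the approximate path looks like this:

# Saabas-style approximation: much cheaper, but the attributions are not
# exact Shapley values, which is the reproducibility concern mentioned above
shap_values_approx = shap_explainer.shap_values(train, approximate=True)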

Could you help me debug this issue?

Thank you so much!

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8

Top GitHub Comments

2 reactions
jlyang1990 commented on May 4, 2022

Hi Shaun,

Thanks for pointing out this issue. I have checked the code, and it seems that the function _memory_check() doesn’t produce the correct result when the maximum tree depth is very large, due to numerical errors. This leads to the out-of-memory issue when setting algorithm="v2" or "auto" (_memory_check() should detect the out-of-memory issue for algorithm “v2” and automatically switch to algorithm “v1”). I have fixed this issue in the latest commit https://github.com/linkedin/FastTreeSHAP/commit/fa8531502553ad5d3e3dfb9dce97a86acad41b1c.
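
For illustration only (a hypothetical sketch, not the FastTreeSHAP source), this is the kind of failure mode a depth-based memory estimate can hit at max_depth = 100 if the arithmetic is done in fixed-width integers:

import numpy as np

# Hypothetical estimate for a cache that grows like 2 ** max_depth.
# On int64 the power silently wraps around, so a "does v2 fit in RAM?"
# comparison can come out True even though the real requirement is astronomical.
max_depth = np.int64(100)
estimate_bytes = np.int64(8) * np.int64(2) ** max_depth  # wraps around to 0
print(estimate_bytes)                   # 0 -- clearly wrong
print(estimate_bytes < 128 * 1024**3)   # True: the check would "pass" incorrectly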

Let me know if you still have the out-of-memory issue. Thanks!

1 reaction
jlyang1990 commented on May 13, 2022

Thanks so much, Shaun, for the detailed description of your experiment settings and the table of detailed quantitative results! Really happy to see that fasttreeshap has helped mitigate the numerical precision issues in your project. Let me know if there is anything else I can help with, and good luck with your project! 😃

