Segmentation fault on SVMLIB

Description:

In scikit-learn version 0.23.2, calling the predict() method on a maliciously crafted SVM model can result in a segmentation fault. Such models can be introduced via pickle, JSON, or any other model persistence format. The behaviour is triggered when one of the members of the _n_support array has a very large value, for example 1000000, at the time libsvm.predict() is called.

Tested environment:

Ubuntu 9.3.0-17ubuntu1~20.04
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] on linux
Numpy version: '1.19.2'
Sklearn.version: '0.23.2'

Steps/Code to Reproduce


from sklearn import svm
from sklearn import datasets


if __name__ == '__main__':
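    # Train an ordinary SVC on the iris dataset.
    X, y = datasets.load_iris(return_X_y=True)
    clf = svm.SVC()
    clf.fit(X, y)
    # Corrupt the per-class support-vector count, standing in for a tampered model file.
    clf._n_support[0] = 1000000
    # This call crashes the interpreter with SIGSEGV.
    y_pred = clf.predict(X)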

Expected Results

The call to predict() should not fail.

Actual Results

Segmentation fault. This is the debugger trace:

Thread 1 "python3-dbg" received signal SIGSEGV, Segmentation fault.
0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#1 0x00007fffd717504f in svm_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#2 0x00007fffd7163e57 in copy_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#3 0x00007fffd716bb3a in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#4 0x00007fffd716d47d in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#5 0x000000000043663e in cfunction_call_varargs (func=0x7fffd76e9dd0, args=0x7fffd40fc1d0, kwargs=0x7fffd40fbdd0) at ../Objects/call.c:742
#6 0x000000000043920c in PyCFunction_Call (func=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:772
#7 0x000000000043708b in _PyObject_MakeTpCall (callable=callable@entry=0x7fffd76e9dd0, args=args@entry=0x18433a0, nargs=<optimized out>, keywords=keywords@entry=0x7fffd7821de0) at ../Objects/call.c:159
#8 0x00000000004ebe5b in _PyObject_Vectorcall (kwnames=0x7fffd7821de0, nargsf=9223372036854775816, args=0x18433a0, callable=0x7fffd76e9dd0) at ../Include/cpython/abstract.h:125
#9 call_function (kwnames=0x7fffd7821de0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x954500) at ../Python/ceval.c:4963
#10 _PyEval_EvalFrameDefault (f=0x1843210, throwflag=<optimized out>) at ../Python/ceval.c:3515
#11 0x00000000004df2ef in PyEval_EvalFrameEx (f=
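Because the crash kills the whole interpreter, it can be convenient to run a reproduction like this in a child process and inspect how it ended. The following is a minimal sketch, assuming a POSIX system where a negative return code means the child was killed by that signal. The helper name run_isolated is hypothetical, and the snippet passed to it is a stand-in that sends itself SIGSEGV rather than the scikit-learn script, so the example runs without scikit-learn installed.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a child interpreter and report how it ended.

    Returns (crashed, description). Assumes POSIX semantics: a negative
    return code means the child was terminated by that signal number.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        return True, f"killed by {signal.Signals(-proc.returncode).name}"
    return False, f"exited with code {proc.returncode}"


if __name__ == "__main__":
    # Stand-in for the reproduction script: the child deliberately sends
    # itself SIGSEGV instead of calling into libsvm.
    stand_in = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_isolated(stand_in))
```

Passing the text of the reproduction script as the code argument would let a test harness record the crash without going down with it.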

Versions

System:
    python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
    executable: /usr/bin/python3
    machine: Linux-5.4.0-53-generic-x86_64-with-glibc2.29

Python dependencies:
    pip: 20.0.2
    setuptools: 45.2.0
    sklearn: 0.23.2
    numpy: 1.19.2
    scipy: 1.5.2
    Cython: None
    pandas: 1.1.2
    matplotlib: 3.3.2
    joblib: 0.16.0
    threadpoolctl: 2.1.0

Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

rth commented, Oct 14, 2021

This high CVE score is tripping up people trying to use scikit learn in regulated and secure environments.

Hmm, yes https://nvd.nist.gov/vuln/detail/CVE-2020-28975 and those are really used for enterprise deployment. We could do a few more sanity checks in https://github.com/scikit-learn/scikit-learn/blob/b43f057b22dead2e98669b8f7931eaf86d24b1d1/sklearn/svm/_libsvm.pyx#L279 to make that CVE go away, even if it would be difficult to be fully fail-proof. A PR would be welcome.
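A rough Python-level illustration of the kind of sanity check discussed here, assuming that for a fitted SVC the per-class counts in _n_support should sum to the number of rows in support_vectors_. The helper name validate_n_support is hypothetical and not part of scikit-learn, and this is only a sketch of the idea, not the check that the Cython wrapper linked in the comment performs.

```python
def validate_n_support(n_support, n_support_vectors):
    """Raise ValueError if per-class support-vector counts are inconsistent.

    n_support: iterable of per-class counts (for example clf._n_support).
    n_support_vectors: number of stored support vectors
        (for example clf.support_vectors_.shape[0]).
    Assumption: for a classifier, the counts must be non-negative and
    sum to the number of stored support vectors.
    """
    counts = [int(c) for c in n_support]
    if any(c < 0 for c in counts):
        raise ValueError(f"negative support-vector count in {counts}")
    total = sum(counts)
    if total != n_support_vectors:
        raise ValueError(
            f"n_support sums to {total}, but the model holds "
            f"{n_support_vectors} support vectors"
        )
    return counts
```

With a fitted classifier, a call such as validate_n_support(clf._n_support, clf.support_vectors_.shape[0]) before clf.predict(X) would be expected to reject the tampered model from the reproduction script, since the inflated count no longer matches the stored support vectors. It would not catch every possible inconsistency in a crafted model.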

bkreider commented, Oct 14, 2021

This is where it’s out of scope here: we can’t guard against everything.

Isn’t the model loader improperly validating the model and then segfaulting? Is pickle the only way to load a model?

The proof of concept exploit code is just using the private attribute to shorten the example. Perhaps it should have used a comment there describing “loading a model with a modified n value”?

This high CVE score is tripping up people trying to use scikit learn in regulated and secure environments.
