SVC and OneVsOneClassifier decision_function inconsistent on sub-sample
Hi,
I'm seeing inconsistent numerical results from SVC's decision_function. The values computed for an entire batch of samples (an (n_samples, n_features) matrix) differ from those computed sample-by-sample: both the individual per-sample values and the overall distribution of the results disagree.
The model is SVC with RBF kernel, for a 3-class classification:
SVC(C=1.0, gamma=0.007, class_weight=new_class_weight, probability=True,
    random_state=30, decision_function_shape='ovr')
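For reference, a minimal self-contained setup in this spirit (the issue does not show new_class_weight, so a hypothetical uniform weighting and a synthetic dataset stand in here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical stand-in for the unspecified new_class_weight from the issue.
new_class_weight = {0: 1.0, 1: 1.0, 2: 1.0}

# Synthetic 3-class data in place of the issue's real dataset.
X, y = make_classification(n_samples=90, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

ML = SVC(C=1.0, gamma=0.007, class_weight=new_class_weight, probability=True,
         random_state=30, decision_function_shape='ovr').fit(X, y)

print(ML.decision_function(X).shape)  # (90, 3): one OVR score per class
```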
The models are loaded from file:
ML = joblib.load("model.pkl")
Option A, analyze a matrix:
distances = ML.decision_function(X)
Option B, analyze individual samples:
distances = numpy.zeros([X.shape[0], 3])
for i in range(X.shape[0]):
    distances[i, :] = ML.decision_function(X[i, :].reshape(1, -1))
Output for the first two samples:

Option A:
sample 1: [ 0.90835588, -0.17305875,  2.26470288]
sample 2: [ 1.10437313, -0.2371539 ,  2.13278077]

Option B:
sample 1: [ 0.82689247, -0.32689247,  2.5       ]
sample 2: [ 1.22005359, -0.5       ,  2.27994641]
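A plausible explanation for numbers like the exact -0.5 in Option B: in scikit-learn 0.18, the OVO-to-OVR transform rescaled the summed pairwise confidences by the largest magnitude found anywhere in the batch before adding them to the vote counts, so a sample's scores depend on which other samples are in the batch. A sketch of that kind of batch-dependent normalization (illustrative, not the exact library code):

```python
import numpy as np

def ovr_from_ovo(votes, confidences):
    # Batch-dependent transform, similar in spirit to pre-0.19 scikit-learn:
    # confidences are rescaled by the largest magnitude in the whole batch,
    # then added to the OVO vote counts.
    eps = np.finfo(confidences.dtype).eps
    max_abs = np.abs(confidences).max()      # a batch-wide statistic
    scale = (0.5 - eps) / max(max_abs, eps)
    return votes + confidences * scale

votes = np.array([[2.0, 0.0, 1.0]])
conf = np.array([[0.8, -1.2, 0.4]])

alone = ovr_from_ovo(votes, conf)[0]
# The identical sample, batched with a much more confident neighbour:
batched = ovr_from_ovo(np.vstack([votes, votes]),
                       np.vstack([conf, conf * 10]))[0]

print(alone)    # the most negative confidence lands at exactly -0.5
print(batched)  # different values for the very same sample
```

Under this scheme the class with the strongest negative confidence always maps to about -0.5 when analyzed alone, which matches the suspicious -0.5 and 2.5 values in Option B above.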
I couldn’t find any indication for this behavior in the documentation.
Windows-10-10.0.15063-SP0 Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] NumPy 1.12.1 SciPy 0.18.1 Scikit-Learn 0.18.1
Thanks!
Issue Analytics
- Created 6 years ago
- Comments: 17 (14 by maintainers)
Top GitHub Comments
Here you go:
Yes, I suppose it checks the main properties on a small sample. It assumes we can't control the OVO output, while testing the helper directly would let us control it and hence be more rigorous.
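Along those lines, a direct batch-invariance check on decision_function itself can be written as a property-style test (a sketch on synthetic data; the equality is expected to hold in scikit-learn >= 0.19, where the OVO-to-OVR transform became per-sample):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# decision_function should not depend on which other samples share the batch.
X, y = make_classification(n_samples=60, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
clf = SVC(kernel='rbf', gamma=0.007, decision_function_shape='ovr',
          random_state=30).fit(X, y)

batch = clf.decision_function(X)
rows = np.vstack([clf.decision_function(X[i:i + 1]) for i in range(len(X))])

assert np.allclose(batch, rows)  # batch and per-sample results agree
```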