SVC and OneVsOneClassifier decision_function inconsistent on sub-sample
Hi,
I'm seeing inconsistent numerical results from SVC's decision_function. The values computed for an entire batch of samples (an (n_samples, n_features) matrix) differ from those computed sample-by-sample: both the individual per-sample values and the overall distribution of the results disagree.
The model is SVC with RBF kernel, for a 3-class classification:
SVC(C=1.0, gamma=0.007, class_weight=new_class_weight, probability=True,
    random_state=30, decision_function_shape='ovr')
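For reference, a minimal self-contained setup in this spirit (the issue does not show new_class_weight, so a hypothetical uniform weighting and a synthetic dataset stand in here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical stand-in for the unspecified new_class_weight from the issue.
new_class_weight = {0: 1.0, 1: 1.0, 2: 1.0}

# Synthetic 3-class data in place of the issue's real dataset.
X, y = make_classification(n_samples=90, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

ML = SVC(C=1.0, gamma=0.007, class_weight=new_class_weight, probability=True,
         random_state=30, decision_function_shape='ovr').fit(X, y)

print(ML.decision_function(X).shape)  # (90, 3): one OVR score per class
```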
The models are loaded from file:
ML = joblib.load("model.pkl")
Option A, analyze a matrix:
distances = ML.decision_function(X)
Option B, analyze individual samples:
distances = numpy.zeros([X.shape[0], 3])
for i in range(X.shape[0]):
    distances[i, :] = ML.decision_function(X[i, :].reshape(1, -1))
Output for the first two samples:

Option A:
sample 1: [ 0.90835588, -0.17305875,  2.26470288]
sample 2: [ 1.10437313, -0.2371539 ,  2.13278077]

Option B:
sample 1: [ 0.82689247, -0.32689247,  2.5       ]
sample 2: [ 1.22005359, -0.5       ,  2.27994641]
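A plausible explanation for numbers like the exact -0.5 in Option B: in scikit-learn 0.18, the OVO-to-OVR transform rescaled the summed pairwise confidences by the largest magnitude found anywhere in the batch before adding them to the vote counts, so a sample's scores depend on which other samples are in the batch. A sketch of that kind of batch-dependent normalization (illustrative, not the exact library code):

```python
import numpy as np

def ovr_from_ovo(votes, confidences):
    # Batch-dependent transform, similar in spirit to pre-0.19 scikit-learn:
    # confidences are rescaled by the largest magnitude in the whole batch,
    # then added to the OVO vote counts.
    eps = np.finfo(confidences.dtype).eps
    max_abs = np.abs(confidences).max()      # a batch-wide statistic
    scale = (0.5 - eps) / max(max_abs, eps)
    return votes + confidences * scale

votes = np.array([[2.0, 0.0, 1.0]])
conf = np.array([[0.8, -1.2, 0.4]])

alone = ovr_from_ovo(votes, conf)[0]
# The identical sample, batched with a much more confident neighbour:
batched = ovr_from_ovo(np.vstack([votes, votes]),
                       np.vstack([conf, conf * 10]))[0]

print(alone)    # the most negative confidence lands at exactly -0.5
print(batched)  # different values for the very same sample
```

Under this scheme the class with the strongest negative confidence always maps to about -0.5 when analyzed alone, which matches the suspicious -0.5 and 2.5 values in Option B above.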
I couldn’t find any indication for this behavior in the documentation.
Windows-10-10.0.15063-SP0 Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] NumPy 1.12.1 SciPy 0.18.1 Scikit-Learn 0.18.1
Thanks!
Issue Analytics
- Created 6 years ago
- Comments: 17 (14 by maintainers)
Top GitHub Comments
Here you go:
Yes, I suppose it checks the main properties on a small sample. It assumes we can't control the OVO output, while testing the helper directly would let us control it and hence be more rigorous.
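Along those lines, a direct batch-invariance check on decision_function itself can be written as a property-style test (a sketch on synthetic data; the equality is expected to hold in scikit-learn >= 0.19, where the OVO-to-OVR transform became per-sample):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# decision_function should not depend on which other samples share the batch.
X, y = make_classification(n_samples=60, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
clf = SVC(kernel='rbf', gamma=0.007, decision_function_shape='ovr',
          random_state=30).fit(X, y)

batch = clf.decision_function(X)
rows = np.vstack([clf.decision_function(X[i:i + 1]) for i in range(len(X))])

assert np.allclose(batch, rows)  # batch and per-sample results agree
```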