API Inconsistency of predict and predict_proba in SVC

When using SVC(probability=True) ~~or SVR(probability=True)~~ the output of predict_proba will not necessarily be consistent with predict, in the sense that,

np.argmax(self.predict_proba(X), axis=1) != self.predict(X)

This is documented in the user guide:

In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues.

IMO this is a violation of the API contract and should be fixed.

This is continuously reported as a bug, e.g. https://github.com/scikit-learn/scikit-learn/issues/4800, https://github.com/scikit-learn/scikit-learn/issues/12408 and https://github.com/scikit-learn/scikit-learn/issues/12982, and in a few Stack Overflow questions, e.g. https://stackoverflow.com/a/17019830

I encountered this in a project where detecting this discrepancy, evaluating the difference, and deciding whether predict or argmax(predict_proba) should be used in the end took some effort.

One pitfall, for instance, is to use predict to compute the accuracy and then predict_proba for ROC AUC, which can lead to somewhat problematic results if the predictions of these methods are not consistent.
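To see how large the discrepancy is before mixing metrics, one can measure it directly. The sketch below is illustrative only: the helper name inconsistency_rate and the synthetic dataset are assumptions, not part of the original report. It relies on the columns of predict_proba following the order of estimator.classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def inconsistency_rate(estimator, X):
    """Fraction of rows where predict(X) differs from the argmax of predict_proba(X)."""
    labels_from_predict = estimator.predict(X)
    # predict_proba columns follow estimator.classes_, so map indices back to labels.
    proba = estimator.predict_proba(X)
    labels_from_proba = estimator.classes_[np.argmax(proba, axis=1)]
    return float(np.mean(labels_from_predict != labels_from_proba))


# Synthetic 3-class data, chosen only for illustration.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X, y)
rate = inconsistency_rate(clf, X)
print(f"predict and predict_proba disagree on {rate:.1%} of samples")
```

A non-zero rate means that accuracy (from predict) and ROC AUC (from predict_proba) are effectively scoring two slightly different classifiers.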

Several approaches could be used to fix it:

Deprecate the probability=True parameter in the SVC and NuSVC estimators and suggest using CalibratedClassifierCV(SVC(), cv=5) instead. In my quick tests (on sparse data), the latter was actually faster and should yield comparable results that are also consistent between predict and predict_proba, though more benchmarks may be needed. There may also be some variation in the results, as libsvm uses a generalization of Platt scaling in the multiclass case by Wu et al. 2004 (cf. docs), which is not used in CalibratedClassifierCV as far as I understand.

One possibility could be to deprecate, but keep it to allow access to that functionality in libsvm.

Compute predict as argmax(predict_proba) when probability=True. This has the disadvantage of changing the results of predict depending on this input parameter.
Dig into libsvm to understand how it could be fixed there.
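The first approach can be sketched as follows. This is a minimal example on synthetic data (the dataset is an assumption made for illustration) that wraps a plain SVC in CalibratedClassifierCV and counts disagreements. As far as I know, CalibratedClassifierCV.predict is defined as the argmax of its own predict_proba, so the count should be zero, but that is worth verifying on your scikit-learn version.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 3-class data, chosen only for illustration.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Default calibration method is sigmoid (Platt-style), fitted here with 5-fold CV.
calibrated = CalibratedClassifierCV(SVC(), cv=5).fit(X, y)

labels = calibrated.predict(X)
# Map argmax column indices back to class labels before comparing.
labels_from_proba = calibrated.classes_[calibrated.predict_proba(X).argmax(axis=1)]
n_mismatch = int((labels != labels_from_proba).sum())
print(f"{n_mismatch} inconsistent predictions out of {len(labels)}")
```

Note that the probabilities themselves may differ from those of SVC(probability=True), since the multiclass handling is not the same as in libsvm.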

Top GitHub Comments

5 reactions

rth commented, Feb 21, 2019

Here is a minimal example:

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Vectorize the 20 newsgroups training set as TF-IDF features.
ds = fetch_20newsgroups()
X = TfidfVectorizer(min_df=5, max_df=0.7, stop_words="english", norm="l2").fit_transform(ds.data)

# Train on 1000 documents; the remaining documents form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, ds.target, train_size=1000, test_size=None, random_state=42)

estimator = SVC(probability=True, kernel='linear', random_state=42)
estimator.fit(X_train, y_train)

y_pred_1 = estimator.predict(X_test)
# predict_proba columns are ordered as estimator.classes_, so map the argmax back to labels.
y_pred_2 = estimator.classes_[estimator.predict_proba(X_test).argmax(axis=1)]

print("On {} test documents, {} have an inconsistent predict and predict_proba"
      .format(len(y_pred_1), (y_pred_1 != y_pred_2).sum()))

which prints:

On 10314 test documents, 908 have an inconsistent predict and predict_proba

Not that using SVC in this use case is a good idea, but that’s another story (https://github.com/scikit-learn/scikit-learn/pull/13209)…

4 reactions

rth commented, Mar 30, 2020

Reports of inconsistent results with probability=True, which if true mean something is likely wrong in libsvm, are an orthogonal issue and should be addressed in https://github.com/scikit-learn/scikit-learn/issues/13662. This issue is purely about API consistency and decoupling probability=True|False from the consistency of predict and predict_proba.

To reiterate: the fact that predict and predict_proba can be inconsistent is IMO a bug that breaks the API expectations, and no amount of documentation is sufficient to fix it. We have a common test for this that passes because this issue happens only occasionally.

So the choices (adapted from the initial issue description) could be:

a) Deprecate probability=True and predict_proba, then suggest using CalibratedClassifierCV + SVC. This is bound to make users who currently rely on this option unhappy.
b) When probability=True, compute predict as the argmax of predict_proba. This could have been the solution, except that it silently breaks backward compatibility. I guess a lot of users run SVC(probability=True) in production, and we can’t make this change silently.
c) Deprecate probability=True and predict_proba, then add e.g. CalibratedSVC and CalibratedNuSVC classes (names ending with CV could have been better, but harder to read) that behave as SVC(probability=True) and where predict is computed from predict_proba.
d) Deprecate probability=True and replace it with probability='calibrated', for which predict is computed from predict_proba.

I would probably vote for d) unless we are OK introducing 2 other classes in c).
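For illustration, the behaviour shared by options b) to d) can be approximated today with a small subclass. This is only a sketch: ArgmaxSVC is a hypothetical name, not a scikit-learn class and not one of the classes proposed above. It overrides predict so that, when probability=True, labels come from predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


class ArgmaxSVC(SVC):
    """Hypothetical SVC variant whose predict is the argmax of predict_proba."""

    def predict(self, X):
        if not self.probability:
            # No probability model was fitted; fall back to the stock behaviour.
            return super().predict(X)
        proba = self.predict_proba(X)
        # Columns of predict_proba follow self.classes_.
        return self.classes_[np.argmax(proba, axis=1)]


# Synthetic 3-class data, chosen only for illustration.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
clf = ArgmaxSVC(probability=True, random_state=0).fit(X, y)
labels_from_proba = clf.classes_[clf.predict_proba(X).argmax(axis=1)]
print("consistent:", np.array_equal(clf.predict(X), labels_from_proba))
```

The trade-off is the one raised in b): predictions may differ from those of a plain SVC fitted on the same data, so swapping this in is a behaviour change.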

Thoughts @amueller ?
