
SGDClassifier never has the attribute "predict_proba" (even with log or modified_huber loss)

See original GitHub issue

Description

SGDClassifier’s predict_proba() is not usable through MultiOutputClassifier’s predict_proba(), even when the estimator is configured with a loss that supports probability estimates (log or modified_huber).

The incompatibility occurs because an SGDClassifier instance does not expose the attribute “predict_proba” to hasattr(); as a result, when it is wrapped by MultiOutputClassifier, calling predict_proba() raises an error. The check lives in this file: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/multioutput.py

At this condition:

        if not hasattr(self.estimator, "predict_proba"):
            raise ValueError("The base estimator should implement"
                             "predict_proba method")

For contrast, in the simplified example below, LogisticRegression classifiers do have the attribute, and those work correctly.

Steps/Code to Reproduce

from sklearn.linear_model import SGDClassifier as online
from sklearn.linear_model import LogisticRegression as log

# Either loss enables predict_proba() on SGDClassifier by itself:
clf_test = online(loss="log", penalty="l2")
#clf_test = online(loss="modified_huber", penalty="l2")

# The problematic condition in MultiOutputClassifier's predict_proba():
if not hasattr(clf_test, "predict_proba"):
    print("Don't allow predict_proba() when wrapped by MultiOutputClassifier.")
else:
    print("Allow predict_proba() when wrapped by MultiOutputClassifier.")

# By contrast, the logistic regression classifier would work.
clf_test = log()
if not hasattr(clf_test, "predict_proba"):
    print("Don't allow predict_proba() when wrapped by MultiOutputClassifier.")
else:
    print("Allow predict_proba() when wrapped by MultiOutputClassifier.")
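As a sanity check on a current install: in recent scikit-learn releases this reproduction should no longer fail, since SGDClassifier advertises predict_proba for suitable losses and MultiOutputClassifier delegates to the fitted estimators. A sketch under that assumption — "modified_huber" is used here because its name has stayed stable across versions, unlike "log" (renamed "log_loss" in newer releases):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multioutput import MultiOutputClassifier

# Tiny two-output toy problem; each output column has two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

clf = MultiOutputClassifier(
    SGDClassifier(loss="modified_huber", penalty="l2", random_state=0)
)
clf.fit(X, Y)

# One (n_samples, n_classes) array per output column.
probas = clf.predict_proba(X)
print(len(probas), probas[0].shape)
```

On the 0.19.x line reported in this issue, the same call raises the ValueError quoted above instead.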

Expected Results

Allow predict_proba() when wrapped by MultiOutputClassifier.
Allow predict_proba() when wrapped by MultiOutputClassifier.

Actual Results

Don't allow predict_proba() when wrapped by MultiOutputClassifier.
Allow predict_proba() when wrapped by MultiOutputClassifier.

Versions

  • OS: Windows-10-10.0.15063
  • Python: 2.7.11 |Anaconda custom (32-bit)| (default, Mar 4 2016, 15:18:41) [MSC v.1500 32 bit (Intel)]
  • NumPy: 1.10.4
  • SciPy: 0.17.0
  • Scikit-Learn: 0.19.1

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Nov 11, 2017

no, it’s checking at the right time, but checking the wrong thing.
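If I read this direction correctly, "checking the right thing" means probing the fitted per-output estimators, whose predict_proba availability reflects the actual configuration, rather than the unfitted template estimator. A hedged sketch of that idea — can_predict_proba is a made-up helper name for illustration, not scikit-learn API:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multioutput import MultiOutputClassifier


def can_predict_proba(fitted_multi):
    # Probe the fitted per-output clones in estimators_ instead of
    # the unfitted template; for SGDClassifier the attribute's
    # availability depends on the configured loss.
    return all(hasattr(est, "predict_proba") for est in fitted_multi.estimators_)


X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

ok = MultiOutputClassifier(
    SGDClassifier(loss="modified_huber", random_state=0)
).fit(X, Y)
no = MultiOutputClassifier(
    SGDClassifier(loss="hinge", random_state=0)
).fit(X, Y)

print(can_predict_proba(ok), can_predict_proba(no))
```

This assumes a scikit-learn version where hasattr() on a fitted SGDClassifier correctly reflects the loss (True for modified_huber, False for hinge).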

0 reactions
rebekahkim commented, Feb 27, 2019

@jnothman, I think @TomDLT also suggested the same thing but as @amueller pointed out, it would still fail for certain (obscure) cases.

Here’s an edge case I came up with in #12222

Read more comments on GitHub >

Top Results From Across the Web

SGDClassifier with predict_proba - python - Stack Overflow
As you can see in doc - This method is only available for log loss and modified Huber loss. So you have to...
Read more >
1.5. Stochastic Gradient Descent - Scikit-learn
Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions such as...
Read more >
scikit learn - Which algorithm is used in sklearn SGDClassifier ...
The 'log' loss gives logistic regression, a probabilistic classifier. 'modified_huber' is another smooth loss that brings tolerance to outliers ...
Read more >
scikit-learn user guide
documentation recommendation for libraries to leave the log message ... MultiOutputClassifier now has predict_proba as property and can be.
Read more >
Stochastic Gradient Descent - | notebook.community
discriminative learning of linear classifiers under convex loss functions such as (linear) Support Vector Machines and Logistic Regression. SGD has been ...
Read more >
