Assymetry of roc_auc_score
See original GitHub issueDescription
For binary tasks with wrapped down custom scoring functions by the metrics.make_scorerer
-scheme roc_auc_score
is behaving unexpectedly. When requiring the probability output from a binary classifier, which is a shape( n , 2) object, while the training/testing lables are an expected shape (n, ) input, A scoring will fail.
However, the binary task and internal handling of different y shapes is incidentially correctly understood by metrics.log_loss
by internal evaluations, but roc_auc_score
currently fails at this. This especially cumbersome, if the scoring function is wrapped down in a cross_val_score
and a make_scorer
much deeper in the code with possible nested pipelines etc, where the automatic correct evaluation of this particular metric is required.
Steps/Code to Reproduce
This should illustrate what is failing
from sklearn.metrics import roc_auc_score, log_loss
y_true0 = np.array([False, False, True, True])
y_true1 = ~y_true0
y_true = np.matrix([y_true0, y_true1]).T
y_proba0 = np.array([0.1, 0.4, 0.35, 0.8]) #predict_proba component [:,0]
y_proba1 = 1 - y_proba0 #predict_proba component [:,1]
y_proba = np.matrix([y_proba0, y_proba1]).T #as obtained by classifier.predict_proba()
log_loss(y_true1, y_proba1) # compute for positive class component >>> OK
log_loss(y_true, y_proba) # compute for all class >>> OK
log_loss(y_true1, y_proba) # compute for mixed component >>> OK
roc_auc_score(y_true1, y_proba1) # compute for positive class component >>> OK
roc_auc_score(y_true, y_proba) # compute for all class >>> OK
roc_auc_score(y_true1, y_proba) # compute for mixed component >>> FAIL: bad input shape (4, 2)
#last above line is source of error in this snippet: of binary classification and scoring task
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
max_depth=1, random_state=0).fit(X_train, y_train)
from sklearn.metrics import make_scorer, roc_auc_score
make_scorer(roc_auc_score, needs_proba=True)(clf, X_test, y_test) # >>> FAIL: bad input shape (1000, 2)
#compare
make_scorer(log_loss, greater_is_better=True, needs_proba=True)(clf, X_test, y_test) # >>> OK
Expected Results
roc_auc_score
should behave in a similar way as log_loss
, guessing the binary classification task and handle different shape input correctly
Versions
Windows-10-10.0.15063-SP0 Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)] NumPy 1.13.3 SciPy 0.19.1 Scikit-Learn 0.19.1
Issue Analytics
- State:
- Created 6 years ago
- Comments:10 (7 by maintainers)
Top GitHub Comments
Yes,
roc_auc_score(y_true1, y_proba)
fails, butmake_scorer(roc_auc_score, needs_proba=True)(clf, X_test, y_test)
does not. Both are in accordance with documented and expected behaviour.I just figured out the problem. Because I didn’t notice that there are two columns in the result of predict_proba(), so the result cannot apply to roc_curve(). The problem was solved by selecting the second column of the predict_proba() as y_score. Thanks!