Incorrect predictions when fitting a LogisticRegression model on binary outcomes with `multi_class='multinomial'`.
Description
Incorrect predictions when fitting a LogisticRegression model on binary outcomes with `multi_class='multinomial'`.
Steps/Code to Reproduce
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
import numpy as np

# Set up a logistic regression object
lr = LogisticRegression(C=1000000, multi_class='multinomial',
                        solver='sag', tol=0.0001, warm_start=False,
                        verbose=0)

# Set independent variable values
Z = np.array([
    [ 0.        ,  0.        ],
    [ 1.33448632,  0.        ],
    [ 1.48790105, -0.33289528],
    [-0.47953866, -0.61499779],
    [ 1.55548163,  1.14414766],
    [-0.31476657, -1.29024053],
    [-1.40220786, -0.26316645],
    [ 2.227822  , -0.75403668],
    [-0.78170885, -1.66963585],
    [ 2.24057471, -0.74555021],
    [-1.74809665,  2.25340192],
    [-1.74958841,  2.2566389 ],
    [ 2.25984734, -1.75106702],
    [ 0.50598996, -0.77338402],
    [ 1.21968303,  0.57530831],
    [ 1.65370219, -0.36647173],
    [ 0.66569897,  1.77740068],
    [-0.37088553, -0.92379819],
    [-1.17757946, -0.25393047],
    [-1.624227  ,  0.71525192]])

# Set dependent variable values
Y = np.array([1, 0, 0, 1, 0, 0, 0, 0,
              0, 0, 1, 1, 1, 0, 0, 1,
              0, 0, 1, 1], dtype=np.int32)

lr.fit(Z, Y)
p = lr.predict_proba(Z)
print(sklearn.metrics.log_loss(Y, p))  # ...
print(lr.intercept_)
print(lr.coef_)
Expected Results
If we compare against R, or against fitting with multi_class='ovr', the log loss (which is approximately proportional to the objective function, since the regularisation is made negligible through the choice of C) is incorrect. We expect a log loss of roughly 0.5922995.
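For comparison, here is a minimal sketch of that 'ovr' run, reusing Z and Y from the code above with everything else unchanged except multi_class; the name lr_ovr is illustrative. It should reproduce a log loss close to 0.5922995:

# Same settings as above, but with multi_class='ovr'
lr_ovr = LogisticRegression(C=1000000, multi_class='ovr',
                            solver='sag', tol=0.0001, warm_start=False,
                            verbose=0)
lr_ovr.fit(Z, Y)
# Should print approximately 0.5922995
print(sklearn.metrics.log_loss(Y, lr_ovr.predict_proba(Z)))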
Actual Results
The actual log loss when using multi_class='multinomial' is 0.61505641264.
Further Information
See the Stack Exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.
The issue seems to be caused by https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in normal binary logistic regression). This is essentially a difference between softmax regression (where the parameterisation is redundant) and binary logistic regression.
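To illustrate the redundancy, here is a small sketch with made-up symmetric coefficients (w for class 1, -w for class 0; not values taken from the issue). The correct two-class softmax probability equals sigmoid(2 * f), so keeping only one coefficient row and feeding it through a sigmoid effectively halves the decision function:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative (made-up) symmetric two-class softmax solution for one sample
w = np.array([0.8, -0.3])   # coefficients for class 1; class 0 has -w
x = np.array([1.5, 2.0])
f = w @ x                   # decision value for class 1; class 0 gets -f

# Correct multinomial probability: softmax over [-f, f] == sigmoid(2 * f)
p_softmax = np.exp(f) / (np.exp(f) + np.exp(-f))
print(p_softmax, sigmoid(2 * f))   # identical values

# Discarding -w and applying a plain sigmoid gives a different probability
print(sigmoid(f))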
This can be fixed by commenting out lines 762 and 763. However, I am apprehensive that this may cause other, unknown issues, which is why I am posting this as a bug.
Versions
Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0
Issue Analytics
- Created 6 years ago
- Comments: 18 (18 by maintainers)
Top GitHub Comments
So to sum up, we need to:
- fix predict_proba as described in https://github.com/scikit-learn/scikit-learn/issues/9889#issuecomment-335129554 (see the sketch after this list)
- amend coef_'s docstring
- add a whats_new entry
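For illustration, a minimal sketch (not the actual scikit-learn patch) of the kind of predict_proba behaviour the linked comment describes for the binary multinomial case: build a two-column decision [-d, d] and apply a softmax, rather than passing d through a sigmoid. The helper name is hypothetical:

import numpy as np

# Hypothetical helper, not scikit-learn's actual code: given the 1D decision
# values d for the positive class, return probabilities via softmax over
# [-d, d] instead of sigmoid(d).
def multinomial_binary_proba(decision):
    decision_2d = np.c_[-decision, decision]
    e = np.exp(decision_2d - decision_2d.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Example: decision values for three samples
print(multinomial_binary_proba(np.array([-1.0, 0.0, 2.0])))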
Do you want to do it @rwolst?
I think it would be excessive noise to emit such a warning upon fit. Why not just amend the coef_ description? Most users will not be manually making probabilistic interpretations of coef_ in any case, and we can't in general stop users from misinterpreting things on the basis of assumption rather than reading the docs…