RFC classifiers trained by minimizing the Brier loss
At the moment, our probabilistic classifiers (e.g. logistic regression and gradient-boosted trees) minimize the log loss, typically after applying a sigmoid or softmax inverse link function (usually implemented as part of the Cython loss).
However, the log loss is not the only proper scoring rule for fitting estimators of the expected conditional class probabilities. In particular, the Brier loss is also a proper scoring rule, and it has the practical advantage of being upper bounded, which should limit the impact of mislabeled examples in the training set.
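To make the boundedness point concrete, here is a small illustration (my own sketch, not from the issue) using scikit-learn's existing metric functions:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

# Two confidently wrong predictions (e.g. mislabeled training points).
y_true = np.array([0, 1])
y_prob = np.array([1 - 1e-6, 1e-6])  # predicted P(class 1) per sample

# The log loss diverges as predictions approach the wrong corner ...
print(log_loss(y_true, y_prob))          # ~13.8
# ... while the Brier loss is bounded (at most 1 per binary sample),
# which limits the leverage of any single mislabeled example.
print(brier_score_loss(y_true, y_prob))  # ~1.0
```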
Would it make sense to publicly expose such estimators in scikit-learn?
Note that for linear models, fitting the Brier loss is not equivalent to our RidgeClassifier, because the latter does not take the softmax of its raw predictions before computing the squared loss; hence it does not offer a well-defined estimate of the conditional class probabilities.
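As a quick check of this (my own illustration), RidgeClassifier regresses {-1, 1}-encoded targets with a squared loss on the raw scores, so it exposes a decision_function but no predict_proba:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = RidgeClassifier().fit(X, y)

# Raw scores are not squashed through a sigmoid/softmax, so they are
# signed distances rather than probabilities and can fall outside [0, 1].
print(clf.decision_function(X[:3]))
# Consequently the estimator does not implement predict_proba at all.
print(hasattr(clf, "predict_proba"))  # False
```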
Minimizing the Brier loss of a softmax linear model is no longer a convex optimization problem, but I doubt this would prevent a Newton solver with a robust line search from converging in practice.
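To give an idea of what such a fit could look like, here is a minimal sketch using a generic quasi-Newton solver from SciPy (BFGS with line search, standing in for the Newton solver mentioned above); everything below is illustrative rather than a proposed implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import softmax
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)
n_features, n_classes = X.shape[1], 3
Y = np.eye(n_classes)[y]  # one-hot encoded targets

def brier_objective(w_flat):
    # Mean squared error between softmax probabilities and one-hot targets.
    W = w_flat.reshape(n_features, n_classes)
    P = softmax(X @ W, axis=1)
    return np.mean(np.sum((P - Y) ** 2, axis=1))

# The objective is non-convex, but a quasi-Newton method with line
# search typically still converges to a good solution in practice.
res = minimize(brier_objective, np.zeros(n_features * n_classes),
               method="BFGS")
P_hat = softmax(X @ res.x.reshape(n_features, n_classes), axis=1)
print("train accuracy:", np.mean(P_hat.argmax(axis=1) == y))
```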
Top GitHub Comments
Pretty much any GLM literature covers this; the score equation, for both general and canonical links, is the key.
For probabilistic classification in particular (outside any GLM context), the proper scoring rule literature applies as well.
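For reference, the score equation alluded to above (standard GLM theory; the notation is my own transcription):

```latex
% Score equations for a GLM with linear predictor \eta_i = x_i^\top \beta,
% mean \mu_i = g^{-1}(\eta_i) and variance function V(\mu_i):
\sum_{i} \frac{y_i - \mu_i}{V(\mu_i)}
         \frac{\partial \mu_i}{\partial \eta_i} \, x_i = 0 .
% For the canonical link, \partial \mu_i / \partial \eta_i = V(\mu_i),
% so the equations simplify to \sum_i (y_i - \mu_i) \, x_i = 0.
```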
The last point about efficiency (for fitting) is a property of maximum likelihood estimation theory (achieving the Cramér-Rao lower bound asymptotically).
That’s RidgeClassifier, isn’t it?

Oops, didn’t read carefully.

I’d say: is there good literature documenting practice (and theory) that shows the benefit of such classifiers? If not, I wouldn’t prioritize this: we already have a lot on our plate.