A common private module for differentiable loss functions used as objective functions in estimators
Currently each model that needs losses defines its own; it might be useful to put them all in one place to see if anything could be re-used in the future.
In particular, losses are defined in:
- HistGradientBoosting: `sklearn/ensemble/_hist_gradient_boosting/loss.py`
- GradientBoosting: `sklearn/ensemble/_gb_losses.py`
- SGD: `sklearn/linear_model/sgd_fast.pyx`
- GLM: https://github.com/scikit-learn/scikit-learn/pull/14300 would add `sklearn/linear_model/_glm/distributions.py`
- MLP: `sklearn/neural_network/_base.py:LOSS_FUNCTIONS`
- loss implementations in `sklearn.metrics`
- and, somewhat related, RandomForest: `sklearn/tree/_criterion.pyx`
This issue proposes to progressively put most of them (except for the RF criteria) into a single private folder, `sklearn/_loss`, to see what could be made less redundant there.
In particular, for the GLM PR https://github.com/scikit-learn/scikit-learn/pull/14300, this would allow breaking the circular dependency between the `sklearn.linear_model` and `sklearn.metrics` modules (see https://github.com/scikit-learn/scikit-learn/pull/14300#discussion_r329370776).
@NicolasHug I don’t know if you already had plans for unifying the new and legacy gradient boosting losses, and whether this would go in the right direction. cc @ogrisel
Weakly related to https://github.com/scikit-learn/scikit-learn/issues/5044 https://github.com/scikit-learn/scikit-learn/issues/3481 https://github.com/scikit-learn/scikit-learn/issues/5368
So let’s play IT architect a little. Current use of loss functions (for single targets):
1. HistGradientBoosting
files in `sklearn.ensemble._hist_gradient_boosting`: `loss.py`, `_loss.pyx`
- `def __call__(self, y_true, raw_predictions, sample_weight) -> float:` Computes the weighted average loss.
- `def pointwise_loss(self, y_true, raw_predictions) -> ndarray:` Computes the per-sample losses.
- `def update_gradients_and_hessians(self, gradients, hessians, y_true, raw_predictions, sample_weight) -> None:` Computes gradients and hessians and writes them into the float32 ndarrays `gradients` and `hessians`. Implemented in Cython with a loop over samples: `for i in prange(n_samples, schedule='static', nogil=True):`
- `def get_baseline_prediction(self, y_train, sample_weight, prediction_dim) -> float:` Computes the initial estimate for `raw_prediction`, i.e. an intercept-only linear model.
- a `link` function.
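For illustration only, here is a minimal pure-Python sketch of a loss object with the methods just listed, using the halved squared error. The class name and the NumPy implementation are my own (the real code is partly Cython):

```python
import numpy as np


class SketchHalfSquaredError:
    """Illustrative sketch of the HistGradientBoosting-style loss interface."""

    def __call__(self, y_true, raw_predictions, sample_weight=None):
        # Weighted average loss.
        return np.average(self.pointwise_loss(y_true, raw_predictions),
                          weights=sample_weight)

    def pointwise_loss(self, y_true, raw_predictions):
        # Per-sample losses.
        return 0.5 * (y_true - raw_predictions) ** 2

    def update_gradients_and_hessians(self, gradients, hessians, y_true,
                                      raw_predictions, sample_weight=None):
        # Write gradients and hessians in place (the real code uses a
        # Cython prange loop over samples instead of NumPy vectorization).
        gradients[:] = raw_predictions - y_true
        hessians[:] = 1.0
        if sample_weight is not None:
            gradients *= sample_weight
            hessians *= sample_weight

    def get_baseline_prediction(self, y_train, sample_weight=None,
                                prediction_dim=1):
        # Intercept-only model: the (weighted) mean for squared error.
        return np.average(y_train, weights=sample_weight)
```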
2. GradientBoosting
file `sklearn/ensemble/_gb_losses.py`
- `def __call__(self, y, raw_predictions, sample_weight=None) -> float:` Computes the weighted average loss.
- `def negative_gradient(self, y, raw_predictions, **kargs) -> ndarray:` Computes the per-sample negative gradients.
- `def init_estimator(self) -> estimator:` Sets a default estimator for initialization of `raw_prediction` via `get_init_raw_predictions`, which in turn just calls `estimator.predict(X)`.
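Again only a hedged sketch of this older interface, with the halved squared error; `DummyRegressor` stands in for whatever default estimator a concrete loss would pick, and the class name is illustrative:

```python
import numpy as np
from sklearn.dummy import DummyRegressor


class SketchGBHalfSquaredError:
    """Illustrative sketch of the legacy GradientBoosting loss interface."""

    def __call__(self, y, raw_predictions, sample_weight=None):
        # Weighted average loss.
        return np.average(0.5 * (y - raw_predictions) ** 2,
                          weights=sample_weight)

    def negative_gradient(self, y, raw_predictions, **kargs):
        # Per-sample negative gradient: the residual for halved squared error.
        return y - raw_predictions

    def init_estimator(self):
        # Default estimator used to initialize raw_prediction;
        # get_init_raw_predictions essentially calls estimator.predict(X).
        return DummyRegressor(strategy="mean")
```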
3. SGD
file `sklearn/linear_model/_sgd_fast.pyx`
- `cdef double loss(self, double p, double y) nogil:` Computes the loss for a single sample.
- `cdef double dloss(self, double p, double y) nogil:` Computes the gradient for a single sample. Note: the Python version `py_dloss` just calls `dloss` and is only used for testing: `def py_dloss(self, double p, double y): return self.dloss(p, y)`
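A pure-Python sketch of this per-sample style, for the halved squared loss (the real classes are Cython `cdef` classes; the class name is illustrative):

```python
class SketchSquaredLossSGD:
    """Illustrative sketch of the per-sample SGD loss interface."""

    def loss(self, p, y):
        # Loss for a single sample; p is the raw prediction, y the target.
        return 0.5 * (p - y) ** 2

    def dloss(self, p, y):
        # Derivative of the loss w.r.t. p for a single sample.
        return p - y

    def py_dloss(self, p, y):
        # Python wrapper around dloss, only used for testing in the real code.
        return self.dloss(p, y)
```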
4. GLM
To make it short: gradients, hessians and a link function, as for HistGradientBoosting, are sufficient.
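As a hedged example of that combination, the half Poisson deviance with a log link; the function name and exact form are my own choice, not taken from the PR:

```python
import numpy as np


def poisson_gradient_hessian(y_true, raw_prediction, sample_weight=None):
    """Gradient and hessian of the half Poisson deviance w.r.t. raw_prediction."""
    # Log link: mu = exp(raw_prediction).
    mu = np.exp(raw_prediction)
    gradient, hessian = mu - y_true, mu
    if sample_weight is not None:
        gradient, hessian = gradient * sample_weight, hessian * sample_weight
    return gradient, hessian
```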
5. MLP
file `sklearn/neural_network/_base.py`
- `def squared_loss(y_true, y_pred) -> float:` Computes the average loss over samples. There are several such functions, one per loss: `squared_loss`, `log_loss` and `binary_log_loss`.
- The gradient w.r.t. the output layer is hard-coded in `_backprop` as `y_pred - y_true`.
- `link` function (here called "activation" function) and its derivatives, e.g. `logistic(X)` and `inplace_logistic_derivative(Z, delta)`.
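A short sketch of these function-style helpers; the `_sketch` suffixes mark them as illustrations, not the actual `_base.py` functions:

```python
import numpy as np


def squared_loss_sketch(y_true, y_pred):
    # Halved average squared error over all samples, returned as a float.
    return ((y_true - y_pred) ** 2).mean() / 2


def logistic_sketch(X):
    # "Link"/activation function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-X))


def inplace_logistic_derivative_sketch(Z, delta):
    # Multiply the backpropagated error `delta` in place by the derivative
    # of the logistic function, expressed through its output Z.
    delta *= Z * (1.0 - Z)
```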
6. Trees
- `cdef double proxy_impurity_improvement(self) nogil:` Computes a proxy of the loss improvement of possible splits.
- `cdef void children_impurity(self, double* impurity_left, double* impurity_right) nogil:` Computes the losses (impurities) of the left and right children of a split.
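Roughly, for a variance (MSE) criterion these two computations look as follows; this is a plain NumPy sketch with made-up function names, not the Cython `Criterion` API:

```python
import numpy as np


def children_impurity_sketch(y, split_index):
    # Impurity (variance) of the left and right children of a split.
    y_left, y_right = y[:split_index], y[split_index:]
    return y_left.var(), y_right.var()


def proxy_impurity_improvement_sketch(y, split_index):
    # Proxy that ranks candidate splits like the exact improvement does:
    # sum over children of n_child * mean(y_child)**2, constants dropped.
    y_left, y_right = y[:split_index], y[split_index:]
    return (len(y_left) * y_left.mean() ** 2
            + len(y_right) * y_right.mean() ** 2)
```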
This post might be updated in the future.
We now have #20567 merged. Let’s close this issue when some modules make use of it, e.g. (histogram) gradient boosting and linear models.