A common private module for differentiable loss functions used as objective functions in estimators
Currently each model that needs losses defines its own; it might be useful to put them all in one place to see if anything could be re-used in the future.
In particular, losses are defined in:
- HistGradientBoosting: `sklearn/ensemble/_hist_gradient_boosting/loss.py`
- GradientBoosting: `sklearn/ensemble/_gb_losses.py`
- SGD: `sklearn/linear_model/sgd_fast.pyx`
- GLM: https://github.com/scikit-learn/scikit-learn/pull/14300 would add `sklearn/linear_model/_glm/distributions.py`
- MLP: `sklearn/neural_network/_base.py:LOSS_FUNCTIONS`
- loss implementations in `sklearn.metrics`
- and, somewhat related, RandomForest: `sklearn/tree/_criterion.pyx`
This issue proposes to progressively put most of them (except for the RF criteria) into a single private folder, `sklearn/_loss`, to see what could be made less redundant there.
In particular, for the GLM PR https://github.com/scikit-learn/scikit-learn/pull/14300, this would allow breaking the circular dependency between the `sklearn.linear_model` and `sklearn.metrics` modules (see https://github.com/scikit-learn/scikit-learn/pull/14300#discussion_r329370776).
@NicolasHug I don’t know if you already had plans for unifying the new and legacy gradient boosting losses, and whether this would go in the right direction. cc @ogrisel
Weakly related to https://github.com/scikit-learn/scikit-learn/issues/5044 https://github.com/scikit-learn/scikit-learn/issues/3481 https://github.com/scikit-learn/scikit-learn/issues/5368
So let’s play IT architect a little. Current use of loss functions (for single targets):
1. HistGradientBoosting
files in `sklearn.ensemble._hist_gradient_boosting`: `loss.py`, `_loss.pyx`
- `def __call__(self, y_true, raw_predictions, sample_weight) -> float:` Computes the weighted average loss.
- `def pointwise_loss(self, y_true, raw_predictions) -> ndarray:` Computes the per-sample losses.
- `def update_gradients_and_hessians(self, gradients, hessians, y_true, raw_predictions, sample_weight) -> None:` Computes gradients and hessians and writes them into the float32 ndarrays `gradients` and `hessians`. Implemented in Cython with a loop over samples: `for i in prange(n_samples, schedule='static', nogil=True):`
- `def get_baseline_prediction(self, y_train, sample_weight, prediction_dim) -> float:` Computes the initial estimate for `raw_prediction`, i.e. an intercept-only linear model.
- a `link` function.
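For illustration only, here is a minimal pure-Python sketch of a loss object with the methods just listed, using the halved squared error. The class name and the NumPy implementation are my own (the real code is partly Cython):

```python
import numpy as np


class SketchHalfSquaredError:
    """Illustrative sketch of the HistGradientBoosting-style loss interface."""

    def __call__(self, y_true, raw_predictions, sample_weight=None):
        # Weighted average loss.
        return np.average(self.pointwise_loss(y_true, raw_predictions),
                          weights=sample_weight)

    def pointwise_loss(self, y_true, raw_predictions):
        # Per-sample losses.
        return 0.5 * (y_true - raw_predictions) ** 2

    def update_gradients_and_hessians(self, gradients, hessians, y_true,
                                      raw_predictions, sample_weight=None):
        # Write gradients and hessians in place (the real code uses a
        # Cython prange loop over samples instead of NumPy vectorization).
        gradients[:] = raw_predictions - y_true
        hessians[:] = 1.0
        if sample_weight is not None:
            gradients *= sample_weight
            hessians *= sample_weight

    def get_baseline_prediction(self, y_train, sample_weight=None,
                                prediction_dim=1):
        # Intercept-only model: the (weighted) mean for squared error.
        return np.average(y_train, weights=sample_weight)
```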
2. GradientBoosting
file `sklearn/ensemble/_gb_losses.py`
- `def __call__(self, y, raw_predictions, sample_weight=None) -> float:` Computes the weighted average loss.
- `def negative_gradient(self, y, raw_predictions, **kargs) -> ndarray:` Computes the per-sample negative gradients.
- `def init_estimator(self) -> estimator:` Sets a default estimator for initialization of `raw_prediction` via `get_init_raw_predictions`, which in turn just calls `estimator.predict(X)`.
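Again only a hedged sketch of this older interface, with the halved squared error; `DummyRegressor` stands in for whatever default estimator a concrete loss would pick, and the class name is illustrative:

```python
import numpy as np
from sklearn.dummy import DummyRegressor


class SketchGBHalfSquaredError:
    """Illustrative sketch of the legacy GradientBoosting loss interface."""

    def __call__(self, y, raw_predictions, sample_weight=None):
        # Weighted average loss.
        return np.average(0.5 * (y - raw_predictions) ** 2,
                          weights=sample_weight)

    def negative_gradient(self, y, raw_predictions, **kargs):
        # Per-sample negative gradient: the residual for halved squared error.
        return y - raw_predictions

    def init_estimator(self):
        # Default estimator used to initialize raw_prediction;
        # get_init_raw_predictions essentially calls estimator.predict(X).
        return DummyRegressor(strategy="mean")
```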
3. SGD
file `sklearn/linear_model/_sgd_fast.pyx`
- `cdef double loss(self, double p, double y) nogil:` Computes the loss for a single sample.
- `cdef double dloss(self, double p, double y) nogil:` Computes the gradient for a single sample. Note: the Python version `py_dloss` just calls `dloss` and is only used for testing: `def py_dloss(self, double p, double y): return self.dloss(p, y)`
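A pure-Python sketch of this per-sample style, for the halved squared loss (the real classes are Cython `cdef` classes; the class name is illustrative):

```python
class SketchSquaredLossSGD:
    """Illustrative sketch of the per-sample SGD loss interface."""

    def loss(self, p, y):
        # Loss for a single sample; p is the raw prediction, y the target.
        return 0.5 * (p - y) ** 2

    def dloss(self, p, y):
        # Derivative of the loss w.r.t. p for a single sample.
        return p - y

    def py_dloss(self, p, y):
        # Python wrapper around dloss, only used for testing in the real code.
        return self.dloss(p, y)
```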
4. GLM
To make it short: gradients, hessians and a link function, as for HistGradientBoosting, are sufficient.
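As a hedged example of that combination, the half Poisson deviance with a log link; the function name and exact form are my own choice, not taken from the PR:

```python
import numpy as np


def poisson_gradient_hessian(y_true, raw_prediction, sample_weight=None):
    """Gradient and hessian of the half Poisson deviance w.r.t. raw_prediction."""
    # Log link: mu = exp(raw_prediction).
    mu = np.exp(raw_prediction)
    gradient, hessian = mu - y_true, mu
    if sample_weight is not None:
        gradient, hessian = gradient * sample_weight, hessian * sample_weight
    return gradient, hessian
```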
5. MLP
file `sklearn/neural_network/_base.py`
- `def squared_loss(y_true, y_pred) -> float:` Computes the average loss over samples. There are several such functions, one per loss: `squared_loss`, `log_loss` and `binary_log_loss`.
- The gradient w.r.t. the output layer is hard-coded in `_backprop` as `y_pred - y_true`.
- `link` function (here called "activation" function) and its derivatives, e.g. `logistic(X)` and `inplace_logistic_derivative(Z, delta)`.
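A short sketch of these function-style helpers; the `_sketch` suffixes mark them as illustrations, not the actual `_base.py` functions:

```python
import numpy as np


def squared_loss_sketch(y_true, y_pred):
    # Halved average squared error over all samples, returned as a float.
    return ((y_true - y_pred) ** 2).mean() / 2


def logistic_sketch(X):
    # "Link"/activation function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-X))


def inplace_logistic_derivative_sketch(Z, delta):
    # Multiply the backpropagated error `delta` in place by the derivative
    # of the logistic function, expressed through its output Z.
    delta *= Z * (1.0 - Z)
```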
6. Trees
- `cdef double proxy_impurity_improvement(self) nogil:` Computes a proxy of the loss improvement of possible splits.
- `cdef void children_impurity(self, double* impurity_left, double* impurity_right) nogil:` Computes the losses (impurities) of the left and right children of a split.
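Roughly, for a variance (MSE) criterion these two computations look as follows; this is a plain NumPy sketch with made-up function names, not the Cython `Criterion` API:

```python
import numpy as np


def children_impurity_sketch(y, split_index):
    # Impurity (variance) of the left and right children of a split.
    y_left, y_right = y[:split_index], y[split_index:]
    return y_left.var(), y_right.var()


def proxy_impurity_improvement_sketch(y, split_index):
    # Proxy that ranks candidate splits like the exact improvement does:
    # sum over children of n_child * mean(y_child)**2, constants dropped.
    y_left, y_right = y[:split_index], y[split_index:]
    return (len(y_left) * y_left.mean() ** 2
            + len(y_right) * y_right.mean() ** 2)
```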
This post might be updated in the future.
We now have #20567 merged. Let’s close this issue when some modules make use of it, e.g. (histogram) gradient boosting and linear models.