
A common private module for differentiable loss functions used as objective functions in estimators


Currently, each model that needs loss functions defines its own. It might be useful to put them all in one place to see whether anything could be re-used in the future.

In particular, losses are defined in:

  • HistGradientBoosting: sklearn/ensemble/_hist_gradient_boosting/loss.py
  • GradientBoosting: sklearn/ensemble/_gb_losses.py
  • SGD: sklearn/linear_model/sgd_fast.pyx
  • GLM: https://github.com/scikit-learn/scikit-learn/pull/14300 would add sklearn/linear_model/_glm/distributions.py
  • MLP: sklearn/neural_network/_base.py:LOSS_FUNCTIONS
  • losses implementations in sklearn.metrics
  • and, somewhat related, RandomForest: sklearn/tree/_criterion.pyx

This issue proposes to progressively move most of them (except for the RF criteria) into a single private folder, sklearn/_loss, to see what could be made less redundant there.
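
For concreteness, a minimal sketch of what a shared base class in such a module could look like (all names here are hypothetical, not an existing scikit-learn API):

```python
import numpy as np


class BaseLoss:
    """Hypothetical common interface for differentiable losses.

    Subclasses define per-sample losses and their derivatives with
    respect to the raw (link-space) predictions.
    """

    def pointwise_loss(self, y_true, raw_prediction):
        """Per-sample loss values, shape (n_samples,)."""
        raise NotImplementedError

    def gradient(self, y_true, raw_prediction):
        """Per-sample derivative of the loss w.r.t. raw_prediction."""
        raise NotImplementedError

    def __call__(self, y_true, raw_prediction, sample_weight=None):
        """Weighted average loss, the quantity estimators monitor."""
        return np.average(
            self.pointwise_loss(y_true, raw_prediction), weights=sample_weight
        )


class HalfSquaredError(BaseLoss):
    """Squared error with the conventional 1/2 factor."""

    def pointwise_loss(self, y_true, raw_prediction):
        return 0.5 * (y_true - raw_prediction) ** 2

    def gradient(self, y_true, raw_prediction):
        return raw_prediction - y_true
```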

In particular, for the GLM PR in https://github.com/scikit-learn/scikit-learn/pull/14300, this would allow breaking the circular dependency between the sklearn.linear_model and sklearn.metrics modules (https://github.com/scikit-learn/scikit-learn/pull/14300#discussion_r329370776).

@NicolasHug I don’t know whether you already had plans for unifying the new and legacy gradient boosting losses, and whether this would go in the right direction. cc @ogrisel

Weakly related to https://github.com/scikit-learn/scikit-learn/issues/5044 https://github.com/scikit-learn/scikit-learn/issues/3481 https://github.com/scikit-learn/scikit-learn/issues/5368


Top GitHub Comments

lorentzenchr commented on Nov 24, 2020:

So let’s play IT architect for a bit. Here is the current use of loss functions (for single targets):

1. HistGradientBoosting

files in sklearn.ensemble._hist_gradient_boosting: loss.py, _loss.pyx

  • def __call__(self, y_true, raw_predictions, sample_weight) -> float: Computes the weighted average loss.
  • def pointwise_loss(self, y_true, raw_predictions) -> ndarray: Computes the per sample losses.
  • def update_gradients_and_hessians(self, gradients, hessians, y_true, raw_predictions, sample_weight) -> None: Computes gradients and hessians and writes them into the float32 ndarrays gradients and hessians. Implemented in Cython with a loop over samples: for i in prange(n_samples, schedule='static', nogil=True):
  • def get_baseline_prediction(self, y_train, sample_weight, prediction_dim) -> float: Computes the initial estimate for “raw_prediction”, i.e. an intercept-only linear model.
  • Knows the notion of a link function.
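
To make that contract concrete, here is a rough pure-Python rendering of the least-squares case. The real code lives in Cython and works in-place on float32 arrays, so this is only an illustrative sketch of the signatures above:

```python
import numpy as np


class LeastSquaresSketch:
    """Illustrative stand-in for the HistGradientBoosting loss API."""

    def pointwise_loss(self, y_true, raw_predictions):
        return 0.5 * (y_true - raw_predictions) ** 2

    def __call__(self, y_true, raw_predictions, sample_weight):
        return np.average(
            self.pointwise_loss(y_true, raw_predictions), weights=sample_weight
        )

    def update_gradients_and_hessians(
        self, gradients, hessians, y_true, raw_predictions, sample_weight
    ):
        # The real Cython version does this in a prange loop, in-place.
        gradients[:] = raw_predictions - y_true
        hessians[:] = 1.0
        if sample_weight is not None:
            gradients *= sample_weight
            hessians *= sample_weight

    def get_baseline_prediction(self, y_train, sample_weight, prediction_dim):
        # Intercept-only model; for squared error this is the weighted mean.
        return np.average(y_train, weights=sample_weight)
```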

2. GradientBoosting

file sklearn/ensemble/_gb_losses.py

  • def __call__(self, y, raw_predictions, sample_weight=None) -> float: Computes the weighted average loss.
  • def negative_gradient(self, y, raw_predictions, **kargs) -> ndarray: Computes the per-sample negative gradient, i.e. the pseudo-residuals the next tree is fit on.
  • def init_estimator(self) -> estimator: Returns a default estimator used to initialize “raw_prediction” via get_init_raw_predictions, which in turn just calls estimator.predict(X).
  • No hessians.
  • Some more functions specific to gradient boosted trees.
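
A comparable pure-Python sketch for the squared-error case in this API (again an illustration, not the actual implementation):

```python
import numpy as np
from sklearn.dummy import DummyRegressor


class LeastSquaresGBSketch:
    """Illustrative stand-in for the _gb_losses.py interface."""

    def __call__(self, y, raw_predictions, sample_weight=None):
        return np.average((y - raw_predictions) ** 2, weights=sample_weight)

    def negative_gradient(self, y, raw_predictions, **kargs):
        # Pseudo-residuals: the targets the next tree is fit on.
        return y - raw_predictions

    def init_estimator(self):
        # A constant-prediction estimator providing the initial raw_prediction.
        return DummyRegressor(strategy="mean")
```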

3. SGD

file sklearn/linear_model/_sgd_fast.pyx

  • Everything in cython in an extension type (cdef class).
  • cdef double loss(self, double p, double y) nogil: Computes the loss for one single sample.
  • cdef double dloss(self, double p, double y) nogil: Computes the gradient for one single sample. Note: a Python wrapper py_dloss calls dloss and is only used for testing.
  • def py_dloss(self, double p, double y): return self.dloss(p, y)
  • No hessians.
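
For intuition, a plain-Python stand-in for that per-sample API with squared error could read as follows (the real implementation is a Cython extension type with nogil methods):

```python
class SquaredLossSketch:
    """Illustrative Python version of the per-sample API in _sgd_fast.pyx."""

    def loss(self, p, y):
        # p is the raw prediction for a single sample, y its target.
        return 0.5 * (p - y) ** 2

    def dloss(self, p, y):
        # Derivative of the loss with respect to p.
        return p - y

    def py_dloss(self, p, y):
        # Thin Python wrapper, only needed so tests can call the Cython method.
        return self.dloss(p, y)
```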

4. GLM

To make it short: gradients, hessians and a link function, as for HistGradientBoosting, are sufficient.
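
As a worked example of why that suffices, take half the Poisson deviance with a log link; terms constant in the prediction are dropped, as is common:

```python
import numpy as np

# Half Poisson deviance with log link: mu = exp(z), z the raw prediction.
def poisson_half_deviance(y, z):
    return np.exp(z) - y * z  # up to a constant in y

def poisson_gradient(y, z):
    return np.exp(z) - y  # d loss / d z = mu - y

def poisson_hessian(y, z):
    return np.exp(z)  # d^2 loss / d z^2 = mu, always positive
```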

5. MLP

file sklearn/neural_network/_base.py

  • def squared_loss(y_true, y_pred) -> float: Computes the mean loss over samples. There are several such functions, one for each loss: squared_loss, log_loss and binary_log_loss.
  • Gradients are (by chance 😏) hard-coded in _backprop as y_pred - y_true.
  • No hessians.
  • Knows the notion of a link function (here “activation” function) and its derivatives, e.g. logistic(X) and inplace_logistic_derivative(Z, delta).
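
The “by chance” is the usual canonical-link story: when the output activation matches the loss (identity with squared error, logistic with binary log loss, softmax with log loss), the derivative of the loss with respect to the pre-activation collapses to y_pred - y_true. A quick numerical check for the logistic case:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_log_loss(z, y_true):
    # Loss as a function of the pre-activation z, with y_pred = logistic(z).
    p = logistic(z)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

z, y_true = 0.3, 1.0
eps = 1e-6
numeric = (binary_log_loss(z + eps, y_true) - binary_log_loss(z - eps, y_true)) / (2 * eps)
print(numeric, logistic(z) - y_true)  # both ≈ -0.4256, i.e. y_pred - y_true
```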

6. Trees

  • Everything in cython.
  • cdef double proxy_impurity_improvement(self) nogil: Computes a proxy of the loss improvement of possible splits.
  • cdef void children_impurity(self, double* impurity_left, double* impurity_right) nogil: Computes the impurity of the left and right children of a split.
  • No gradients nor hessians.
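
The point of the proxy is that terms constant over all candidate splits of a node can be dropped when only the best split is sought. A sketch of the relation, with weighted sample counts and impurities as inputs:

```python
def impurity_improvement(n_node, n_total, impurity, n_left, imp_left, n_right, imp_right):
    # Full improvement, as reported e.g. in feature importances.
    return (n_node / n_total) * (
        impurity
        - (n_left / n_node) * imp_left
        - (n_right / n_node) * imp_right
    )

def proxy_impurity_improvement(n_left, imp_left, n_right, imp_right):
    # Same argmax over candidate splits, fewer operations per split.
    return -n_left * imp_left - n_right * imp_right
```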

This post might be updated in the future.

lorentzenchr commented on Nov 27, 2021:

We now have #20567 merged. Let’s close this issue when some modules make use of it, e.g. (histogram) gradient boosting and linear models.
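
For reference, a sketch of how the merged module can be consumed (class and method names as of #20567; since sklearn._loss is private, they may change without deprecation):

```python
import numpy as np
from sklearn._loss.loss import HalfPoissonLoss  # private API, may change

loss = HalfPoissonLoss()
y_true = np.array([0.0, 1.0, 3.0])
raw_prediction = np.zeros_like(y_true)  # raw, i.e. link-space, predictions

# Per-sample losses and gradients, computed by the shared Cython kernels.
values = loss.loss(y_true=y_true, raw_prediction=raw_prediction)
gradients = loss.gradient(y_true=y_true, raw_prediction=raw_prediction)
print(values.mean(), gradients)
```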
