add details on how to use both `sample_weight` and `precompute` together for linear models
Describe the issue linked to the documentation
Currently it is unclear from the documentation how the `sample_weight` argument to `fit()` interacts with `precompute` when the user wants to pass in a precomputed Gram matrix. Using these two arguments together requires carefully preprocessing the data to replicate the steps performed in `_pre_fit`.
Here is a snippet of code demonstrating how to do it:

```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from numpy.testing import assert_almost_equal
import numpy as np

X, y = make_regression(n_samples=int(1e5), noise=0.5)
# Random lognormal weight vector.
weights = np.random.lognormal(size=y.shape)

# Reference fit without a precomputed Gram matrix.
en = ElasticNet(alpha=0.01, fit_intercept=True, precompute=False)
en.fit(X, y, sample_weight=weights)

# Center X with its weighted mean, replicating what _pre_fit does
# when fit_intercept=True and sample weights are given.
X_c = X - np.average(X, axis=0, weights=weights)
# Row-wise multiply by sqrt(weights) so that X_r.T @ X_r is the
# weighted Gram matrix of the centered data.
X_r = X_c * np.sqrt(weights)[:, np.newaxis]

en_precompute = ElasticNet(alpha=0.01, fit_intercept=True,
                           precompute=X_r.T @ X_r)
en_precompute.fit(X_c, y, sample_weight=weights)

assert_almost_equal(en.coef_, en_precompute.coef_)
```
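The core of the preprocessing above is the weighted centering followed by row-wise rescaling with `sqrt(weights)`: the product `X_r.T @ X_r` then equals the explicitly weighted Gram matrix of the centered data. This identity can be checked in isolation with plain NumPy, independently of any scikit-learn version (a standalone sketch; the variable names mirror the snippet above):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
w = rng.lognormal(size=50)

# Weighted mean used for centering.
mu = np.average(X, axis=0, weights=w)
X_c = X - mu

# Row-wise rescaling by sqrt(w).
X_r = X_c * np.sqrt(w)[:, np.newaxis]

# Explicit weighted Gram matrix: sum_i w_i * outer(x_i - mu, x_i - mu).
gram_explicit = sum(w_i * np.outer(x_i - mu, x_i - mu)
                    for w_i, x_i in zip(w, X))

# The compact form matches the explicit weighted sum.
np.testing.assert_allclose(X_r.T @ X_r, gram_explicit)
```

Because the two expressions agree, passing `X_r.T @ X_r` as `precompute` supplies exactly the Gram matrix that corresponds to the centered, weight-rescaled design matrix.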
Suggest a potential alternative/fix
Perhaps a section could be added to the user guide (as suggested by @ogrisel on Gitter) on how to use these features together; that section could then be referenced from the docstrings of the various models that take a `precompute` parameter in their constructors. @ogrisel also suggested adding a unit test (perhaps adapted from the above snippet) to make sure that this way of combining the two features isn't inadvertently broken in the future.
Issue Analytics
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
ok in linear models it's a trade off.
Fixed by #19004.