add details on how to use both `sample_weight` and `precompute` together for linear models
Describe the issue linked to the documentation
Currently it is unclear from the documentation how the `sample_weight` argument to `fit()` interacts with `precompute` when the user wants to pass in a precomputed Gram matrix. Using these two arguments together requires carefully preprocessing the data to replicate the steps performed in `_pre_fit`.
Here is a snippet of code demonstrating how to do it:

```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from numpy.testing import assert_almost_equal
import numpy as np

X, y = make_regression(n_samples=int(1e5), noise=0.5)
# Random lognormal weight vector.
weights = np.random.lognormal(size=y.shape)

# Reference fit without a precomputed Gram matrix.
en = ElasticNet(alpha=0.01, fit_intercept=True, precompute=False)
en.fit(X, y, sample_weight=weights)

# Center X with its weighted mean, replicating what _pre_fit does
# when fit_intercept=True and sample weights are given.
X_c = X - np.average(X, axis=0, weights=weights)
# Row-wise multiply by sqrt(weights) so that X_r.T @ X_r is the
# weighted Gram matrix of the centered data.
X_r = X_c * np.sqrt(weights)[:, np.newaxis]

en_precompute = ElasticNet(alpha=0.01, fit_intercept=True,
                           precompute=X_r.T @ X_r)
en_precompute.fit(X_c, y, sample_weight=weights)

assert_almost_equal(en.coef_, en_precompute.coef_)
```
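The core of the preprocessing above is the weighted centering followed by row-wise rescaling with `sqrt(weights)`: the product `X_r.T @ X_r` then equals the explicitly weighted Gram matrix of the centered data. This identity can be checked in isolation with plain NumPy, independently of any scikit-learn version (a standalone sketch; the variable names mirror the snippet above):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
w = rng.lognormal(size=50)

# Weighted mean used for centering.
mu = np.average(X, axis=0, weights=w)
X_c = X - mu

# Row-wise rescaling by sqrt(w).
X_r = X_c * np.sqrt(w)[:, np.newaxis]

# Explicit weighted Gram matrix: sum_i w_i * outer(x_i - mu, x_i - mu).
gram_explicit = sum(w_i * np.outer(x_i - mu, x_i - mu)
                    for w_i, x_i in zip(w, X))

# The compact form matches the explicit weighted sum.
np.testing.assert_allclose(X_r.T @ X_r, gram_explicit)
```

Because the two expressions agree, passing `X_r.T @ X_r` as `precompute` supplies exactly the Gram matrix that corresponds to the centered, weight-rescaled design matrix.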
Suggest a potential alternative/fix
Perhaps a section could be added to the user guide (as suggested by @ogrisel on Gitter) on how to use these features together; that section could then be referenced from the docstrings of the various models that take a `precompute` parameter in their constructors. @ogrisel also suggested adding a unit test (perhaps adapted from the above snippet) to make sure that this way of combining the two features isn't inadvertently broken in the future.
Issue Analytics
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
ok in linear models it's a trade off.
Fixed by #19004.