set_params() changing the result of transform / predict without fitting
I came across this behaviour:
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, random_state=0)
est = SelectFromModel(LogisticRegression(), threshold=.4)
est.fit(X, y)
print(est.transform(X).shape)  # (1000, 3)
est.threshold = .01
print(est.transform(X).shape)  # (1000, 20)
where the output of transform is changed simply by setting an __init__ parameter, without refitting.
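A sketch of why this happens (an approximation of the internals, not a definitive account): SelectFromModel appears to recompute the support mask lazily on every transform call, comparing importances derived from the fitted sub-estimator against the *current* value of self.threshold. Assuming the importances are the absolute coefficients summed over classes (which matches the documented default aggregation), the behaviour can be reproduced by hand:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
est = SelectFromModel(LogisticRegression(), threshold=.4)
est.fit(X, y)

# Roughly what transform does on each call: derive feature importances
# from the *fitted* sub-estimator, then compare them against whatever
# self.threshold currently is. The mask is not frozen at fit time.
importances = np.abs(est.estimator_.coef_).sum(axis=0)
mask = importances >= est.threshold

print(est.transform(X).shape[1] == mask.sum())
```

Because the mask tracks the attribute rather than a value cached in fit, mutating est.threshold changes the next transform immediately.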
Is this a bug or a feature? If it's a bug, it's easy to fix. If it's not, then I have a few concerns:
- This typically isn't possible for any parameter that needs to be validated in fit, so we can't support this pattern consistently.
- We don't do it in e.g. PCA, where it would in theory be possible, and quite useful: users might want to transform with different values of n_components without having to refit every time.
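For contrast, the PCA case can be checked quickly (an illustrative sketch, reusing the same synthetic data as above): transform projects through the components_ matrix stored at fit time, so assigning a new n_components afterwards is silently ignored until the next fit:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=1000, random_state=0)
pca = PCA(n_components=5).fit(X)
print(pca.transform(X).shape)  # (1000, 5)

# Setting the parameter after fitting has no effect on transform,
# because the projection matrix components_ was sliced during fit.
pca.n_components = 2
print(pca.transform(X).shape)  # still (1000, 5)

# Only refitting picks up the new value.
print(pca.fit(X).transform(X).shape)  # (1000, 2)
```

So the two transformers disagree on whether set-then-transform takes effect, which is exactly the inconsistency raised here.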
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
I’m fine with keeping it where it’s currently supported as long as we don’t make it a contract and allow ourselves to break backward compatibility.
I’m thinking of writing a doc section where we discuss such caveats. I think this one and the random_state handling are good candidates. Will try to submit a PR soon.
Agreed. Now that we issue FutureWarnings and don't tweak warning filters, it might be easier to reboot that logger idea you've been advocating for?

Thinking about it, I think setting the parameters and not calling fit is an advanced use case and shouldn't be prevented. But I'd be happy if we warned about it, and had an easy way to disable those warnings for advanced users. We don't seem to have a good way for users to configure the warnings they see, if you ask me.
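For reference, the mechanism an advanced user would reach for today is the standard-library warnings filter; nothing scikit-learn-specific is assumed here, and the warning below is a hypothetical stand-in for one the library might emit:

```python
import warnings

def set_after_fit():
    # Stand-in for a hypothetical warning scikit-learn could emit when a
    # parameter is set on an already-fitted estimator.
    warnings.warn("parameter set after fit; results may be inconsistent",
                  FutureWarning)

# By default the warning is visible...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_after_fit()
print(len(caught))  # 1

# ...but an advanced user can opt out with a targeted filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=FutureWarning)
    set_after_fit()
print(len(caught))  # 0
```

The filter can be scoped by category, message regex, or module, which is about as much per-user configuration as the stdlib offers out of the box.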