set_params() changing the result of transform / predict without fitting
I came across this behaviour:
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, random_state=0)
est = SelectFromModel(LogisticRegression(), threshold=.4)
est.fit(X, y)
print(est.transform(X).shape)  # (1000, 3)
est.threshold = .01
print(est.transform(X).shape)  # (1000, 20)
where the output of transform is changed simply by setting an __init__ parameter, without refitting.
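A sketch of why this happens (an approximation of the internals, not a definitive account): SelectFromModel appears to recompute the support mask lazily on every transform call, comparing importances derived from the fitted sub-estimator against the *current* value of self.threshold. Assuming the importances are the absolute coefficients summed over classes (which matches the documented default aggregation), the behaviour can be reproduced by hand:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
est = SelectFromModel(LogisticRegression(), threshold=.4)
est.fit(X, y)

# Roughly what transform does on each call: derive feature importances
# from the *fitted* sub-estimator, then compare them against whatever
# self.threshold currently is. The mask is not frozen at fit time.
importances = np.abs(est.estimator_.coef_).sum(axis=0)
mask = importances >= est.threshold

print(est.transform(X).shape[1] == mask.sum())
```

Because the mask tracks the attribute rather than a value cached in fit, mutating est.threshold changes the next transform immediately.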
Is this a bug or a feature? If it's a bug, it's easy to fix. If it's not, then I have a few concerns:
- This typically isn't possible for any parameter that needs to be validated in fit, so we can't support this pattern consistently.
- We don't do it in e.g. PCA, where it would in theory be possible, and quite useful: users might want to transform with different values of n_components without having to refit every time.
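For contrast, the PCA case can be checked quickly (an illustrative sketch, reusing the same synthetic data as above): transform projects through the components_ matrix stored at fit time, so assigning a new n_components afterwards is silently ignored until the next fit:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=1000, random_state=0)
pca = PCA(n_components=5).fit(X)
print(pca.transform(X).shape)  # (1000, 5)

# Setting the parameter after fitting has no effect on transform,
# because the projection matrix components_ was sliced during fit.
pca.n_components = 2
print(pca.transform(X).shape)  # still (1000, 5)

# Only refitting picks up the new value.
print(pca.fit(X).transform(X).shape)  # (1000, 2)
```

So the two transformers disagree on whether set-then-transform takes effect, which is exactly the inconsistency raised here.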
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
I’m fine with keeping it where it’s currently supported as long as we don’t make it a contract and allow ourselves to break backward compatibility.
I’m thinking of writing a doc section where we discuss such caveats. I think this one and the random_state handling are good candidates. Will try to submit a PR soon.
Agreed. Now that we issue FutureWarnings and don't tweak warning filters, it might be easier to reboot that logger idea you've been advocating for?

Thinking about it, I think setting the parameters and not calling fit is an advanced use case and shouldn't be prevented. But I'd be happy if we warned about it, and had an easy way to disable those warnings for advanced users. We don't seem to have a good way for users to configure the warnings they see, if you ask me.
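For reference, the mechanism an advanced user would reach for today is the standard-library warnings filter; nothing scikit-learn-specific is assumed here, and the warning below is a hypothetical stand-in for one the library might emit:

```python
import warnings

def set_after_fit():
    # Stand-in for a hypothetical warning scikit-learn could emit when a
    # parameter is set on an already-fitted estimator.
    warnings.warn("parameter set after fit; results may be inconsistent",
                  FutureWarning)

# By default the warning is visible...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_after_fit()
print(len(caught))  # 1

# ...but an advanced user can opt out with a targeted filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=FutureWarning)
    set_after_fit()
print(len(caught))  # 0
```

The filter can be scoped by category, message regex, or module, which is about as much per-user configuration as the stdlib offers out of the box.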