set_params() changing the result of transform / predict without fitting

I came across this behavior:

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)

est = SelectFromModel(LogisticRegression(), threshold=.4)
est.fit(X, y)
print(est.transform(X).shape)  # (1000, 3)

# Change an __init__ parameter without calling fit again.
est.threshold = .01
print(est.transform(X).shape)  # (1000, 20)
```

Here, the output of transform changed just from setting an __init__ parameter, without refitting.
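To make the distinction concrete, here is a minimal sketch of two hypothetical toy selectors (not scikit-learn code). The first reads `threshold` at transform time, which matches the behavior reported above for SelectFromModel; the second snapshots it into a fitted attribute `threshold_` during fit, so that set_params() has no effect until the next fit(). The class names and the mean-absolute-value scoring are assumptions made only for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LiveThresholdSelector(TransformerMixin, BaseEstimator):
    """Hypothetical selector that reads `threshold` at transform time."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y=None):
        # One score per feature: the mean absolute value of each column.
        self.scores_ = np.abs(np.asarray(X)).mean(axis=0)
        return self

    def transform(self, X):
        # The mask depends on the *current* value of self.threshold.
        return np.asarray(X)[:, self.scores_ >= self.threshold]


class FrozenThresholdSelector(LiveThresholdSelector):
    """Hypothetical variant that snapshots `threshold` during fit."""

    def fit(self, X, y=None):
        super().fit(X, y)
        self.threshold_ = self.threshold  # frozen copy used by transform
        return self

    def transform(self, X):
        # set_params(threshold=...) has no effect until the next fit().
        return np.asarray(X)[:, self.scores_ >= self.threshold_]


X = np.array([[0.1, 1.0, 5.0],
              [0.3, 1.0, 3.0]])  # column scores: 0.2, 1.0, 4.0

live = LiveThresholdSelector(threshold=0.5).fit(X)
frozen = FrozenThresholdSelector(threshold=0.5).fit(X)
print(live.transform(X).shape, frozen.transform(X).shape)  # (2, 2) (2, 2)

live.set_params(threshold=0.1)
frozen.set_params(threshold=0.1)
print(live.transform(X).shape, frozen.transform(X).shape)  # (2, 3) (2, 2)
```

The frozen variant keeps fit() as the only place where parameters take effect, at the cost of blocking the "tweak and re-transform" workflow the live variant allows.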

Is this a bug or a feature? If it’s a bug, it’s easy to fix. If it’s not, then I have a few concerns:

  • This typically isn’t possible for any parameter that needs to be validated in fit, so we can’t support this pattern consistently.
  • We don’t do this in, e.g., PCA, where it would in theory be possible and quite useful: users might want to transform with different values of n_components without refitting each time.
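For the PCA case, a fitted model already holds what is needed to project onto fewer components: with whiten=False, PCA.transform centers X with `mean_` and projects it onto the rows of `components_`, which are ordered by explained variance. Keeping only the first k rows therefore reproduces the first k output columns. The sketch below assumes whiten=False and uses a hypothetical helper, `transform_first_k`, which is not part of scikit-learn. It only claims consistency with the full fit's own transform; whether it matches a PCA refitted with n_components=k may depend on the solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA


def transform_first_k(pca, X, k):
    """Hypothetical helper (not scikit-learn API): project X onto the
    first k components of an already-fitted PCA with whiten=False."""
    # Same computation as PCA.transform, restricted to k components.
    return (np.asarray(X) - pca.mean_) @ pca.components_[:k].T


X, _ = make_classification(n_samples=1000, random_state=0)

# Fit once, keeping all components (n_components=None).
pca = PCA().fit(X)

# Reuse the same fitted model for several "n_components" values.
for k in (2, 5, 10):
    print(k, transform_first_k(pca, X, k).shape)
```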

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, May 29, 2020

I’m fine with keeping it where it’s currently supported as long as we don’t make it a contract and allow ourselves to break backward compatibility.

I’m thinking of writing a doc section where we discuss such caveats. I think this one and the random_state handling are good candidates. Will try to submit a PR soon.

> We don’t seem to have a good way for users to configure the warnings they see if you ask me.

Agreed. Now that we issue FutureWarnings and don’t tweak warning filters, it might be easier to reboot that logger idea you’ve been advocating for?

0 reactions
adrinjalali commented, May 29, 2020

Thinking about it, I think setting the parameters without calling fit is an advanced use case and shouldn’t be prevented. But I’d be happy if we warned about it and had an easy way for advanced users to disable those warnings. We don’t seem to have a good way for users to configure the warnings they see if you ask me.
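A minimal sketch of what this proposal could look like, under stated assumptions: `ParamsChangedSinceFitWarning` and `WarnOnStaleParams` are hypothetical names, not scikit-learn API. The toy estimator snapshots get_params() in fit and warns in transform if the parameters have changed since. Because the warning has its own category, advanced users can silence just that category with the standard `warnings` module.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ParamsChangedSinceFitWarning(UserWarning):
    """Hypothetical warning category (not part of scikit-learn)."""


class WarnOnStaleParams(TransformerMixin, BaseEstimator):
    """Hypothetical selector that warns when transform() runs with
    parameters that differ from those seen by the last fit()."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y=None):
        self.scores_ = np.abs(np.asarray(X)).mean(axis=0)
        # Snapshot the constructor parameters used for this fit.
        self.params_at_fit_ = self.get_params(deep=False)
        return self

    def transform(self, X):
        if self.get_params(deep=False) != self.params_at_fit_:
            warnings.warn(
                "parameters changed since fit(); transform() uses the new values",
                ParamsChangedSinceFitWarning,
            )
        return np.asarray(X)[:, self.scores_ >= self.threshold]


X = np.array([[0.1, 1.0, 5.0],
              [0.3, 1.0, 3.0]])
est = WarnOnStaleParams().fit(X)
est.set_params(threshold=0.1)  # advanced use: change a parameter, no refit

# Default behaviour: the warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est.transform(X)
print(len(caught))  # 1

# Advanced users silence only this category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", ParamsChangedSinceFitWarning)
    Z = est.transform(X)
print(len(caught), Z.shape)  # 0 (2, 3)
```

A dedicated subclass is what keeps opting out cheap: a filter on that one category leaves FutureWarnings and other warnings untouched.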

