Warn about using scale, minmax_scale, etc.
We have a bunch of preprocessing helpers like `scale`, `quantile_transform`, `robust_scale`, etc. While these can be useful, users misuse them and leak their training data, as recently illustrated on the mailing list.
I think we should:
- Add a `.. warning::` note in their respective docstrings indicating the dangers of using these and recommending pipelines + estimators instead (see the sketch after this list).
- Stop referencing them from the User Guide, and only mention them e.g. in the "See Also" sections, or even just in the API reference.
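For illustration, here is a rough sketch of what such a note could look like in the `scale` docstring, using the standard reST `.. warning::` directive. The exact wording, placement, and signature shown are only a suggestion, not an agreed-upon text:

```python
def scale(X, *, axis=0, with_mean=True, with_std=True, copy=True):
    """Standardize a dataset along any axis.

    ...

    .. warning::

        A common mistake is to apply ``scale`` to the entire dataset
        before splitting it into training and test sets; the test data
        then influences the statistics used for scaling, which biases
        model evaluation. Prefer :class:`StandardScaler` inside a
        :class:`~sklearn.pipeline.Pipeline` so that the scaling
        parameters are learned on the training data only.
    """
```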
Right now the first entry of the Preprocessing guide is `scale`. I think it should be `StandardScaler` within a pipeline.
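To make the leakage concern concrete, a minimal sketch (made-up data, `LogisticRegression` chosen arbitrarily as the downstream estimator) contrasting the problematic pattern with the pipeline-based one could look like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Leaky pattern: the whole dataset (including what will become the test
# set) is used to compute the mean and std, so information from the test
# samples influences the transformation applied to the training data.
X_scaled = scale(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

# Recommended pattern: StandardScaler inside a pipeline learns the scaling
# parameters from the training split only, and reuses them at predict time.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```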
Issue Analytics
- State:
- Created 3 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
Let’s indeed close this one; it’ll be easier to keep track if we open another issue. Does one of you want to do it? Otherwise I will.
If I am not mistaken, we should also stop referencing them from the user guide. Maybe it would be worth opening a new issue to handle this. WDYT?