[SPRINT] Add warning notes in preprocessing functions
The goal here is to add a warning note to the docstrings of the preprocessing functions (follow-up to #17387). The note should warn about potential issues when using these functions and recommend using a pipeline instead:
- maxabs_scale
- minmax_scale
- ~normalize~
- quantile_transform
- robust_scale
- scale
- power_transform
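Each listed function is a one-shot counterpart of an estimator in sklearn.preprocessing. It computes its statistics from whatever array it receives and transforms that array in a single call. The sketch below assumes a recent scikit-learn release and uses made-up toy data. It checks each function against the matching estimator's fit_transform, then shows why the data-leak warning applies: scaling a subset gives different values than slicing the scaled full array. It also contrasts normalize, which was struck from the list, presumably because it rescales each row independently and so learns nothing from other samples.

```python
from functools import partial

import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    PowerTransformer,
    QuantileTransformer,
    RobustScaler,
    StandardScaler,
    maxabs_scale,
    minmax_scale,
    normalize,
    power_transform,
    quantile_transform,
    robust_scale,
    scale,
)

rng = np.random.RandomState(0)
X = rng.lognormal(size=(100, 3))  # toy data: positive and skewed

# Each function paired with the estimator whose fit_transform it mirrors.
# n_quantiles is set explicitly so it does not exceed n_samples (100).
PAIRS = [
    (maxabs_scale, MaxAbsScaler()),
    (minmax_scale, MinMaxScaler()),
    (partial(quantile_transform, n_quantiles=50, copy=True),
     QuantileTransformer(n_quantiles=50)),
    (robust_scale, RobustScaler()),
    (scale, StandardScaler()),
    (power_transform, PowerTransformer()),
]

for func, estimator in PAIRS:
    # Calling the function is equivalent to fitting on, and transforming, X.
    assert np.allclose(func(X), estimator.fit_transform(X))

# Column statistics depend on every row passed in, so scaling a subset
# differs from slicing the result of scaling the full dataset.
subset = X[:10]
column_wise_differs = not np.allclose(scale(X)[:10], scale(subset))

# normalize rescales each row on its own, so slicing and scaling commute.
row_wise_same = np.allclose(normalize(X)[:10], normalize(subset))
print(column_wise_differs, row_wise_same)  # True True
```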
All of these are in sklearn/preprocessing/_data.py. Here is a warning template:
.. warning:: Risk of data leak

    Do not use :func:`~sklearn.preprocessing.scale` unless you know what
    you are doing. A common mistake is to apply it to the entire data
    *before* splitting into training and test sets. This will bias the
    model evaluation because information would have leaked from the test
    set to the training set.

    In general, we recommend using
    :class:`~sklearn.preprocessing.StandardScaler` within a
    :ref:`Pipeline <pipeline>` in order to prevent most risks of data
    leaking: ``pipe = make_pipeline(StandardScaler(), LogisticRegression())``.
You should of course adapt scale and StandardScaler to the function you are documenting and its matching estimator.
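The failure mode the template warns about can be sketched as follows. The example assumes a toy dataset from make_classification and uses LogisticRegression as in the template. It contrasts calling scale on the whole dataset before train_test_split with the recommended pipeline. It then checks that the pipeline's scaler learned its mean from the training rows only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Mistake the warning targets: scale() sees all 200 rows, so the mean and
# standard deviation applied to the training rows include the test rows.
X_leaky = scale(X)
X_train_leaky, X_test_leaky, y_train, y_test = train_test_split(
    X_leaky, y, random_state=0
)
# Evaluating a model trained on X_train_leaky against X_test_leaky is biased.

# Recommended: split the raw data first, then let the pipeline fit
# StandardScaler on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

scaler = pipe[0]  # the fitted StandardScaler step
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
print(np.allclose(scaler.mean_, X.mean(axis=0)))  # False: test rows excluded
# When scoring, the pipeline reuses the statistics learned on X_train.
print(pipe.score(X_test, y_test))
```

The same pipeline can be passed to cross-validation utilities, which refit the scaler on each training fold.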
Please indicate below which function(s) you want to work on, e.g. “I’m working on scale and robust_scale”, so that others don’t pick the same ones.
@scikit-learn/core-devs, feel free to edit the warning message directly.
Top GitHub Comments
@NicolasHug I thought it was more of a general guideline. OK, we will use it accordingly. Thanks, and apologies if I caused any confusion!
Looks like all were addressed; closing the issue!