
[SPRINT] Add warning notes in preprocessing functions


The goal here is to add a warning note to the docstrings of the preprocessing functions below (a follow-up to #17387), warning about the risk of data leakage when these functions are used and recommending a pipeline instead:

  • maxabs_scale
  • minmax_scale
  • normalize (struck from the list)
  • quantile_transform
  • robust_scale
  • scale
  • power_transform
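Each listed function is a one-shot shortcut. It computes its statistics and applies them to the same array in a single call, and that is exactly what makes it easy to misuse on the full dataset. The sketch below checks that each function matches `fit_transform` of its transformer class. It assumes a recent scikit-learn and NumPy, and the toy data and the `n_quantiles=50` setting are arbitrary choices for illustration. The class is the object that can instead be fitted on the training data only, inside a pipeline.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, PowerTransformer, QuantileTransformer,
    RobustScaler, StandardScaler, maxabs_scale, minmax_scale,
    power_transform, quantile_transform, robust_scale, scale,
)

rng = np.random.RandomState(0)
X = rng.exponential(size=(50, 3))  # skewed, strictly positive toy data

# Function -> equivalent transformer; normalize is left out, as in the list.
# n_quantiles=50 matches n_samples so QuantileTransformer does not warn.
pairs = [
    (maxabs_scale, MaxAbsScaler()),
    (minmax_scale, MinMaxScaler()),
    (lambda A: quantile_transform(A, n_quantiles=50),
     QuantileTransformer(n_quantiles=50)),
    (robust_scale, RobustScaler()),
    (scale, StandardScaler()),
    (power_transform, PowerTransformer()),
]

# Each function fits AND applies its statistics on the same array,
# i.e. it behaves like Transformer().fit_transform(X).
for func, transformer in pairs:
    assert np.allclose(func(X), transformer.fit_transform(X))
print(len(pairs), "function/class pairs agree")
```

Because the function form always fits on whatever array it is given, the only way to keep test rows out of the statistics is to fit the class on the training split.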

All of these are in sklearn/preprocessing/_data.py. Here is a warning template:

    .. warning:: Risk of data leak

        Do not use :func:`~sklearn.preprocessing.scale` unless you know what
        you are doing. A common mistake is to apply it to the entire data
        *before* splitting into training and test sets. This will bias the
        model evaluation because information would have leaked from the test
        set to the training set.
        In general, we recommend using
        :class:`~sklearn.preprocessing.StandardScaler` within a
        :ref:`Pipeline <pipeline>` in order to prevent most risks of data
        leaking: `pipe = make_pipeline(StandardScaler(), LogisticRegression())`.
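The mistake the warning describes can be reproduced in a few lines. This is a minimal sketch that uses toy data from `make_classification`, with sizes and seeds chosen arbitrarily. The leaky variant calls `scale` on all rows before splitting, so the training features are standardized with statistics that include the test rows. The pipeline variant splits first, so `StandardScaler` only ever sees the training rows. On toy data the two scores may be nearly identical. The point is where the statistics come from, not the size of the bias.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Leaky: scale() computes mean/std over ALL rows (test rows included),
# and only afterwards is the data split into training and test sets.
X_scaled = scale(X)
X_tr_leaky, X_te_leaky, y_tr, y_te = train_test_split(
    X_scaled, y, random_state=0)
leaky_score = LogisticRegression().fit(X_tr_leaky, y_tr).score(X_te_leaky, y_te)

# Safe: split first; the pipeline fits StandardScaler on the training rows
# only and reuses those statistics when transforming the test rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
safe_score = pipe.score(X_test, y_test)

# make_pipeline names each step after its lowercased class name.
scaler = pipe.named_steps["standardscaler"]
print("scaler fitted on training rows only:",
      np.allclose(scaler.mean_, X_train.mean(axis=0)))
print(f"leaky: {leaky_score:.3f}  pipeline: {safe_score:.3f}")
```

Both splits use the same `random_state`, so they select the same rows. The only difference is whether the test rows contributed to the scaling statistics.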

You should, of course, adapt scale and StandardScaler to the function being documented and its transformer class (e.g. robust_scale and RobustScaler).
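Adapting the template is mechanical, so it can be scripted. The sketch below is hypothetical: the names `TEMPLATE`, `PAIRS` and `render_warning` are illustrative and are not part of scikit-learn. It pairs each listed function with its transformer class and fills in the template. The extra closing parenthesis in the `make_pipeline` call is dropped.

```python
# Hypothetical helper: fills the issue's warning template for one function.
TEMPLATE = """\
.. warning:: Risk of data leak

    Do not use :func:`~sklearn.preprocessing.{func}` unless you know what
    you are doing. A common mistake is to apply it to the entire data
    *before* splitting into training and test sets. This will bias the
    model evaluation because information would have leaked from the test
    set to the training set.
    In general, we recommend using
    :class:`~sklearn.preprocessing.{cls}` within a
    :ref:`Pipeline <pipeline>` in order to prevent most risks of data
    leaking: `pipe = make_pipeline({cls}(), LogisticRegression())`.
"""

# Function -> transformer class (normalize is not in the list).
PAIRS = {
    "maxabs_scale": "MaxAbsScaler",
    "minmax_scale": "MinMaxScaler",
    "quantile_transform": "QuantileTransformer",
    "robust_scale": "RobustScaler",
    "scale": "StandardScaler",
    "power_transform": "PowerTransformer",
}


def render_warning(func: str) -> str:
    """Return the warning text for `func`; raises KeyError if unlisted."""
    return TEMPLATE.format(func=func, cls=PAIRS[func])


if __name__ == "__main__":
    print(render_warning("robust_scale"))
```

The rendered text still needs to be indented to match the surrounding docstring, for example with `textwrap.indent`.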

Please indicate below which function(s) you want to work on, e.g. “I’m working on scale and robust_scale”, so that others don’t pick the same ones.

@scikit-learn/core-devs, feel free to edit the warning message directly.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
krumeto commented, Jun 6, 2020

Quoting @NicolasHug: “I proposed a template above, @krumeto?”

@NicolasHug I thought it was more of a general guideline. OK, we will use it accordingly. Thanks, and apologies if I caused any confusion!

0 reactions
NicolasHug commented, Jul 11, 2020

Looks like all were addressed; closing the issue!


