feature request: only apply random noise in `RandomAdder` to training data

Currently, RandomAdder adds noise to data both at training and at prediction time. This causes predictions to become non-deterministic and it offers no clear benefit in most cases I can think of.

I suggest changing the default behaviour of the transformer so that it adds random noise only to the training data. A constructor flag could optionally enable noise on the prediction data as well.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

MBrouns commented, Mar 26, 2019 (1 reaction)

I got asked something similar in today's training: someone wanted to drop rows with too many missing values from train but not from test. So I was toying around to see if I could find something that would work.

I might have figured out a way but I’m not sure I like it all that much:

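import hashlib

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class TrainOnlyMixin(BaseEstimator, TransformerMixin):

    def fit(self, X, y=None):
        # Remember a fingerprint of the training frame.
        self.df_hash_ = self.hash_df(X)
        return self

    @staticmethod
    def hash_df(df):
        # Hash every row (index included), then digest the row hashes.
        return hashlib.sha256(pd.util.hash_pandas_object(df, index=True).values).hexdigest()

    def transform(self, X, y=None):
        # Subclasses are expected to provide transform_train and transform_test.
        if self.hash_df(X) == self.df_hash_:
            return self.transform_train(X)
        else:
            return self.transform_test(X)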

I basically store a hash of the train dataframe and compare X with it in transform and then call transform_train or transform_test. I think this can be made quite generic and I can’t think of a case where it wouldn’t work. What do you think?
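
To make the idea concrete, here is a minimal sketch of how the hash comparison could be applied to the noise case from the issue. The class name `TrainOnlyNoiseAdder` and its `noise` and `random_state` parameters are hypothetical, chosen for illustration; this is not the library's `RandomAdder`. To keep the sketch short it is a plain class that follows the fit/transform convention instead of inheriting from the scikit-learn base classes.

```python
import hashlib

import numpy as np
import pandas as pd


def hash_df(df):
    """Return a SHA-256 digest of the DataFrame's per-row hashes (index included)."""
    row_hashes = pd.util.hash_pandas_object(df, index=True).values
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()


class TrainOnlyNoiseAdder:
    """Hypothetical transformer: adds Gaussian noise only to the frame seen in fit."""

    def __init__(self, noise=1.0, random_state=None):
        self.noise = noise
        self.random_state = random_state

    def fit(self, X, y=None):
        # Store a fingerprint of the training frame for later comparison.
        self.df_hash_ = hash_df(X)
        return self

    def transform(self, X, y=None):
        if hash_df(X) == self.df_hash_:
            # Same content and index as the training frame: add noise.
            rng = np.random.default_rng(self.random_state)
            return X + rng.normal(0.0, self.noise, size=X.shape)
        # Any other frame is treated as prediction data and returned unchanged.
        return X


train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
test = pd.DataFrame({"a": [7.0, 8.0], "b": [9.0, 10.0]})

adder = TrainOnlyNoiseAdder(noise=0.5, random_state=0).fit(train)
noisy_train = adder.transform(train)  # noise is added
clean_test = adder.transform(test)    # returned as-is, so predictions stay deterministic
```

Because the row hashes are order-sensitive and include the index, a frame is recognised as training data only if its values, row order, and index all match the frame passed to `fit`; a shuffled or re-indexed copy takes the prediction path.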

