
flip_y in make_classification is misleading

See original GitHub issue

As per the description of flip_y, it is “The fraction of samples whose class are randomly exchanged.” So with two classes one would expect that setting flip_y to 0.1 flips (exchanges) 10% of the labels, as the name suggests (flip_y). However, if you look at the source code, 10% of the samples are assigned random labels, and about 50% of the time the random label happens to match the original one, so only about 5% of the labels end up flipped.

This doesn’t seem like a big issue at first, but it has confused many people in a Kaggle competition: https://www.kaggle.com/c/instant-gratification.
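To make the effect concrete, here is a minimal NumPy sketch of the logic described above (a simplified stand-in for the scikit-learn source, with variable names chosen for illustration): selecting ~10% of the samples and then assigning them a random label only flips roughly half of them when there are two classes.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, flip_y = 100_000, 0.10

y = rng.randint(2, size=n_samples)                  # original binary labels
noisy_mask = rng.uniform(size=n_samples) < flip_y   # ~10% of samples selected for "flipping"
y_noisy = y.copy()
y_noisy[noisy_mask] = rng.randint(2, size=noisy_mask.sum())  # selected samples get a *random* label

print("selected for noise:", noisy_mask.mean())      # ~0.10
print("actually flipped:  ", (y_noisy != y).mean())  # ~0.05
```

The same behaviour can be checked on make_classification itself by generating the dataset twice with the same random_state and shuffle=False (so the samples stay aligned), once without and once with label noise. This assumes the labels produced before the noise step do not depend on flip_y for a fixed seed, which is an implementation detail of current scikit-learn versions:

```python
from sklearn.datasets import make_classification

common = dict(n_samples=100_000, n_classes=2, shuffle=False, random_state=0)
_, y_clean = make_classification(flip_y=0.0, **common)
_, y_noisy = make_classification(flip_y=0.1, **common)

print((y_noisy != y_clean).mean())  # roughly 0.05, not 0.10
```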

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
akeshavan commented, Nov 2, 2019

The current description is: “The fraction of samples whose class are randomly exchanged. Larger values introduce noise in the labels and make the classification task harder.”

We are suggesting: “The fraction of samples whose class is assigned randomly. Larger values introduce noise in the labels and make the classification task harder.”

Any other suggestions?

0 reactions
TomDLT commented, Nov 2, 2019

Is it adequate to change the description of the variable and keep the variable name flip_y constant, or is changing the variable name to something like random_fraction an option?

I think we would prefer not to change the variable name, and just improve the documentation.

Read more comments on GitHub >

Top Results From Across the Web

sklearn.datasets.make_classification
If True, the clusters are put on the vertices of a hypercube. If False, the clusters are put on the vertices of a...
Read more >
mars.learn.datasets.make_classification
mars.learn.datasets.make_classification(n_samples=100, n_features=20, ... If False, the clusters are put on the vertices of a random polytope.
Read more >
sklearn.datasets.make_classification fails to generate ...
Though it's not explicitly mentioned and is confusing, the parameter weights require "proportions" of samples. It does not convert ...
Read more >
Scoring Classifier Models using scikit-learn - Ben Alex Keen
Of course this doesn't provide any information about whether the model has any false positives or false negatives.
Read more >
Why Is Imbalanced Classification Difficult?
We can use the make_classification() scikit-learn function to ... or imbalanced treatment methods, focus on wrong areas of input space.
Read more >
