API documentation update for train_test_split()
We applied train_test_split
to an imbalanced dataset (the Credit Card Fraud dataset) without setting the stratify parameter (None
by default). When we checked the train and test data, the class distribution was maintained, i.e., stratification was applied.
Though stratification is applied by default, the documentation says the following, which is confusing for users:
stratify : array-like, default=None
If not None, data is split in a stratified fashion, using this as
the class labels.
Read more in the :ref:`User Guide <stratification>`.
The API documentation of train_test_split
should be updated to reflect the exact behaviour of stratify.
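One way to check the report is to compare class proportions with and without stratify. The sketch below uses a small synthetic dataset as a hypothetical stand-in for the Credit Card Fraud data (the 1% positive rate and the names positive_rate, y_test_plain and y_test_strat are assumptions made for illustration). With stratify=None the split is a plain random shuffle, so on a large dataset the class ratio is usually close to the original without being enforced; with stratify=y the ratio is matched by construction.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical imbalanced data: 100 positives out of 10,000 rows.
n_samples = 10_000
y = np.zeros(n_samples, dtype=int)
y[:100] = 1
X = np.arange(n_samples).reshape(-1, 1)


def positive_rate(labels):
    """Fraction of samples that belong to the positive class."""
    return float(labels.mean())


# Default behaviour: stratify=None, so the split is a random shuffle only.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=0)

# Explicit stratification on the class labels.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("full data      :", positive_rate(y))
print("test, default  :", positive_rate(y_test_plain))
print("test, stratify :", positive_rate(y_test_strat))
```

The default split will typically land near 1% positives here, which can look like stratification, but the figure varies with random_state and becomes less reliable as the dataset or the minority class gets smaller.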
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
Yes
Yes, if stratify != None then shuffle must be True (and you should probably get an error if you leave it as False).
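The interaction between shuffle and stratify can be checked directly. This is a minimal sketch on toy data (the arrays and the flag name raised are assumptions for illustration); in the scikit-learn releases I am aware of, passing stratify together with shuffle=False raises a ValueError, while shuffle=False on its own simply cuts the data in order.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: the first 10 samples are class 0, the last 10 are class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)

# shuffle=False with stratify=None: an ordered cut, no stratification.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
print("ordered test labels:", y_test.tolist())

# shuffle=False together with stratify: expected to be rejected.
try:
    train_test_split(X, y, test_size=0.25, shuffle=False, stratify=y)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Because the ordered cut takes the last quarter of the rows, the test set in this toy case contains only class 1, which shows why an unshuffled split cannot preserve class proportions.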
It’s not a contradiction, @brgopalakrishnan. The doc says
shuffle=False => stratify=None
but this doesn’t imply
not(shuffle=False) => not(stratify=None)
A => B is equivalent to not(B) => not(A), but it doesn’t imply not(A) => not(B).
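The logical point can be verified by brute force over all truth values. The sketch below is plain Python with a hypothetical helper named implies; it checks that an implication always agrees with its contrapositive but not with its inverse.

```python
from itertools import product


def implies(a, b):
    """Material implication: 'a => b' is false only when a is true and b is false."""
    return (not a) or b


rows = list(product([False, True], repeat=2))

# A => B versus not(B) => not(A): these agree on every row.
contrapositive_equiv = all(
    implies(a, b) == implies(not b, not a) for a, b in rows
)

# A => B versus not(A) => not(B): these disagree on at least one row.
inverse_equiv = all(
    implies(a, b) == implies(not a, not b) for a, b in rows
)

print("contrapositive equivalent:", contrapositive_equiv)
print("inverse equivalent       :", inverse_equiv)
```

Reading A as shuffle=False and B as stratify=None, the documented rule says nothing about what stratify is when shuffle=True, so shuffle=True with stratify=None is a perfectly consistent combination.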
I’ll close the issue since I think I addressed the original question about imbalance.