No support for stratified split in dask_ml.model_selection.train_test_split
The scikit-learn implementation of train/test split (sklearn.model_selection.train_test_split) supports splitting data according to class labels (a stratified split) through the stratify argument. This is especially useful when datasets have high class imbalance. It would be really helpful to have this feature in dask_ml as well.
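To make the request concrete, here is a minimal pure-Python sketch of what a stratified split does: it splits each class separately so that the train and test sets keep roughly the same class proportions as the full dataset. The helper name `stratified_split` and its parameters are hypothetical and chosen for illustration only; this is not how scikit-learn or dask_ml implement it, and it works on an in-memory list of labels rather than on Dask collections.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.25, seed=0):
    """Return (train_idx, test_idx), keeping per-class proportions roughly equal.

    Hypothetical helper for illustration; not part of scikit-learn or dask_ml.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then carve off that class's share of the test set.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy labels: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_size=0.2)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

With a plain random split, the minority class could easily end up under-represented (or missing) in the test set; splitting per class avoids that. In scikit-learn the same effect is requested by passing the labels through the stratify argument mentioned above.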
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 2
- Comments: 20 (13 by maintainers)
Top GitHub Comments
Hey folks, sorry but I haven’t had the chance to continue working on this. I did open a WIP PR (#635) so if anyone would like to fork off of it or just start from scratch, feel free to do so! Let me know if you’d like anything from me in doing so.
I need the stratify feature in train_test_split as well for my imbalanced dataset. Any updates?