Question regarding reproducibility of train, validation splits
See original GitHub issue

Are the train/validation splits of the dataset guaranteed to be reproducible? From what I can see, unless I'm missing something, the randomness depends on `random_split`, which uses PyTorch's `default_generator`. If a new PyTorch version changes this generator, could two instances of, for example, `MNISTDataModule` running under two different torch versions end up with different splits?
Thanks and sorry if this is obvious!
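One common way to sidestep the concern above is to never rely on the library's default generator at all. In PyTorch you can pass an explicit `generator=torch.Generator().manual_seed(seed)` to `torch.utils.data.random_split`; the sketch below shows the same idea in plain Python (the function name and fractions are illustrative, not from the thread):

```python
import random

def reproducible_split(n_items, val_fraction=0.2, seed=42):
    """Split indices deterministically using an explicit, pinned seed.

    Pinning the seed in a local RNG (rather than relying on a library's
    default global generator) keeps the split stable across versions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local seeded RNG, no global state
    n_val = int(n_items * val_fraction)
    # validation takes the first n_val shuffled indices, training the rest
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = reproducible_split(10, val_fraction=0.2, seed=42)
```

Because the seed and the RNG are both explicit, rerunning the split with the same arguments always yields the same partition.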
Issue Analytics
- Created: 3 years ago
- Comments: 7 (4 by maintainers)

@alejandrodumas The splits you linked to will be encouraged to be done in the datamodule's `setup` hook, similar to how you would do something like that in a `LightningModule`. This way you'd only make one call instead of two, as you showed above. Great catch on that btw 😄.

The `LightningDataModule` will come packaged directly with Lightning in version 0.9.0 and will be removed from bolts. You can find the docs for the latest version of the built-in datamodule class in the Lightning docs here.

@ananyahjha93 Let's close this once we remove calls to `random_split` from dataloader hooks across this repo (if you think that's reasonable). That way there is less confusion.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
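The setup-hook pattern described above can be sketched as follows. This is a minimal illustration, not Lightning itself: a real implementation would subclass `pytorch_lightning.LightningDataModule`, and the class name, fields, and seed here are all hypothetical:

```python
import random

class ToyDataModule:
    """Sketch of the setup-hook pattern from the thread.

    The split happens once, in setup(), instead of separately in each
    dataloader hook -- so train and val can never disagree.
    """

    def __init__(self, data, val_fraction=0.2, seed=42):
        self.data = data
        self.val_fraction = val_fraction
        self.seed = seed
        self.train = self.val = None

    def setup(self, stage=None):
        # Single deterministic split with an explicit seed.
        indices = list(range(len(self.data)))
        random.Random(self.seed).shuffle(indices)
        n_val = int(len(indices) * self.val_fraction)
        self.val = [self.data[i] for i in indices[:n_val]]
        self.train = [self.data[i] for i in indices[n_val:]]

    def train_dataloader(self):
        return self.train  # real code would wrap this in a DataLoader

    def val_dataloader(self):
        return self.val

dm = ToyDataModule(list(range(10)))
dm.setup()
```

Since both dataloader hooks read from the one split produced in `setup`, there is only one call to the splitting logic per run, which is the point the maintainer makes above.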