
Question regarding reproducibility of train, validation splits

See original GitHub issue

Are the train/validation splits of the dataset guaranteed to be reproducible? From what I can see, unless I'm missing something, the randomness comes from random_split, which uses PyTorch's default_generator. If a new PyTorch version changes that generator's behavior, could the same MNISTDataModule, for example, produce different splits under two different torch versions? Thanks, and sorry if this is obvious!
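One way to sidestep this concern is to seed the split explicitly instead of relying on the library's global default generator. Below is a minimal pure-Python sketch of the idea (`deterministic_split` is a hypothetical helper, not part of any library); with PyTorch you would achieve the same thing by passing `generator=torch.Generator().manual_seed(42)` to `torch.utils.data.random_split`:

```python
import random

def deterministic_split(n_items, n_val, seed=42):
    """Split indices into train/val using an explicitly seeded local RNG,
    so the result never depends on a library's global default generator."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG: same seed -> same order
    return indices[n_val:], indices[:n_val]

# e.g. the classic 55,000 / 5,000 MNIST-style split
train_idx, val_idx = deterministic_split(60000, 5000, seed=42)
```

Because the seed is pinned in your own code, the split stays identical across runs of the same program regardless of what the framework's default generator does.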

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
nateraw commented, Jul 28, 2020

@alejandrodumas Going forward, the splits you linked to should be done in the datamodule's setup hook, similar to how you would do it in a LightningModule. That way you'd make only one call instead of the two you showed above. Great catch on that, btw 😄.
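The "one call in setup" pattern can be sketched without any Lightning dependency. `ToyDataModule` below is a hypothetical stand-in for a LightningDataModule: the split happens exactly once in `setup()`, and both dataloader hooks reuse the stored subsets, so they can never disagree:

```python
import random

class ToyDataModule:
    """Hypothetical stand-in for a LightningDataModule: the split is
    computed once in setup(), and both dataloader hooks reuse it."""
    def __init__(self, data, val_size, seed=42):
        self.data = data
        self.val_size = val_size
        self.seed = seed
        self.train_set = None
        self.val_set = None

    def setup(self):
        # One seeded split call -- both hooks below see the same result.
        indices = list(range(len(self.data)))
        random.Random(self.seed).shuffle(indices)
        self.val_set = [self.data[i] for i in indices[:self.val_size]]
        self.train_set = [self.data[i] for i in indices[self.val_size:]]

    def train_dataloader(self):
        return self.train_set  # real code would wrap this in a DataLoader

    def val_dataloader(self):
        return self.val_set

dm = ToyDataModule(list(range(100)), val_size=10)
dm.setup()
```

Had each dataloader hook called the split itself, two calls could produce overlapping or inconsistent subsets; centralizing the split in `setup()` removes that failure mode.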

The LightningDataModule will come packaged directly with lightning in version 0.9.0 and will be removed from bolts. You can find the docs on the latest version of the built-in datamodule class in the lightning docs here.

@ananyahjha93 Let’s close this once we remove calls to random_split from dataloader hooks across this repo (if you think that’s reasonable). That way there is less confusion.

0 reactions
stale[bot] commented, Oct 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

