val_dataloader is called twice in each worker
🐛 Bug
I’m trying a `LightningDataModule` class to manage the data.
Using horovod backend, if that matters.
I’ve noticed that each rank is calling `train_dataloader` once, but `val_dataloader` twice somehow.
To Reproduce
Run Lightning with the DataModule and Horovod, and add a debug print showing when `val_dataloader` is called, something like:
```python
def train_dataloader(self):
    print(f"\n#####worker {hvd.rank()} of {hvd.size()} creating train_loader\n")
    return load_ds_from_dir(os.path.join(self.path, "train"), self.batch_size)

def val_dataloader(self):
    print(f"\n#####worker {hvd.rank()} of {hvd.size()} creating val\n")
    return load_ds_from_dir(os.path.join(self.path, "validation"), self.batch_size)
```
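A framework-free way to confirm the duplicate call is to wrap the hooks in a call counter. This is only a sketch: `load_ds_from_dir` and the Lightning/Horovod machinery are stubbed out, and the two manual `val_dataloader()` calls stand in for whatever the trainer does internally.

```python
from functools import wraps

call_counts = {}

def count_calls(fn):
    """Increment a per-hook counter each time the hook runs."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] = call_counts.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper

class DebugDataModule:
    """Stand-in for the real LightningDataModule; loaders are stubbed."""

    @count_calls
    def train_dataloader(self):
        return ["train_batch"]  # stub for load_ds_from_dir(...)

    @count_calls
    def val_dataloader(self):
        return ["val_batch"]  # stub for load_ds_from_dir(...)

dm = DebugDataModule()
dm.train_dataloader()  # trainer builds the train loader once
dm.val_dataloader()    # first call
dm.val_dataloader()    # second call, as observed in the bug
print(call_counts)     # {'train_dataloader': 1, 'val_dataloader': 2}
```

Dropping the same decorator onto the real `LightningDataModule` hooks gives a per-rank count without relying on interleaved print output.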
Expected behavior
I expect `val_dataloader` to be called only once…
Environment
* CUDA:
- GPU:
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- available: True
- version: 10.2
* Packages:
- numpy: 1.19.1
- pyTorch_debug: False
- pyTorch_version: 1.6.0
- pytorch-lightning: 0.9.0
- tensorboard: 2.2.0
- tqdm: 4.46.1
* System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.8.2
- version: #1 SMP Fri Apr 20 16:44:24 UTC 2018
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (1 by maintainers)
Thanks for considering updating the docs!
This issue has been automatically marked as stale because it hasn’t had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!