Extend docs with multiple dataloader with common cases
See original GitHub issue

I notice that one can evaluate the model on a list of validation/test data loaders. Is it also possible to extract data from multiple train dataloaders in the training step in the current version? This feature might be useful in tasks like transfer learning or semi-supervised learning, which usually maintain multiple datasets during the training stage (e.g., source and target datasets in transfer learning, or labeled and unlabeled datasets in semi-supervised learning).
It would be nice if one could obtain a list of batches as follows:

    def training_step(self, batch_list, batch_nb_list):
        # batch_list = [batch_1, batch_2]
        x_1, y_1 = batch_list[0]
        x_2, y_2 = batch_list[1]
        loss = self.compute_some_loss(x_1, x_2, y_1, y_2)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def train_dataloader(self):
        return [data_loader_1, data_loader_2]
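In the absence of built-in support, the per-step pairing behaviour the proposal asks for can be sketched by zipping the loaders manually. This is only a rough illustration using plain Python lists as stand-ins for `DataLoader` objects (the names `loader_1`, `loader_2`, and `paired_batches` are made up for this example); real code would iterate over actual `torch.utils.data.DataLoader` instances the same way.

```python
# Stand-ins for two DataLoaders: any iterables of (x, y) batches behave
# the same way under zip(). In real code these would be DataLoader instances.
loader_1 = [("x1_a", "y1_a"), ("x1_b", "y1_b"), ("x1_c", "y1_c")]
loader_2 = [("x2_a", "y2_a"), ("x2_b", "y2_b")]

def paired_batches(l1, l2):
    # zip() stops at the shorter loader, so every step sees exactly
    # one batch from each loader, matching the proposed batch_list API.
    for batch_1, batch_2 in zip(l1, l2):
        yield [batch_1, batch_2]

steps = list(paired_batches(loader_1, loader_2))
# Each element of `steps` is a batch_list like [batch_1, batch_2].
```

Note that this simple zip makes the epoch length equal to the shortest loader, which is one of the policy questions discussed below.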
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 7
- Comments: 20 (16 by maintainers)
In semi-supervised learning, domain adaptation, consistency training, etc., it is typical to use samples from different loaders in the same training step to compute various cross-losses. Thus, alternating behaviour of the training step does not bring much usability improvement. I understand that it is possible to shift the issue one step back and implement a custom Dataset and/or Sampler for such cases, but in my experience having multiple dataloaders is just more explicit and convenient.
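The custom-Dataset workaround mentioned above could look roughly like this. It is a pure-Python sketch (the class name `PairedDataset` and its modular-indexing policy are illustrative, not from the issue); a real version would subclass `torch.utils.data.Dataset` so a single `DataLoader` could batch the paired samples.

```python
class PairedDataset:
    """Indexes a labeled and an unlabeled dataset together.

    The shorter dataset is wrapped with modular indexing, so one epoch
    is one pass over the larger dataset. With PyTorch, this class would
    subclass torch.utils.data.Dataset.
    """

    def __init__(self, labeled, unlabeled):
        self.labeled = labeled
        self.unlabeled = unlabeled

    def __len__(self):
        return max(len(self.labeled), len(self.unlabeled))

    def __getitem__(self, idx):
        return (
            self.labeled[idx % len(self.labeled)],
            self.unlabeled[idx % len(self.unlabeled)],
        )
```

This works, but it hides the pairing policy inside the Dataset and forces both sources to share one batch size, which is exactly why separate loaders are more explicit.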
Thanks for all the replies.
To @Dref360,
I think allowing the data loaders to have different lengths is more flexible, and each data loader can have its own batch size. In my opinion, a loader can simply reload its dataset after running out of data, so it doesn't depend on the other data loaders.
In my previous projects, I used the length of the longest data loader as the epoch length (so the shorter loaders restart within the epoch). But this needs more discussion.
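The longest-loader policy suggested here can be sketched with `itertools.cycle`. The function name `longest_epoch` is illustrative, and the stand-ins are plain lists; note that `itertools.cycle` caches the items it has seen, so with a real shuffling `DataLoader` one would instead re-create the loader's iterator when it is exhausted.

```python
from itertools import cycle

def longest_epoch(loader_1, loader_2):
    # The epoch length follows the longer loader; the shorter one is
    # cycled (effectively "reloaded") whenever it runs out of batches.
    if len(loader_1) >= len(loader_2):
        return zip(loader_1, cycle(loader_2))
    return zip(cycle(loader_1), loader_2)
```

For example, with loaders of length 3 and 2, the epoch has 3 steps and the second loader's first batch is reused in the last step.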