
Extend docs with multiple dataloader with common cases

See original GitHub issue

I notice that one can evaluate the model on a list of validation/test data loaders. Is it also possible to extract data from multiple train dataloaders in the training step in the current version? This feature might be useful in tasks like transfer learning or semi-supervised learning, which usually maintain multiple datasets during training (e.g., source and target datasets in transfer learning, or labeled and unlabeled datasets in semi-supervised learning).

It would be nice if one could obtain a list of batches, as follows:

def training_step(self, batch_list, batch_nb_list):
    # batch_list = [batch_1, batch_2]
    x_1, y_1 = batch_list[0]
    x_2, y_2 = batch_list[1]
    loss = self.compute_some_loss(x_1, x_2, y_1, y_2)     
    tensorboard_logs = {'train_loss': loss}
    return {'loss': loss, 'log': tensorboard_logs}

def train_dataloader(self):
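    # Proposed behaviour: returning a list of loaders would give training_step one batch from each.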
    return [data_loader_1, data_loader_2]

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 20 (16 by maintainers)

Top GitHub Comments

6 reactions
soupault commented, Mar 25, 2020

“If we do support multiple dataloaders, the way to keep it consistent with val and test (which already support that) is to call training_step with alternating batches.”

In semi-supervised learning, domain adaptation, consistency training, etc., it is typical to use samples from different loaders in the same training step to compute various cross-losses. Thus, the alternating behaviour of the training step does not bring much usability improvement. I understand that it is possible to shift the issue one step back and implement a custom Dataset and/or Sampler for such cases, but in my experience having multiple dataloaders is just more explicit and convenient.
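
As a rough illustration of that Dataset-based workaround (not Lightning API; the PairedDataset name and the labeled_ds/unlabeled_ds arguments are hypothetical), a wrapper dataset can pair samples from two underlying datasets so that a single DataLoader delivers both to every training step:

from torch.utils.data import Dataset

class PairedDataset(Dataset):
    """Pair a labeled and an unlabeled dataset so one DataLoader yields both."""

    def __init__(self, labeled_ds, unlabeled_ds):
        self.labeled_ds = labeled_ds
        self.unlabeled_ds = unlabeled_ds

    def __len__(self):
        # One pass over the paired dataset covers the larger of the two datasets.
        return max(len(self.labeled_ds), len(self.unlabeled_ds))

    def __getitem__(self, idx):
        # Wrap the index around the shorter dataset so it is effectively reloaded.
        labeled = self.labeled_ds[idx % len(self.labeled_ds)]
        unlabeled = self.unlabeled_ds[idx % len(self.unlabeled_ds)]
        return labeled, unlabeled

# train_dataloader() then returns a single DataLoader over the paired dataset,
# and training_step receives a (labeled_batch, unlabeled_batch) tuple.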

6 reactions
ylsung commented, Mar 13, 2020

Thanks for all the replies.

To @Dref360,

  1. I think it is more flexible to allow the data loaders to have different lengths, and each data loader can have its own batch size. In my opinion, a loader can simply reload its dataset after running out of data, so it does not depend on the other data loaders.

  2. My previous experience is to use the length of the longest data loader as the epoch length (the shorter loaders are simply reloaded until it is exhausted), but this needs more discussion; a rough sketch of this behaviour follows below.
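
A rough sketch of that behaviour with plain PyTorch data loaders (the cycle_loader and zip_by_longest helpers are hypothetical, not part of Lightning): the shorter loader is restarted whenever it runs out, and one epoch lasts as long as the longest loader.

def cycle_loader(loader):
    # Restart the loader every time it is exhausted, so shuffling is re-applied
    # on each pass (unlike itertools.cycle, which would replay cached batches).
    while True:
        for batch in loader:
            yield batch

def zip_by_longest(long_loader, short_loader):
    # Yield (long_batch, short_batch) pairs for one full pass over the longer
    # loader; each loader keeps its own batch size, and the shorter one reloads.
    return zip(long_loader, cycle_loader(short_loader))

# for batch_1, batch_2 in zip_by_longest(data_loader_1, data_loader_2):
#     ...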


Top Results From Across the Web

Extend docs with multiple dataloader with common cases #1089
I notice that one can evaluate the model on a list of validation/test data loaders. Is it also possible to extract data from...

DataLoaders Explained: Building a Multi-Process Data Loader ...
We now basically have a fully functional data loader; The only issue is that get() is loading in one element of dataset at...

Complete Guide to the DataLoader Class in PyTorch
This post covers the PyTorch dataloader class. We'll show how to load built-in and custom datasets in PyTorch, plus how to transform and...

Writing Custom Datasets, DataLoaders and Transforms
Let's take a single image name and its annotations from the CSV, in this case row index number 65 for person-7.jpg just as...

multiprocess_data_loader - AllenNLP v2.10.1
The MultiProcessDataLoader is a DataLoader that's optimized for AllenNLP ... If you're using Docker, you can increase the shared memory available on a ...
