Extend docs with multiple dataloader with common cases
See original GitHub issue

I notice that one can evaluate the model on a list of validation/test data loaders. Is it also possible to extract data from multiple train dataloaders in the training step in the current version? This feature might be useful in tasks like transfer learning or semi-supervised learning, which usually maintain multiple datasets during the training stage (e.g., source and target datasets in transfer learning, or labeled and unlabeled datasets in semi-supervised learning).
It would be nice if one could obtain a list of batches as follows:

    def training_step(self, batch_list, batch_nb_list):
        # batch_list = [batch_1, batch_2]
        x_1, y_1 = batch_list[0]
        x_2, y_2 = batch_list[1]
        loss = self.compute_some_loss(x_1, x_2, y_1, y_2)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def train_dataloader(self):
        return [data_loader_1, data_loader_2]
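In the absence of built-in support, the per-step pairing behaviour the proposal asks for can be sketched by zipping the loaders manually. This is only a rough illustration using plain Python lists as stand-ins for `DataLoader` objects (the names `loader_1`, `loader_2`, and `paired_batches` are made up for this example); real code would iterate over actual `torch.utils.data.DataLoader` instances the same way.

```python
# Stand-ins for two DataLoaders: any iterables of (x, y) batches behave
# the same way under zip(). In real code these would be DataLoader instances.
loader_1 = [("x1_a", "y1_a"), ("x1_b", "y1_b"), ("x1_c", "y1_c")]
loader_2 = [("x2_a", "y2_a"), ("x2_b", "y2_b")]

def paired_batches(l1, l2):
    # zip() stops at the shorter loader, so every step sees exactly
    # one batch from each loader, matching the proposed batch_list API.
    for batch_1, batch_2 in zip(l1, l2):
        yield [batch_1, batch_2]

steps = list(paired_batches(loader_1, loader_2))
# Each element of `steps` is a batch_list like [batch_1, batch_2].
```

Note that this simple zip makes the epoch length equal to the shortest loader, which is one of the policy questions discussed below.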
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 7
- Comments: 20 (16 by maintainers)
In semi-supervised learning, domain adaptation, consistency training, etc., it is typical to use samples from different loaders in the same training step to compute various cross-losses. Thus, alternating behaviour of the training step does not bring much usability improvement. I understand that it is possible to shift the issue one step back and implement a custom Dataset and/or Sampler for such cases, but in my experience having multiple dataloaders is just more explicit and convenient.
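The custom-Dataset workaround mentioned above could look roughly like this. It is a pure-Python sketch (the class name `PairedDataset` and its modular-indexing policy are illustrative, not from the issue); a real version would subclass `torch.utils.data.Dataset` so a single `DataLoader` could batch the paired samples.

```python
class PairedDataset:
    """Indexes a labeled and an unlabeled dataset together.

    The shorter dataset is wrapped with modular indexing, so one epoch
    is one pass over the larger dataset. With PyTorch, this class would
    subclass torch.utils.data.Dataset.
    """

    def __init__(self, labeled, unlabeled):
        self.labeled = labeled
        self.unlabeled = unlabeled

    def __len__(self):
        return max(len(self.labeled), len(self.unlabeled))

    def __getitem__(self, idx):
        return (
            self.labeled[idx % len(self.labeled)],
            self.unlabeled[idx % len(self.unlabeled)],
        )
```

This works, but it hides the pairing policy inside the Dataset and forces both sources to share one batch size, which is exactly why separate loaders are more explicit.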
Thanks for all the replies.
To @Dref360,
I think allowing the data loaders to have different lengths is more flexible, and each data loader can have its own batch size. In my opinion, a loader can simply reload its dataset after running out of data, so it doesn't depend on the other data loaders.
In my previous projects, I used the length of the longest data loader as the epoch length (so the shorter loaders restart within the epoch). But this needs more discussion.
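The longest-loader policy suggested here can be sketched with `itertools.cycle`. The function name `longest_epoch` is illustrative, and the stand-ins are plain lists; note that `itertools.cycle` caches the items it has seen, so with a real shuffling `DataLoader` one would instead re-create the loader's iterator when it is exhausted.

```python
from itertools import cycle

def longest_epoch(loader_1, loader_2):
    # The epoch length follows the longer loader; the shorter one is
    # cycled (effectively "reloaded") whenever it runs out of batches.
    if len(loader_1) >= len(loader_2):
        return zip(loader_1, cycle(loader_2))
    return zip(cycle(loader_1), loader_2)
```

For example, with loaders of length 3 and 2, the epoch has 3 steps and the second loader's first batch is reused in the last step.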