
Unexpected _run_once_on_dataset behavior if using a custom dataloader that implements __len__ method

See original GitHub issue

🐛 Bug description

Context - I was calling engine.run(my_run_function) with a custom dataloader that implements the __len__ method.

What happened - When my run reaches _run_once_on_dataset, the expected behavior is to break out of the while loop once the dataloader raises StopIteration. Based on other discussions, we expect self.state.epoch_length to be None and therefore to hit the break there. However, in an earlier step self.state.epoch_length gets set to a positive integer: self.state.max_epochs is None for a new run (instead of being read from state_dict) and my dataloader defines __len__, so a length is inferred even though I didn't pass the epoch_length parameter to run.

What I expect - Upon StopIteration, the dataloader is exhausted for one epoch; _run_once_on_dataset should exit without reinitializing the dataloader and then re-enter the while loop.
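To make the failure mode concrete, here is a minimal pure-Python sketch of the inference described above (an illustration only, not Ignite's actual implementation; MyLoader and infer_epoch_length are hypothetical names): when no epoch_length argument is passed and the data object defines __len__, a positive length is derived from it, so a StopIteration branch that checks for epoch_length being None is never taken.

```python
# Hedged sketch (NOT Ignite source code): models how an epoch length
# can be inferred from __len__ when no epoch_length argument is given.
class MyLoader:
    def __init__(self, batches):
        self.batches = batches

    def __len__(self):
        return len(self.batches)

    def __iter__(self):
        return iter(self.batches)


def infer_epoch_length(data, epoch_length=None):
    # If the caller did not pass epoch_length, fall back to len(data)
    # when it is defined -- mirroring the behavior described in the issue.
    if epoch_length is None and hasattr(data, "__len__"):
        return len(data)
    return epoch_length


loader = MyLoader([1, 2, 3, 4])
print(infer_epoch_length(loader))      # -> 4, inferred from __len__
print(infer_epoch_length(loader, 10))  # -> 10, explicit value wins
```

With a bare iterator (no __len__), infer_epoch_length returns None, which is the case where termination would rely on StopIteration instead.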

Environment

  • PyTorch Version (e.g., 1.4): 1.10
  • Ignite Version (e.g., 0.3.0): 0.4.9
  • OS (e.g., Linux): MacOS
  • How you installed Ignite (conda, pip, source): pip
  • Python version: 3.9
  • Any other relevant information:

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7

Top GitHub Comments

1 reaction
vfdev-5 commented, Jun 27, 2022

QQ - Are there other impacts to keep in mind when doing so?

When explicitly converting the data into an iterator, like iter(train_data), one impact to keep in mind is that if we need to run more than one epoch, we have to restart the iterator manually, like this:

@trainer.on(Events.DATALOADER_STOP_ITERATION)
def restart_iter():
    # Restore a fresh iterator so the next epoch has data to consume
    trainer.state.dataloader = iter(train_data)

trainer.run(iter(train_data), max_epochs=2)

Source: https://pytorch-ignite.ai/how-to-guides/06-data-iterator/
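The restart pattern can be modeled without Ignite at all (a hedged sketch; run_epochs is an illustrative helper, not part of the Ignite API): a plain iterator raises StopIteration once exhausted, so a multi-epoch loop has to rebuild it, which is exactly what the DATALOADER_STOP_ITERATION handler above does.

```python
# Hedged sketch (illustrative only, not Ignite internals): a plain
# iterator raises StopIteration when exhausted, so a multi-epoch loop
# must rebuild it before the next epoch can start.
def run_epochs(data, max_epochs):
    seen = []
    it = iter(data)  # analogous to passing iter(train_data) to run()
    for _ in range(max_epochs):
        while True:
            try:
                seen.append(next(it))
            except StopIteration:
                # mirrors the DATALOADER_STOP_ITERATION handler:
                # restart the iterator so the next epoch has data
                it = iter(data)
                break
    return seen


print(run_epochs([1, 2, 3], max_epochs=2))  # -> [1, 2, 3, 1, 2, 3]
```

Note that this only works because iter(data) can be called again on the underlying collection; a one-shot source would need to be recreated instead.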

Another solution (which it seems you have already tried) could be to explicitly specify the data size with the epoch_length argument, if we know it and the value is reliable.

1 reaction
AaamberW commented, Jun 27, 2022

Appreciate the tip! Discarding the dataloader size fixed my use case, since it effectively relies on StopIteration. QQ - Are there other impacts to keep in mind when doing so?

FWIW, I'd love to make a snippet to reproduce the error, but my data loading and training are async, and I find it challenging given the complexity. In essence, after loading the last partition, not all batches have gone through training. That's why, even when __len__ is correctly set to the total number of batches, the engine will restart the dataloader.
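The fix described here, discarding the loader's reported size, can be sketched in plain Python (illustrative only; SizedLoader is a hypothetical toy class, not Ignite code): wrapping a sized loader in iter(...) yields an iterator without __len__, so no epoch length can be inferred and the consumer must fall back to StopIteration, even when the reported length disagrees with the real number of batches.

```python
# Hedged sketch: wrapping a sized loader in iter() hides __len__,
# so length-based epoch inference is impossible and termination
# must rely on StopIteration instead.
class SizedLoader:
    def __init__(self, batches, reported_len):
        self.batches = batches
        self.reported_len = reported_len  # may disagree with reality

    def __len__(self):
        return self.reported_len

    def __iter__(self):
        return iter(self.batches)


loader = SizedLoader([1, 2, 3], reported_len=5)  # over-reports its size
wrapped = iter(loader)

print(hasattr(loader, "__len__"))   # True: a length can be inferred
print(hasattr(wrapped, "__len__"))  # False: only StopIteration remains
print(list(wrapped))                # consumes until StopIteration
```

This models the scenario in the comment: when __len__ over- or under-reports (as with async loading), hiding it forces exhaustion-based termination.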

Thanks for following along, happy to close the issue!

Read more comments on GitHub >

Top Results From Across the Web

  • Data loader unexpected behaviour - PyTorch Forums
    Currently working on a custom dataloader and it is showing an unexpected behavior. If the dataloader is called as an enumerator and the ...
  • Understand collate_fn in PyTorch - Python in Plain English
    Set our custom function in the loader. As we can see, the batch is in the same format as for default collation with ...
  • Taking Datasets, DataLoaders, and PyTorch's New DataPipes ...
    Commonly, we use the Dataset class together with the DataLoader class ... __len__: A method that returns the total number of data ...
  • How to define the __len__ method for PyTorch Dataloader ...
    I think the most appropriate method to get the length of each split is to simply use: # Number of training points len(self.train) ...
  • An Introduction to Datasets and DataLoader in PyTorch - Wandb
    To these ends, it's recommended to use custom Datasets and DataLoaders. ... __len__: This function returns the length of the dataset.
