
Improvements around ReproducibleBatchSampler.

See original GitHub issue

🚀 Feature

IMHO, an attempt should be made not to wrap or change the data pipeline objects. Despite the name, ReproducibleBatchSampler seems to be about making datasets resumable from the middle of an epoch by skipping the first few examples, rather than about reproducibility, since it doesn't appear to set any seeds on the samplers.

(If this is not the case, I think the point is even stronger. I'd prefer not to have seeds and "reproducibility" introduced into my data pipeline in the background. These are all good things, but not when they happen without my knowledge, or without an option to disable them.)

Silently wrapping objects or creating new instances can introduce unexpected issues, e.g.:

  • The point brought up in #812, about side effects.
  • ReproducibleBatchSampler seems to assume it's wrapping the PyTorch BatchSampler. For instance, it assumes the BatchSampler has a sampler instance variable. This is the case for the PyTorch class, but it is not required in general: a BatchSampler is just a sampler whose __iter__ returns a batch of indexes (see the sketch after this list).
  • ReproducibleBatchSampler samples all the indexes first. This isn't necessarily a trivial amount of time or memory to hold all those ints if the dataset is large.
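
To illustrate the second point, here is a minimal, hypothetical batch sampler (not from the original issue) that satisfies the BatchSampler contract — __iter__ yields lists of indexes, __len__ gives the number of batches — yet has no sampler attribute, so a wrapper that assumes one would fail:

import torch

class RandomChunkBatchSampler:
    # Hypothetical batch sampler: yields batches of indexes directly,
    # without wrapping a torch Sampler, so it has no `sampler` attribute.

    def __init__(self, dataset_size, batch_size):
        self.dataset_size = dataset_size
        self.batch_size = batch_size

    def __iter__(self):
        # Shuffle all indexes once per epoch and emit them in chunks.
        perm = torch.randperm(self.dataset_size).tolist()
        for i in range(0, self.dataset_size, self.batch_size):
            yield perm[i:i + self.batch_size]

    def __len__(self):
        # Number of batches per epoch.
        return (self.dataset_size + self.batch_size - 1) // self.batch_size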

First and foremost, I believe this new behavior should be very prominently noted in the documentation, the change notes, a warning in the logs, etc. I realized after the fact that this is a "note" on the engine's run method, but as a user upgrading from 0.2.1, I would have had no idea this was happening if my batch sampler implementation hadn't trivially failed to be compatible.

Perhaps the behavior should be changed to: if the data loader stack is not holding an instance of ReproducibleBatchSampler, then the engine simply loads and does nothing with the batches that need to be skipped upon resume (sketched below). Users who wish to have the ReproducibleBatchSampler option can explicitly use this class when constructing their DataLoader.
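
A rough sketch of that proposed default, assuming the engine already knows the data iterator and the number of batches processed before the checkpoint (the function name is illustrative, not ignite API):

def skip_processed_batches(data_iter, num_batches_to_skip):
    # Load and discard the batches that were already processed before the
    # checkpoint, leaving the user's sampler and data loader untouched.
    for _ in range(num_batches_to_skip):
        next(data_iter, None)
    return data_iter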

Luckily, ignite is a small library and its code is very readable (awesome, very happy user, thanks!). Upon reading the code, I see that I can simply implement my own ReproducibleBatchSampler to bypass the three concerns I pointed out above. For the case where batches are simply sampled i.i.d. with replacement out of the data, it's sufficient to shorten the number of batches yielded by the first call to __iter__ when the dataset is resumed.
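
For concreteness, a minimal sketch of that workaround (a hypothetical class, assuming batches are drawn i.i.d. with replacement, so resuming mid-epoch only means yielding fewer batches on the first pass after a resume):

import torch

class ResumableIIDBatchSampler:
    # Hypothetical resumable batch sampler: each batch is a fresh i.i.d.
    # draw with replacement, so no state beyond a skip count is needed.

    def __init__(self, dataset_size, batch_size, batches_per_epoch):
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.batches_per_epoch = batches_per_epoch
        self.batches_to_skip = 0  # set this from a checkpoint before resuming

    def __iter__(self):
        num_batches = self.batches_per_epoch - self.batches_to_skip
        self.batches_to_skip = 0  # only the first epoch after a resume is shortened
        for _ in range(num_batches):
            yield torch.randint(self.dataset_size, (self.batch_size,)).tolist()

    def __len__(self):
        return self.batches_per_epoch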

So this is more of a suggestion about default behavior, which I found to be somewhat unexpected.

Thanks for your work on this library!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
amatsukawa commented, Apr 18, 2020

@vfdev-5 thanks for the response. Glad to see this is already being addressed. The solution in #895 seems good to me.

0 reactions
vfdev-5 commented, Apr 18, 2020

@amatsukawa we are very sorry for that! We'll update the library soon with a more stable v0.4.0 release.

I think what can be done temporarily is to convert the torch DataLoader into an iterator and specify epoch_length in run:

trainer.run(map(lambda x: x, data_loader), epoch_length=len(data_loader))

Based on #714 (so this probably works only on the nightly release).
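
For context, a self-contained sketch of that workaround (the dataset and training step are placeholders, not from the original comment):

import torch
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import Engine

dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
data_loader = DataLoader(dataset, batch_size=10)

def train_step(engine, batch):
    x, y = batch
    # a real training step would compute the loss and update the model here
    return 0.0

trainer = Engine(train_step)

# Wrapping the DataLoader in a plain iterator means the engine only sees an
# iterable and leaves the batch sampler untouched; epoch_length must then be
# passed explicitly because a bare iterator has no length.
trainer.run(map(lambda x: x, data_loader), epoch_length=len(data_loader))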

