Improvements around `ReproducibleBatchSampler`
Feature
IMHO an attempt should be made not to wrap or change the data pipeline objects. Despite the name, this `ReproducibleBatchSampler` seems to be about making datasets resumable from the middle of an epoch by skipping the first few examples, rather than about reproducibility, since it doesn't appear to set any seeds on the samplers.
(If that is not the case, I think the point is even stronger. I'd prefer not to have seeds and "reproducibility" introduced into my data pipeline in the background. They are all good things, but not when it happens without my knowledge, or without an option to disable it.)
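For contrast, a minimal sketch of what opt-in reproducibility usually looks like, with an explicit user-controlled seed; the `dataset` name is hypothetical, and the `generator` argument assumes a reasonably recent torch version:

```python
import torch
from torch.utils.data import DataLoader, RandomSampler

# Opt-in reproducibility: the user supplies the seed explicitly;
# nothing is wrapped or re-seeded behind the scenes.
g = torch.Generator()
g.manual_seed(42)
sampler = RandomSampler(dataset, generator=g)  # `dataset` is assumed to exist
loader = DataLoader(dataset, sampler=sampler, batch_size=32)
```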
Silently wrapping objects or creating new instances can introduce unexpected issues, e.g.:
- The point brought up in #812, about side effects.
- `ReproducibleBatchSampler` seems to assume it's wrapping the pytorch `BatchSampler`. For instance, it assumes the `BatchSampler` has a `sampler` instance variable. This is the case for the pytorch class, but it is not required in general: a `BatchSampler` is just a sampler whose `__iter__` returns batches of indexes (a minimal counterexample follows this list).
- `ReproducibleBatchSampler` samples all the indexes first. If the dataset is large, holding all those ints is not necessarily a trivial amount of time or memory.
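To make the second point concrete, here is a hypothetical batch sampler (all names invented for illustration) that is perfectly valid to pass as `DataLoader(batch_sampler=...)` yet has no `sampler` instance variable:

```python
from torch.utils.data import Sampler

class EveryOtherBatchSampler(Sampler):
    """Hypothetical example: a valid batch sampler (__iter__ yields
    lists of indexes) that has no `.sampler` attribute to wrap."""

    def __init__(self, dataset_len, batch_size):
        self.dataset_len = dataset_len
        self.batch_size = batch_size

    def __iter__(self):
        batch = []
        for idx in range(0, self.dataset_len, 2):  # every other example
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # final, possibly partial, batch
            yield batch

    def __len__(self):
        n_indexes = (self.dataset_len + 1) // 2
        return (n_indexes + self.batch_size - 1) // self.batch_size
```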
First and foremost, I believe this new behavior should be very prominently noted in the documentation, change notes, a warning in the logs, etc. I realized after the fact that there is a "note" about it on the engine's `run` method, but as a user upgrading from 0.2.1, I would have had no idea this was happening if my batch sampler implementation had not happened to be incompatible.
Perhaps the behavior should be changed to: if the data loader stack is not holding an instance of `ReproducibleBatchSampler`, the engine simply loads and does nothing with the batches that need to be skipped upon resume (see the sketch below). Users who want the `ReproducibleBatchSampler` behavior can use this class explicitly when constructing their `DataLoader`.
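A rough sketch of that proposed default, with `dataloader` and the resume offset as assumed names:

```python
# Proposed fallback when no ReproducibleBatchSampler is present:
# consume and discard the already-seen batches; nothing is wrapped.
data_iter = iter(dataloader)          # `dataloader` is assumed to exist
for _ in range(batches_to_skip):      # hypothetical resume offset
    next(data_iter)
# ... training then continues from `data_iter` ...
```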
Luckily, ignite is a small library and its code is very readable (awesome, very happy user, thanks!). Upon reading the code, I see that I can simply implement my own `ReproducibleBatchSampler` to bypass the three concerns pointed out above. For the case where batches are sampled iid with replacement from the data, it's sufficient to shorten the number of batches yielded by the first call to `__iter__` when the run is resumed.
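A minimal sketch of that idea, with all names hypothetical: because batches are iid with replacement, resuming only requires the first post-resume epoch to be shorter, and no indexes need to be materialized up front:

```python
import torch

class ResumableIIDBatchSampler:
    """Hypothetical sketch: batches are drawn iid with replacement, so
    resuming mid-epoch only means yielding fewer batches from the
    first __iter__ call after the resume."""

    def __init__(self, dataset_len, batch_size, batches_per_epoch):
        self.dataset_len = dataset_len
        self.batch_size = batch_size
        self.batches_per_epoch = batches_per_epoch
        self._skip = 0  # batches already consumed before the checkpoint

    def resume(self, iteration):
        self._skip = iteration % self.batches_per_epoch

    def __iter__(self):
        n_batches = self.batches_per_epoch - self._skip
        self._skip = 0  # only the first epoch after resume is shortened
        for _ in range(n_batches):
            # Indexes are drawn lazily; nothing is held for the whole epoch.
            yield torch.randint(self.dataset_len, (self.batch_size,)).tolist()

    def __len__(self):
        return self.batches_per_epoch
```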
So this is more of a suggestion about default behavior, which I found to be somewhat unexpected.
Thanks for your work on this library!
Issue Analytics
- Created: 3 years ago
- Reactions: 2
- Comments: 6 (3 by maintainers)
Top GitHub Comments
@vfdev-5 thanks for the response. Glad to see this is already being addressed. The solution in #895 seems good to me.
@amatsukawa we are very sorry for that! We'll update the library soon with a more stable v0.4.0 release.
I think what can temporarily be done is to convert the torch `DataLoader` into an iterator and specify `epoch_length` in the `run` call. Based on #714 (so this probably works only on the nightly release):
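A sketch of that workaround under those assumptions: `trainer` and `train_loader` are hypothetical names, and `epoch_length` on `Engine.run` is the API introduced by #714:

```python
def infinite_batches(loader):
    # Re-create the loader's iterator each pass; yields batches forever,
    # so ignite receives a plain iterator and has nothing to wrap.
    while True:
        yield from loader

# Epoch boundaries now come from epoch_length, not the DataLoader itself.
trainer.run(infinite_batches(train_loader), max_epochs=10,
            epoch_length=len(train_loader))
```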