
How to train on part of the dataset


❓ Questions/Help/Support

Hi @vfdev-5 ,

We have a requirement for very big datasets. Suppose we have 100 iterations in 1 epoch: we want to call run() to train on the data of the first 10 iterations, do some other things with the model, then call run() again to train on the data of the next 10 iterations, and so on. Is this possible in ignite now? I found that the iterator over the dataloader in ignite always starts from the beginning: https://github.com/pytorch/ignite/blob/v0.4.2/ignite/engine/engine.py#L771

Thanks.
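In plain Python terms, the behaviour the question describes can be sketched as follows. This is an illustrative stand-in, not the ignite API: a list plays the role of the dataloader, and building a fresh iterator on every call mirrors the engine creating a new dataloader iterator at the start of each run().

```python
data = list(range(100))  # stand-in for 100 batches in one epoch

def run_10_iterations():
    # A fresh iter() on every call, as the engine does at the start of run(),
    # so each call restarts from batch 0 instead of continuing.
    it = iter(data)
    return [next(it) for _ in range(10)]

first = run_10_iterations()
second = run_10_iterations()
print(first == second)  # True: both runs see batches 0..9
```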

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Oct 28, 2020

Hi @Nic-Ma ,

Yes, you are right about that 👍 This can be an interesting approach. Thanks! Currently, a problem with calling trainer.run() multiple times is that it triggers events like STARTED, EPOCH_STARTED, etc. every time, which may not be what we expect. This is more or less a conceptual problem of what a run is, and something we are discussing with the team.

There is another approach (workaround) to keep control of the dataflow:


dataloader = ...

def cycle(dataloader):
    # Loop over the dataloader forever, keeping the iterator's
    # position between successive trainer.run() calls.
    while True:
        for i in dataloader:
            yield i

dataloader_iter = cycle(dataloader)

@trainer.on(Events.ITERATION_STARTED)
def prepare_batch(engine):
    # Override the batch the engine would take from `data` with the
    # next batch from the persistent iterator.
    engine.state.batch = next(dataloader_iter)

# FLhandler: the user's sync handler, run every 10 iterations
trainer.add_event_handler(Events.ITERATION_COMPLETED(every=10), FLhandler)

data = list(range(len(dataloader) * num_epochs))  # dummy data; only its length matters, real batches come from dataloader_iter
# round 1
trainer.run(data, max_epochs=num_epochs)
# FL sync
# ...
# round 2
trainer.run(data)
# FL sync
# ...
# round 3
trainer.run(data)
# etc
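The cycle trick above can be checked without ignite at all. Here is a minimal, self-contained sketch under illustrative assumptions: a plain list stands in for the dataloader, and a hypothetical run_round() helper stands in for one trainer.run() call whose ITERATION_STARTED handler pulls from the shared iterator.

```python
def cycle(iterable):
    # Restart the underlying iterable forever; position is kept between rounds.
    while True:
        for item in iterable:
            yield item

dataset = list(range(100))   # stand-in for a dataloader yielding 100 batches
batches = cycle(dataset)

def run_round(num_iterations):
    # Consume the next `num_iterations` batches, like one trainer.run() round.
    return [next(batches) for _ in range(num_iterations)]

round1 = run_round(10)
round2 = run_round(10)
print(round1[-1], round2[0])  # 9 10 — round 2 resumes where round 1 stopped
```

Because the generator is created once and shared across rounds, each round consumes the next chunk of data instead of restarting from the beginning, which is exactly the property the workaround relies on.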
1 reaction
Nic-Ma commented, Oct 19, 2020

Cool, your trick seems very useful, let me make a demo to verify first. Thanks.
