Enable setting of training iteration in Trainers

See original GitHub issue

Is your feature request related to a problem? Please describe.
Currently, SupervisedTrainer supports controlling the number of iterations by adjusting epoch_length and max_epochs. It would be nice to be able to set the number of iterations to be executed directly.
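For context, a rough sketch of how the iteration count is controlled today, purely through epoch-based arguments. Argument names follow MONAI's SupervisedTrainer, but the exact signature may differ between versions, and device, train_loader, net, opt and loss are assumed to be defined elsewhere:

from monai.engines import SupervisedTrainer

# The executed iteration count is only an indirect product of two knobs:
trainer = SupervisedTrainer(
    device=device,
    max_epochs=10,                   # number of passes over the data
    epoch_length=100,                # iterations per pass, defaults to len(train_data_loader)
    train_data_loader=train_loader,
    network=net,
    optimizer=opt,
    loss_function=loss,
)
trainer.run()  # runs max_epochs * epoch_length = 1000 iterations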

Describe the solution you’d like
Add an n_iterations argument (or similar) that allows overriding the epoch-based definition of the number of training steps to be executed. Note, this should allow the training to resume from the final iteration if n_iterations is reached. Related to #4554.
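A hypothetical sketch of the requested interface; n_iterations does not exist in SupervisedTrainer today, and its name and placement are only illustrative:

# Proposed (not existing) argument: run an exact number of training steps,
# independent of epoch boundaries, and allow resuming from that iteration.
trainer = SupervisedTrainer(
    device=device,
    n_iterations=1234,               # hypothetical replacement for epoch_length/max_epochs
    train_data_loader=train_loader,
    network=net,
    optimizer=opt,
    loss_function=loss,
)
trainer.run()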

Describe alternatives you’ve considered
Keep adjusting epoch_length and max_epochs, but that seems confusing.


Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
holgerroth commented, Jun 22, 2022

epoch_length corresponds to the number of iterations needed to iterate once over the data (i.e., one epoch). It defaults to len(train_data_loader).
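Put differently, a minimal sketch of the relationship described above:

epoch_length = len(train_data_loader)            # iterations in one pass over the data
total_iterations = epoch_length * max_epochs     # what this request wants to set directly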

0 reactions
vfdev-5 commented, Aug 17, 2022

@holgerroth on master and in the nightly releases we have a max_iters argument for Engine.run(): https://pytorch.org/ignite/master/generated/ignite.engine.engine.Engine.html#ignite.engine.engine.Engine.run. It probably works as you asked:

assert len(data) == 100
s = trainer.run(data, max_iters=123)
assert s.iteration == 123

However, we haven’t released that in a stable version yet, as there can be issues with how it is saved/loaded in checkpoints, etc.

A workaround for the stable release can be:

from ignite.engine import Events

max_iters = 1234
epoch_length = len(data)
# enough epochs to cover max_iters iterations
max_epochs = max_iters // epoch_length + 1

# stop exactly once the desired iteration count is reached
@trainer.on(Events.ITERATION_COMPLETED(once=max_iters))
def stop():
    trainer.terminate()

trainer.run(data, max_epochs=max_epochs)
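To sanity-check the workaround, the State returned by run() can be inspected (capturing the return value of the trainer.run(...) call above):

state = trainer.run(data, max_epochs=max_epochs)
# terminate() fires at the requested step, so the run stops exactly there
assert state.iteration == max_iters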