Enable setting of training iteration in Trainers

See original GitHub issue

Is your feature request related to a problem? Please describe.
Currently, SupervisedTrainer supports controlling the number of iterations by adjusting epoch_length and max_epochs. It would be nice to be able to set the number of iterations to be executed directly.
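For context, a rough sketch of how the iteration count is controlled today, purely through epoch-based arguments. Argument names follow MONAI's SupervisedTrainer, but the exact signature may differ between versions, and device, train_loader, net, opt and loss are assumed to be defined elsewhere:

from monai.engines import SupervisedTrainer

# The executed iteration count is only an indirect product of two knobs:
trainer = SupervisedTrainer(
    device=device,
    max_epochs=10,                   # number of passes over the data
    epoch_length=100,                # iterations per pass, defaults to len(train_data_loader)
    train_data_loader=train_loader,
    network=net,
    optimizer=opt,
    loss_function=loss,
)
trainer.run()  # runs max_epochs * epoch_length = 1000 iterations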

Describe the solution you’d like
Add an n_iterations argument (or similar) that allows overriding the epoch-based definition of the number of training steps to be executed. Note, this should allow the training to resume from the final iteration if n_iterations is reached. Related to #4554.
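A hypothetical sketch of the requested interface; n_iterations does not exist in SupervisedTrainer today, and its name and placement are only illustrative:

# Proposed (not existing) argument: run an exact number of training steps,
# independent of epoch boundaries, and allow resuming from that iteration.
trainer = SupervisedTrainer(
    device=device,
    n_iterations=1234,               # hypothetical replacement for epoch_length/max_epochs
    train_data_loader=train_loader,
    network=net,
    optimizer=opt,
    loss_function=loss,
)
trainer.run()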

Describe alternatives you’ve considered
Keep adjusting epoch_length and max_epochs, but that seems confusing.


Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
holgerroth commented, Jun 22, 2022

epoch_length corresponds to the number of iterations needed to iterate once over the data (i.e., one epoch). It defaults to len(train_data_loader).
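Put differently, a minimal sketch of the relationship described above:

epoch_length = len(train_data_loader)            # iterations in one pass over the data
total_iterations = epoch_length * max_epochs     # what this request wants to set directly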

0 reactions
vfdev-5 commented, Aug 17, 2022

@holgerroth on master and in the nightly releases we have a max_iters argument for Engine.run(): https://pytorch.org/ignite/master/generated/ignite.engine.engine.Engine.html#ignite.engine.engine.Engine.run. It probably works as you asked:

assert len(data) == 100
s = trainer.run(data, max_iters=123)
assert s.iteration == 123

However, we haven’t released that in a stable version yet, as there can be issues with how it is saved/loaded in checkpoints, etc.

A workaround for the stable release can be:

from ignite.engine import Events

max_iters = 1234
epoch_length = len(data)
# enough epochs to cover max_iters iterations
max_epochs = max_iters // epoch_length + 1

# stop exactly once the desired iteration count is reached
@trainer.on(Events.ITERATION_COMPLETED(once=max_iters))
def stop():
    trainer.terminate()

trainer.run(data, max_epochs=max_epochs)
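To sanity-check the workaround, the State returned by run() can be inspected (capturing the return value of the trainer.run(...) call above):

state = trainer.run(data, max_epochs=max_epochs)
# terminate() fires at the requested step, so the run stops exactly there
assert state.iteration == max_iters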