
Chained Schedulers


Currently, if one wants to have a warmup or cooldown period, one has to write a custom scheduler for that. Instead, we could have an abstraction for supporting multiple scheduling periods. This could be done via 1) a ChainedScheduler-like approach, 2) applying multiple schedulers simultaneously, or 3) making scheduling periods a first-class feature of the trainer.

I am leaning against option 3 because it would overly complicate the trainer for the generic case where there is only one training period. While option 2 would be syntactically simpler, it would introduce challenges with SSR (scale schedule ratio). As such, I am leaning towards option 1.

For option 1:

  • Each ComposerScheduler should take an apply_ssr parameter (which can be set to False when used for warmup) and an end_time or period_length parameter (both could be accepted, but only one may be set). A scheduler should not assume a start_time – that would be determined implicitly whenever the scheduler is first __call__ed.
  • We would also need to modify the __call__ API of the ComposerScheduler such that a scheduler returns None when it is finished (i.e. its end_time or period_length has elapsed). None would signal to a ChainedScheduler that the scheduler is done and is no longer managing the learning rate. If the trainer received None, it would interpret that as an instruction not to modify the learning rate (equivalent to returning the last returned value). For schedulers (e.g. cooldown) that should run until the end of training, the end_time parameter could be 1dur, in which case the scheduler would never return None.
  • We can have a ChainedScheduler(schedulers: List[Scheduler] | OrderedDict[str, Scheduler])-like API; a minimal sketch follows this list. (A dict could be used to give each period a name – e.g. “warmup”.) Whenever the ChainedScheduler is __call__ed, it would __call__ the 0th scheduler until it returns None; then it would move on to the 1st scheduler until that returns None, and so on. Once all schedulers are exhausted, it would return None, signaling to the trainer to leave the learning rate the same.
  • The ChainedScheduler would make available the currently active scheduler as an object. Assuming only one ChainedScheduler is being used, algorithms could inspect state.schedulers[0].active_scheduler, state.schedulers[0].active_scheduler_idx, or state.schedulers[0].active_scheduler_name to determine which learning rate period we are in.
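
To make the proposed control flow concrete, here is a minimal sketch in plain Python. Everything in it is illustrative rather than Composer's actual API: the Scheduler alias, the untyped state argument, and the list-only constructor are simplifications (the OrderedDict variant, which would also expose active_scheduler_name, is omitted for brevity).

```python
from typing import Callable, List, Optional

# Illustrative only: a "scheduler" here is any callable that maps the training
# state to a learning-rate multiplier and returns None once its end_time /
# period_length has elapsed. The state argument is left untyped; in Composer
# it would be composer.core.State.
Scheduler = Callable[[object], Optional[float]]


class ChainedScheduler:
    """Sketch of the proposed option-1 control flow (not Composer's real API)."""

    def __init__(self, schedulers: List[Scheduler]):
        self.schedulers = schedulers
        self.active_scheduler_idx = 0

    @property
    def active_scheduler(self) -> Optional[Scheduler]:
        # Exposed so algorithms can inspect which scheduling period is active.
        if self.active_scheduler_idx < len(self.schedulers):
            return self.schedulers[self.active_scheduler_idx]
        return None

    def __call__(self, state: object) -> Optional[float]:
        # Advance past finished schedulers until one produces a value.
        while self.active_scheduler is not None:
            lr = self.active_scheduler(state)
            if lr is not None:
                return lr
            self.active_scheduler_idx += 1
        # Every period has elapsed: tell the trainer to leave the LR as-is.
        return None
```

A warmup-then-decay run would then read as ChainedScheduler([warmup, cosine]), where the warmup scheduler returns None once its period_length has elapsed and the final scheduler runs with end_time set to 1dur so it never returns None.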

Thoughts?

CC @dblalock @jbloxham @mosaicml/composer-team-research

_Originally posted by @ravi-mosaicml in https://github.com/mosaicml/composer/issues/632#issuecomment-1056988994_


Top GitHub Comments

jbloxham commented, Mar 3, 2022

Generally agree with @hanlint – there’s a lot of complexity going on here that will lead to many edge cases. For warmup schedulers, I generally find it more intuitive to specify the total duration of the composite scheduler than to build it up as the sum of two individual parts. It’s also potentially necessary for some schedulers with warmup to be aware of the length of the warmup period.

We can make it easier to implement new schedulers by offering helper functions to calculate things like “tau” from the scheduler docs I wrote, but I also think it’s already not that hard to write a scheduler: http://localhost:8000/api_reference/composer.optim.scheduler.html#composer.optim.scheduler.CosineAnnealingWithWarmupScheduler
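
To illustrate the point about a composite scheduler knowing its total duration up front, here is a rough step-based sketch. The signature (step, max_steps, warmup_steps, alpha_f) is an invention for this example; Composer's actual CosineAnnealingWithWarmupScheduler operates on Time objects and the trainer State rather than raw step counts.

```python
import math


def cosine_annealing_with_warmup(step: int, max_steps: int,
                                 warmup_steps: int, alpha_f: float = 0.0) -> float:
    """Learning-rate multiplier for a single composite warmup + cosine schedule.

    The composite scheduler knows its total duration (max_steps) and its
    warmup length up front, rather than being assembled from two chained parts.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to the full learning rate.
        return step / max(1, warmup_steps)
    # tau: fraction of the post-warmup budget that has elapsed.
    tau = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    # Standard cosine annealing from 1 down to alpha_f.
    return alpha_f + (1 - alpha_f) * 0.5 * (1 + math.cos(math.pi * tau))
```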

ravi-mosaicml commented, Mar 4, 2022
