Chained Schedulers
Currently, if one wants to have a warmup or cooldown period, one has to write a custom scheduler for that. Instead, we could have an abstraction for supporting multiple scheduling periods. This could be done via 1) a `ChainedScheduler`-like approach, 2) applying multiple schedulers simultaneously, or 3) making scheduling periods a first-class feature of the trainer.
I am leaning against option 3 because it would overly complicate the trainer for the generic use case where one only has one training period. While option 2 would be syntactically simpler, it would introduce challenges with SSR. As such, I am leaning towards option 1.
For option 1:
- Each `ComposerScheduler` should take an `apply_ssr` parameter (which can be set to `False` if being used in warmup) and an `end_time` or `period_length` parameter (or both, but only allow one to be set). A scheduler should not assume a `start_time` – that would be determined implicitly whenever the scheduler is first `__call__`ed.
- We would also need to modify the `__call__` API of the `ComposerScheduler` such that a scheduler returns `None` when it is finished (i.e. its `end_time` or `period_length` has elapsed). `None` would signal to a `ChainedScheduler` that the scheduler is done and is no longer managing the learning rate. If the trainer received `None`, it would interpret that as a signal not to modify the learning rate (which would be equivalent to returning the last returned value). For schedulers (e.g. cooldown) that should run until the end of training, the `end_time` parameter could be `1dur`, in which case the scheduler would never return `None`.
- We can have a `ChainedScheduler(schedulers: List[Scheduler] | OrderedDict[str, Scheduler])`-like API. (A dict could be used to give each period a name – e.g. “warmup”.) Whenever the `ChainedScheduler` is `__call__`ed, it would `__call__` the 0th scheduler until it returns `None`; then it would move on to the 1st scheduler until that returns `None`, etc. After all schedulers are exhausted, it would return `None`, which would signal to the trainer to leave the learning rate the same.
- The `ChainedScheduler` would make the currently active scheduler available as an object. Assuming only one `ChainedScheduler` is being used, algorithms could inspect `state.schedulers[0].active_scheduler`, `state.schedulers[0].active_scheduler_idx`, or `state.schedulers[0].active_scheduler_name` to determine which learning rate period we are in.
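The chaining behavior described above could be sketched roughly as follows. This is a minimal standalone sketch, not Composer's actual API: the `Scheduler` callable signature (training progress in, LR multiplier or `None` out) and the toy warmup/constant schedulers are hypothetical stand-ins for `ComposerScheduler`.

```python
from typing import Callable, List, Optional

# Hypothetical scheduler signature: takes elapsed training progress (as a
# fraction of total duration) and returns an LR multiplier, or None once
# the scheduler's period has elapsed.
Scheduler = Callable[[float], Optional[float]]


class ChainedScheduler:
    """Sketch of option 1: run each scheduler in turn until it returns None."""

    def __init__(self, schedulers: List[Scheduler]) -> None:
        self.schedulers = schedulers
        self.active_scheduler_idx = 0

    @property
    def active_scheduler(self) -> Optional[Scheduler]:
        """The scheduler currently managing the LR, or None if all are done."""
        if self.active_scheduler_idx < len(self.schedulers):
            return self.schedulers[self.active_scheduler_idx]
        return None

    def __call__(self, elapsed: float) -> Optional[float]:
        # Advance past finished schedulers until one yields a multiplier.
        while self.active_scheduler is not None:
            result = self.active_scheduler(elapsed)
            if result is not None:
                return result
            self.active_scheduler_idx += 1
        # All schedulers exhausted: signal "leave the learning rate as-is".
        return None


# Toy periods (hypothetical): 10% linear warmup, then constant until 1.0.
def warmup(t: float) -> Optional[float]:
    return t / 0.1 if t < 0.1 else None


def constant(t: float) -> Optional[float]:
    return 1.0 if t < 1.0 else None


sched = ChainedScheduler([warmup, constant])
```

An algorithm could then check `sched.active_scheduler_idx` (or, per the proposal, `state.schedulers[0].active_scheduler_idx`) to tell which period training is in.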
Thoughts?
CC @dblalock @jbloxham @mosaicml/composer-team-research
_Originally posted by @ravi-mosaicml in https://github.com/mosaicml/composer/issues/632#issuecomment-1056988994_
Issue Analytics
- Created 2 years ago
- Comments: 8 (8 by maintainers)
Generally agree with @hanlint - there’s a lot of complexity going on here that will lead to many edge cases. For warmup schedulers, I generally find it more intuitive to specify the total duration of the composite scheduler rather than the sum of two individual parts. It’s also potentially necessary for some schedulers with warmup to be aware of the length of the warmup period.
We can make it easier to implement new schedulers by offering helper functions to calculate things like “tau” from the scheduler docs I wrote, but I also think it’s already not that hard to write a scheduler: http://localhost:8000/api_reference/composer.optim.scheduler.html#composer.optim.scheduler.CosineAnnealingWithWarmupScheduler
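For illustration, a single warmup-aware scheduler along the lines being discussed (linear warmup into cosine decay, where “tau” is the fraction of the post-warmup period elapsed) can be sketched as a plain multiplier function. Parameter names and defaults here are hypothetical, not Composer's actual `CosineAnnealingWithWarmupScheduler` signature:

```python
import math


def cosine_with_warmup(t: float, t_warmup: float = 0.1, t_max: float = 1.0,
                       alpha_f: float = 0.0) -> float:
    """LR multiplier at progress t: linear warmup, then cosine decay.

    Hypothetical parameters: t_warmup is the warmup fraction, t_max the total
    duration, alpha_f the final multiplier the cosine decays toward.
    """
    if t < t_warmup:
        # Linear warmup from 0 to 1 over the warmup period.
        return t / t_warmup
    # tau: fraction of the post-warmup period that has elapsed, clamped to 1.
    tau = min((t - t_warmup) / (t_max - t_warmup), 1.0)
    return alpha_f + (1 - alpha_f) * 0.5 * (1 + math.cos(math.pi * tau))
```

Note that the function needs `t_warmup` in both branches, which is the point made above: a scheduler with warmup generally has to know the warmup length, so specifying one total duration is more natural than summing two independent pieces.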
Created https://github.com/mosaicml/composer/issues/671.