
Schedulers like get_linear_schedule_with_warmup need access to the length of the train dataset

See original GitHub issue

šŸ› Bug

If you’re using an LR scheduler that needs access to the number of batches in the train dataset, like @huggingface’s get_linear_schedule_with_warmup, there’s currently no way to access the dataset in configure_optimizers(), because it looks like that hook is called before train_dataloader().

It would be nice to have some way to load the datasets before the optimizers are configured and to make the dataset available to other methods, with something like self.train_dataset = train_dataset.

Code sample:

    from transformers import get_linear_schedule_with_warmup

    train_steps = int(len(train_dataset) / (batch_size * grad_steps) * epochs)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=int(0.1 * train_steps), num_training_steps=train_steps
    )
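
For reference, here is a minimal sketch of the workaround the report is asking for: load the dataset up front, keep it on self, and let configure_optimizers() derive the step count from it. It assumes a plain LightningModule; the FineTuner class and its constructor arguments are illustrative, not part of the original issue.

    import pytorch_lightning as pl
    from torch.optim import AdamW
    from torch.utils.data import DataLoader
    from transformers import get_linear_schedule_with_warmup

    class FineTuner(pl.LightningModule):
        def __init__(self, model, train_dataset, batch_size, grad_steps, epochs, lr):
            super().__init__()
            self.model = model
            # Keep a handle to the dataset so any hook, including
            # configure_optimizers(), can read its length.
            self.train_dataset = train_dataset
            self.batch_size, self.grad_steps = batch_size, grad_steps
            self.epochs, self.lr = epochs, lr

        def train_dataloader(self):
            return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

        def configure_optimizers(self):
            optimizer = AdamW(self.model.parameters(), lr=self.lr)
            # Same arithmetic as the snippet above, now that the dataset is on self.
            train_steps = int(len(self.train_dataset) / (self.batch_size * self.grad_steps) * self.epochs)
            scheduler = get_linear_schedule_with_warmup(
                optimizer, num_warmup_steps=int(0.1 * train_steps), num_training_steps=train_steps
            )
            return [optimizer], [{"scheduler": scheduler, "interval": "step"}]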

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 20 (8 by maintainers)

Top GitHub Comments

4 reactions
dd1923 commented on Jun 4, 2021

Can this be re-opened? Why not do what @SokolovYaroslav is suggesting? If configure_optimizers() can be called after train_dataloader(), then users can simply save the length in a local variable.
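
A rough sketch of that suggestion, assuming an execution order in which train_dataloader() runs before configure_optimizers() (which is exactly what the comment asks for); the methods below belong to the same kind of LightningModule as above, and the num_train_batches attribute is purely illustrative:

    def train_dataloader(self):
        loader = DataLoader(self.train_dataset, batch_size=self.hparams.batch_size, shuffle=True)
        # Remember the number of batches per epoch for configure_optimizers().
        self.num_train_batches = len(loader)
        return loader

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.hparams.lr)
        total_steps = (self.num_train_batches
                       // self.hparams.accumulate_grad_batches
                       * self.hparams.epochs)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]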

4 reactions
marrrcin commented on Mar 26, 2020

@SkafteNicki thanks for the response. That’s almost exactly what I did:

    @lru_cache()
    def total_steps(self):
        # Batches per epoch, divided by gradient accumulation, times the number
        # of epochs; cached so the dataloader is only built once for this.
        return len(self.train_dataloader()) // self.hparams.accumulate_grad_batches * self.hparams.epochs

    def configure_optimizers(self):
        optimizer = AdamW(self.model.parameters(), lr=self.hparams.lr)
        lr_scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.hparams.warmup_steps,
            num_training_steps=self.total_steps(),
        )
        # "interval": "step" makes Lightning step the scheduler every training step.
        return [optimizer], [{"scheduler": lr_scheduler, "interval": "step"}]

If that’s the “recommended” way of doing it then I’m fine with that 😃

Read more comments on GitHub >

Top Results From Across the Web

Schedulers like get_linear_schedule_with_warmup need ...
Schedulers like get_linear_schedule_with_warmup need access to the length of the train dataset · Issue #1038 · Lightning-AI/lightning · GitHub.

Optimization - Hugging Face
The .optimization module provides: an optimizer with weight decay fixed that can be used to fine-tune models, and several schedules in the form...

pytorch get_linear_schedule_with_warmup - You.com | The AI ...
If you're using a lr scheduler that needs access to the number of batches in the train dataset like @huggingface's get_linear_schedule_with_warmup, there's ...

【Train】Deberta-v3-large baseline - Kaggle
The following is necessary if you want to use the fast tokenizer for deberta v2 or ... AutoConfig from transformers import get_linear_schedule_with_warmup, ...

Transformers for Multi-Regression — [PART2] | by Zeineb Ghrib
All the code sources can be retrieved from my Kaggle notebook ... In the same way, we want to get the local evaluation...
