
Help with lr-finder working with transformers?


I need a tool like this for a problem that is very sensitive to the learning rate, but I am unable to get this package to work with any transformer model.

My error is below, and I am wondering if you have any insight!

from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    284                 train_iter,
    285                 accumulation_steps,
--> 286                 non_blocking_transfer=non_blocking_transfer,
    287             )
    288             if val_loader:

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    342             # Forward pass
    343             outputs = self.model(inputs)
--> 344             loss = self.criterion(outputs, labels)
    345 
    346             # Loss should be averaged in each step

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 
    950 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
   1589         dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
   1590     if dtype is None:
-> 1591         ret = input.log_softmax(dim)
   1592     else:
   1593         ret = input.log_softmax(dim, dtype=dtype)

AttributeError: 'tuple' object has no attribute 'log_softmax'

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
NaleRaphael commented, Sep 7, 2022

Hi @ma-batita

…why not adapt LRFinder to this major change? One could make the criterion optional, since it is already wrapped into the model (for classification problems). Could this be an option, or would it affect the overall functioning of LRFinder?

Thanks for bringing up this question. It is indeed a question of library design. As I understand it, PyTorch is flexible, and there is no hard rule restricting how users implement model.forward(). As a result, model.forward() can take and return all sorts of objects besides plain tensors. In this situation, it's hard to change the design of LRFinder just to meet the conventions of one particular library, since doing so could easily break compatibility with others.

We have had questions and PRs about model input/output handling before, and we found that the approach proposed in PR #37 seems the proper way to go: it gives users the flexibility to get their models working with LRFinder without affecting the existing codebase (both the user's and LRFinder's). Since LRFinder is a tool for finding a learning rate, the ideal situation is that you can dispose of all LRFinder-related code once you have found a good learning rate, then continue working on your original codebase without changing a single character in it.

Therefore, I think the idea of using wrapper classes proposed by David is also a nice solution for these varied situations. Although it adds a little difficulty for unconventional models and training pipelines, it all but guarantees that your existing model and pipeline will still work after the LRFinder-related code is removed.

So, if you want to deal with the dataclass objects returned by transformers, an approach similar to the one in that Colab notebook should still work.
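As a rough illustration only (this is a sketch, not the actual code from that notebook, and it assumes your dataloader yields plain (input_ids, labels) pairs), a wrapper that hands LRFinder a bare logits tensor could look like this, reusing the names from the snippet in the question:

import torch.nn as nn

class LogitsOnlyWrapper(nn.Module):
    # Hypothetical wrapper: delegate to the HF model and return only the logits
    # tensor, so a plain criterion like nn.CrossEntropyLoss can consume the output.
    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model

    def forward(self, input_ids):
        outputs = self.hf_model(input_ids)
        # transformers 3.x returns a tuple whose first element is the logits;
        # newer versions return a ModelOutput with a .logits attribute.
        return outputs.logits if hasattr(outputs, "logits") else outputs[0]

wrapped_model = LogitsOnlyWrapper(model)
lr_finder = LRFinder(wrapped_model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

Because the wrapper holds the same parameter objects, the existing optimizer keeps working, and the wrapper can simply be thrown away once a good learning rate has been found.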


The error message

batch_text_or_text_pairs has to be a list (got <class 'tuple'>)

should be raised by the tokenizer. As it says, the input should be a list of strings, so you may want to check what is being passed into the tokenizer and the model. If you can share a further code snippet showing how you use the model, along with the error traceback, I may be able to help you figure out the problem. Otherwise, I don't think this is a problem directly related to LRFinder.
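For example (purely illustrative, not code from your snippet, with tokenizer being the one from the question), converting the batch to a list before encoding avoids that particular message:

texts = ("first sentence", "second sentence")  # e.g. a batch arriving as a tuple from a dataset
# The batch encoding methods expect a list of strings, not a tuple.
encoded = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
input_ids = encoded["input_ids"]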

Regarding the use of AutoTokenizer and AutoModelForSequenceClassification, that should not be a problem, because you can get the same model and tokenizer with either this configuration:

from transformers import XLMRobertaConfig, XLMRobertaForSequenceClassification, XLMRobertaTokenizer

xlm_roberta_config = XLMRobertaConfig.get_config_dict('xlm-roberta-base')[0]
xlm_roberta_model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=2).cuda()
xlm_roberta_tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

or this one:

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

config = AutoConfig.from_pretrained("xlm-roberta-base")
xlm_roberta_config = config.get_config_dict('xlm-roberta-base')[0]
xlm_roberta_model = AutoModelForSequenceClassification.from_config(config).cuda()
xlm_roberta_tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

I have run the same notebook mentioned above with both configurations, and they both work.

0 reactions
ma-batita commented, Sep 12, 2022

Hi @NaleRaphael,

Regarding the issue you ran into, it's recommended to run the model on CPU first, since some error messages may not be shown explicitly when running on GPU. It's also recommended to make sure your code works well without anything related to LRFinder first; once it does, you can start adding LRFinder. This should help you figure out the actual cause.
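A bare-bones version of that kind of standalone check (just a sketch, not code from the thread, reusing the names from the snippet in the question and assuming the dataloader yields plain (inputs, labels) pairs):

# Run one plain training step on CPU, without LRFinder, so shape/type errors
# surface with clearer messages.
device = "cpu"
model_cpu = model.to(device)
inputs, labels = next(iter(train_dataloader))
inputs, labels = inputs.to(device), labels.to(device)
outputs = model_cpu(inputs)
logits = outputs.logits if hasattr(outputs, "logits") else outputs[0]
loss = criterion(logits, labels)
loss.backward()
print(loss.item())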

I did exactly what you said here. It runs smoothly now, but when I change the model it doesn't. See the example below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

name = 'CenIA/albert-base-spanish'

config = AutoConfig.from_pretrained(name)
model_config = config.get_config_dict(name)[0]
model = AutoModelForSequenceClassification.from_config(config).cuda()
tokenizer = AutoTokenizer.from_pretrained(name)

gives me the error: Can't load config for 'CenIA/albert-base-spanish'. Make sure that:

  • 'CenIA/albert-base-spanish' is a correct model identifier listed on 'https://huggingface.co/models'

  • or 'CenIA/albert-base-spanish' is the correct path to a directory containing a config.json file

I think it is something related to transformers==3.0.2. I couldn't fix it even when I downloaded the model and all its files from Hugging Face. Can you please tell me how we can fix this?

As for the last question, you need to set num_labels in the config to make AutoModelForSequenceClassification work with a different number of classes…

Thanks for the clarification. Sorry, I hadn't worked with an older version of transformers before; that's why! 😅
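For reference, a sketch of setting num_labels through the config with the Auto classes (illustrative only, not code from the thread, using xlm-roberta-base as a stand-in model name) would look like:

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

name = "xlm-roberta-base"
# num_labels is set on the config so the classification head is built with the right size.
config = AutoConfig.from_pretrained(name, num_labels=3)
model = AutoModelForSequenceClassification.from_config(config)
tokenizer = AutoTokenizer.from_pretrained(name)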
