Help with lr-finder working with transformers?
I am in need of a tool like this for a problem that is very sensitive to the learning rate. However, I am unfortunately unable to get this package to work with any transformer model.
My error is as below and I am wondering if you have any insight!
from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")
~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
284 train_iter,
285 accumulation_steps,
--> 286 non_blocking_transfer=non_blocking_transfer,
287 )
288 if val_loader:
~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
342 # Forward pass
343 outputs = self.model(inputs)
--> 344 loss = self.criterion(outputs, labels)
345
346 # Loss should be averaged in each step
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
724 result = self._slow_forward(*input, **kwargs)
725 else:
--> 726 result = self.forward(*input, **kwargs)
727 for hook in itertools.chain(
728 _global_forward_hooks.values(),
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
946 def forward(self, input: Tensor, target: Tensor) -> Tensor:
947 return F.cross_entropy(input, target, weight=self.weight,
--> 948 ignore_index=self.ignore_index, reduction=self.reduction)
949
950
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2420 if size_average is not None or reduce is not None:
2421 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2423
2424
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
1589 dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
1590 if dtype is None:
-> 1591 ret = input.log_softmax(dim)
1592 else:
1593 ret = input.log_softmax(dim, dtype=dtype)
AttributeError: 'tuple' object has no attribute 'log_softmax'
Top GitHub Comments
Hi @ma-batita
Thanks for bringing up this question. It's indeed a question of library design. According to my understanding, PyTorch is flexible, and there is no hard rule restricting how users implement model.forward(). Therefore, the inputs and outputs of model.forward() can be many things other than plain tensors. In this situation, it's hard to change the design of LRFinder just to make it meet the specifications of some libraries (otherwise, it could easily break compatibility with others).

We actually had some questions and PRs proposed regarding model input/output handling before, and we found that the approach proposed in PR #37 seems the proper way to go, as it gives users the flexibility to get their models working with LRFinder without affecting the existing codebase (both the users' and LRFinder's). Since LRFinder is a tool for finding a learning rate, the ideal situation is that you can dispose of all code related to LRFinder after finding a good learning rate, then continue working on your original codebase without changing a single character in it.

Therefore, I think the idea of using wrapper classes proposed by David is also a nice solution for these various situations. Though it might make unconventional models and training pipelines slightly harder to handle, it almost guarantees that your existing model and pipeline will still work after the LRFinder-related code is removed.

So, if you want to deal with the dataclass object returned by transformers, the approach written in that colab notebook should still work.
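For illustration, here is a minimal sketch of that wrapper idea. The class name ModelWrapper is hypothetical, it reuses model, optimizer, criterion, and train_dataloader from the snippet at the top of the issue, and it assumes the dataloader yields (input_ids, labels) pairs of plain tensors. The wrapper forwards the batch to the Hugging Face model and hands only the logits tensor to the criterion, which avoids the 'tuple' object has no attribute 'log_softmax' error above.

import torch.nn as nn
from torch_lr_finder import LRFinder

class ModelWrapper(nn.Module):
    # Hypothetical wrapper: return only the logits so nn.CrossEntropyLoss
    # receives a plain tensor instead of the tuple/dataclass that
    # transformers models return from forward().
    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model

    def forward(self, input_ids):
        outputs = self.hf_model(input_ids)
        if isinstance(outputs, tuple):   # transformers 3.x-style tuple output
            return outputs[0]
        return outputs.logits            # newer ModelOutput-style output

# The optimizer created on model.parameters() still works, since the wrapper
# shares the same underlying parameters.
wrapped_model = ModelWrapper(model)
lr_finder = LRFinder(wrapped_model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, end_lr=1, num_iter=100, step_mode="linear")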
The error message you mentioned should be raised by the tokenizer. As it describes, the inputs should be a list of text. Maybe you should check what inputs are passed into the model (see the quick sketch after this paragraph). If you can provide a further code snippet showing how you use the model, together with the traceback of the error, maybe I can help you figure out the problem. Otherwise, I guess it's not a problem directly related to LRFinder.
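For example, a quick sanity check of the kind I mean (a sketch only; it assumes the xlm-roberta-base tokenizer and the batched __call__ API available in transformers 3.x):

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

# The tokenizer expects a string or a list of strings, not tensors:
texts = ["first example sentence", "second example sentence"]
encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(encoded["input_ids"].shape)  # the (batch_size, seq_len) tensor that goes into the model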
Regarding using AutoTokenizer and AutoModelForSequenceClassification, this should not be a problem, because you can get the same model and tokenizer with either of the two configurations sketched below.
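As an illustration (a sketch assuming the xlm-roberta-base checkpoint used earlier in this issue; adapt the checkpoint name and num_labels to your case), the two configurations would look roughly like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

# Configuration 1: let the Auto* classes resolve the architecture from the checkpoint name
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3)

# Configuration 2: use the architecture-specific classes directly
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3)

Both end up loading the same weights and tokenizer files for that checkpoint.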
I have run the same notebook mentioned above using these two configurations and they both work.
Hi @NaleRaphael,
I did exactly what you said here. It is running smoothly now, but when I change the model it doesn't. The example below:
gives me the error: Can’t load config for ‘CenIA/albert-base-spanish’. Make sure that:
‘CenIA/albert-base-spanish’ is a correct model identifier listed on ‘https://huggingface.co/models’
or ‘CenIA/albert-base-spanish’ is the correct path to a directory containing a config.json file
I think it is something related to transformers==3.0.2. I couldn’t fix it even when I downloaded the model and all its files from Hugging Face. Can you tell me please how we can fix this?

Thanks for the clarification. Sorry, I hadn’t worked with the old version of transformers, that is why! 😅
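For reference, the “correct path to a directory containing a config.json file” option from that error message would look roughly like this (a sketch; the local directory name is hypothetical and must contain config.json plus the weight and tokenizer files):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

local_dir = "./albert-base-spanish"  # hypothetical folder with the files downloaded from the Hub
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForSequenceClassification.from_pretrained(local_dir, num_labels=3)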