ReduceLROnPlateau does not recognise val_loss despite progress_bar dict
🐛 Bug
When training my model, I get the following message:
File "C:\Users\Luc\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 371, in train
raise MisconfigurationException(m)
pytorch_lightning.utilities.debugging.MisconfigurationException: ReduceLROnPlateau conditioned on metric val_loss which is not available. Available metrics are: loss
This is similar to #321, for instance, but I definitely return a `progress_bar` dict with a `val_loss` key in it (see code below).
Code sample
```python
def training_step(self, batch, batch_idx):
    z, y_true = batch
    y_pred = self.forward(z)
    loss_val = self.loss_function(y_pred, y_true)
    return {'loss': loss_val.sqrt()}

def validation_step(self, batch, batch_idx):
    z, y_true = batch
    lr = torch.tensor(self.optim.param_groups[0]['lr'])
    y_pred = self.forward(z)
    loss_val = self.loss_function(y_pred, y_true)
    return {'val_loss': loss_val.sqrt(), 'lr': lr}

def validation_epoch_end(self, outputs):
    val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean()
    lr = outputs[-1]['lr']
    logs = {'val_loss': val_loss_mean, 'lr': lr}
    return {'val_loss': val_loss_mean, 'progress_bar': logs, 'log': logs}
```
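For context, the report does not show `configure_optimizers`. A minimal sketch of how a `ReduceLROnPlateau` scheduler is typically attached in the Lightning API of this era is given below; the optimizer choice and `patience` value are illustrative, not the reporter's actual code. With this plain return format, the scheduler is stepped once per epoch and conditioned on `val_loss` by default, which is the metric named in the error above.

```python
import torch

# Sketch of a LightningModule method (not included in the report).
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)  # illustrative
    # ReduceLROnPlateau returned this way is stepped once per epoch and,
    # by default, conditioned on 'val_loss'.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)
    return [optimizer], [scheduler]
```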
Expected behavior
The `val_loss` value should be picked up by the progress bar.
Environment
- PyTorch Version (e.g., 1.0): 1.4.0
- OS (e.g., Linux): Windows 10
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6.10
- CUDA/cuDNN version: 10
- GPU models and configuration: 1070Ti x 1
- Any other relevant information:
I do not think it is possible just out of the box. However, if you configure your scheduler correctly, then it should be possible. For example, if I initialize my Trainer as

```python
trainer = Trainer(val_check_interval=50)
```

and initialize my scheduler accordingly, it should work (not tested), since `val_loss` will be created every 50 steps but the scheduler will first be called after 100 steps.

Okay, after looking at your code @alexeykarnachev, this does not seem to be a bug. When you set `'interval': 'step'` you are calling the `.step()` method of `ReduceLROnPlateau` after each batch, so it makes complete sense that no `val_loss` has been calculated yet. If you really want to do something like this, you need to set `val_check_interval` in the `Trainer` constructor to a number lower than `frequency` in the scheduler construction. In this way `val_loss` will be calculated before `.step()` is called.
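Concretely, the workaround described in these comments might look like the sketch below. It is untested and partly hypothetical: `val_check_interval=50` and the `'interval': 'step'` and `'frequency'` keys come from the discussion above, while the optimizer choice and the value `frequency=100` (so the scheduler first steps only after validation has run) are illustrative.

```python
import torch
from pytorch_lightning import Trainer

# Run validation every 50 training batches so that 'val_loss' exists
# before ReduceLROnPlateau first calls .step().
trainer = Trainer(val_check_interval=50)

# Inside the LightningModule:
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)  # illustrative
    scheduler = {
        'scheduler': torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer),
        'interval': 'step',  # call .step() per training batch, not per epoch
        'frequency': 100,    # first .step() after 100 batches, i.e. after
                             # validation has already produced 'val_loss'
    }
    return [optimizer], [scheduler]
```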