Early stopping not working on 0.7.1
🐛 Bug
Early stopping does not work anymore. When I downgrade from 0.7.1 (or the current dev version) to 0.6.0, early stopping works again with the same code.
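(Not from the original report.) Since the behaviour differs between releases, a quick way to confirm which Lightning version is actually active in the environment:

import pytorch_lightning as pl
print(pl.__version__)  # e.g. '0.7.1' vs '0.6.0'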
Code sample
from argparse import ArgumentParser

import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# dg, src.settings and RecurrentModel are project-specific modules not shown in the report.


def main(hparams):
    if hparams.early_stopping == 'yes':
        early_stopping = EarlyStopping(
            monitor='batch/mean_absolute_loss',
            min_delta=hparams.min_delta,
            patience=hparams.patience,
            mode='min'
        )
    else:
        early_stopping = False

    model = MemoryTest(hparams)

    trainer = pl.Trainer(
        val_percent_check=0,
        early_stop_callback=early_stopping,
        default_save_path=src.settings.LOG_DIR,
        max_epochs=hparams.epochs
    )
    trainer.fit(model)
class MemoryTest(pl.LightningModule):
    # Main Testing Unit for Experiments on Recurrent Cells
    def __init__(self, hp):
        super(MemoryTest, self).__init__()
        self.predict_col = hp.predict_col
        self.n_datasamples = hp.n_datasamples
        self.dataset = hp.dataset
        if self.dataset == 'rand':
            self.seq_len = None
        else:
            self.seq_len = hp.seq_len
        self.hparams = hp
        self.learning_rate = hp.learning_rate
        self.training_losses = []
        self.final_loss = None

        self.model = RecurrentModel(1, hp.n_cells, hp.n_layers, celltype=hp.celltype)

    def forward(self, input, input_len):
        return self.model(input, input_len)

    def training_step(self, batch, batch_idx):
        x, y, input_len = batch
        features_y = self.forward(x, input_len)

        loss = F.mse_loss(features_y, y)
        mean_absolute_loss = F.l1_loss(features_y, y)
        self.training_losses.append(mean_absolute_loss.item())

        neptune_logs = {'batch/train_loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss}
        return {'loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss, 'log': neptune_logs}

    def on_epoch_end(self):
        # Track the mean absolute loss per epoch, then reset the running list.
        train_loss_mean = np.mean(self.training_losses)
        self.final_loss = train_loss_mean
        self.training_losses = []  # reset for next epoch

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)

    @pl.data_loader
    def train_dataloader(self):
        train_dataset = dg.RandomDataset(self.predict_col, self.n_datasamples)
        if self.dataset == 'rand_fix':
            train_dataset = dg.RandomDatasetFix(self.predict_col, self.n_datasamples, self.seq_len)
        if self.dataset == 'correlated':
            train_dataset = dg.CorrelatedDataset(self.predict_col, self.n_datasamples)
        train_loader = DataLoader(dataset=train_dataset, batch_size=1)
        return train_loader

    @staticmethod
    def add_model_specific_args(parent_parser):
        # MODEL specific
        model_parser = ArgumentParser(parents=[parent_parser])
        model_parser.add_argument('--learning_rate', default=1e-2, type=float)
        model_parser.add_argument('--n_layers', default=1, type=int)
        model_parser.add_argument('--n_cells', default=5, type=int)
        model_parser.add_argument('--celltype', default='LSTM', type=str)

        # training specific (for this model)
        model_parser.add_argument('--epochs', default=500, type=int)
        model_parser.add_argument('--patience', default=5, type=int)
        model_parser.add_argument('--min_delta', default=0.1, type=float)
        model_parser.add_argument('--early_stopping', default='yes', type=str)

        # data specific
        model_parser.add_argument('--n_datasamples', default=1000, type=int)
        model_parser.add_argument('--seq_len', default=10, type=int)
        model_parser.add_argument('--dataset', default='rand', type=str)
        model_parser.add_argument('--predict_col', default=1, type=int)

        return model_parser
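(Not part of the original report.) As far as I can tell, the EarlyStopping callback decides whether to stop based on the metrics the Trainer collects from the returned dicts, so a useful sanity check is whether the monitored key is visible to callbacks at all after a short run. The attribute name trainer.callback_metrics below is an assumption about this release, hence the guarded lookup:

# Debugging sketch (mine, not from the issue): run two epochs and inspect the
# metrics the callbacks can see. If 'batch/mean_absolute_loss' is missing,
# EarlyStopping has nothing to compare against.
def debug_early_stopping(hparams):
    model = MemoryTest(hparams)
    trainer = pl.Trainer(
        val_percent_check=0,
        early_stop_callback=EarlyStopping(monitor='batch/mean_absolute_loss', mode='min'),
        max_epochs=2,  # short run, just for inspection
    )
    trainer.fit(model)
    print(getattr(trainer, 'callback_metrics', {}))

If the key shows up under 0.6.0 but not under 0.7.1, that would at least narrow down where the metric flow changed between the two releases.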
Expected behavior
Early stopping to take effect again.
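(Again not from the report, just a sketch of how this expectation could be checked programmatically, assuming Trainer.current_epoch is available and counts from zero in this release.)

# Hypothetical check: with patience=5 and min_delta=0.1, a run that honours early
# stopping should terminate well before the configured 500 epochs.
trainer.fit(model)
assert trainer.current_epoch < hparams.epochs - 1, 'early stopping never triggered'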
I also came to that point when I looked at it 2 days ago and will have more time to look at it soon. If I remember correctly, the tests didn't pass, and I was tracking down at which point the change was introduced to figure out the reason it is there.
Oh, my bad! Then I will have a closer look at this issue.