
Early stopping not working on 0.7.1

See the original GitHub issue (#1201)

🐛 Bug

Early stopping no longer works. When I downgrade from 0.7.1 (or the current dev version) to 0.6.0, early stopping works again with the same code.
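
Until the regression is resolved, the downgrade the reporter describes can be made explicit with a version guard. A minimal sketch, assuming only that the installed package exposes __version__ (0.6.0 is the known-good release per this report):

import pytorch_lightning as pl

# Fail fast if the installed version is not the known-good release.
assert pl.__version__ == '0.6.0', f'expected 0.6.0, got {pl.__version__}'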

Code sample

import numpy as np
import torch
import torch.nn.functional as F
from argparse import ArgumentParser

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
from torch.utils.data import DataLoader

# Project-specific modules (src.settings, dg, RecurrentModel) come from the
# reporter's codebase and are not shown here.


def main(hparams):
    if hparams.early_stopping == 'yes':
        early_stopping = EarlyStopping(
            monitor='batch/mean_absolute_loss',
            min_delta=hparams.min_delta,
            patience=hparams.patience,
            mode='min'
        )
    else:
        early_stopping = False

    model = MemoryTest(hparams)
    trainer = pl.Trainer(
        val_percent_check=0,
        early_stop_callback=early_stopping,
        default_save_path=src.settings.LOG_DIR,
        max_epochs=hparams.epochs
    )

    trainer.fit(model)


class MemoryTest(pl.LightningModule):
    # Main Testing Unit for Experiments on Recurrent Cells
    def __init__(self, hp):
        super(MemoryTest, self).__init__()
        self.predict_col = hp.predict_col
        self.n_datasamples = hp.n_datasamples
        self.dataset = hp.dataset
        if self.dataset == 'rand':  # use '==', not 'is': 'is' compares identity, not string value
            self.seq_len = None
        else:
            self.seq_len = hp.seq_len
        self.hparams = hp
        self.learning_rate = hp.learning_rate
        self.training_losses = []
        self.final_loss = None

        self.model = RecurrentModel(1, hp.n_cells, hp.n_layers, celltype=hp.celltype)

    def forward(self, input, input_len):
        return self.model(input, input_len)

    def training_step(self, batch, batch_idx):
        x, y, input_len = batch
        features_y = self.forward(x, input_len)

        loss = F.mse_loss(features_y, y)
        mean_absolute_loss = F.l1_loss(features_y, y)

        self.training_losses.append(mean_absolute_loss.item())

        # The monitored key 'batch/mean_absolute_loss' is returned both in the
        # 'log' dict and at the top level, presumably so the early-stopping
        # callback can find it among the collected metrics.
        neptune_logs = {'batch/train_loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss}
        return {'loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss, 'log': neptune_logs}

    def on_epoch_end(self):
        train_loss_mean = np.mean(self.training_losses)
        self.final_loss = train_loss_mean
        self.training_losses = []  # reset for next epoch

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)

    @pl.data_loader
    def train_dataloader(self):
        train_dataset = dg.RandomDataset(self.predict_col, self.n_datasamples)
        if self.dataset == 'rand_fix':
            train_dataset = dg.RandomDatasetFix(self.predict_col, self.n_datasamples, self.seq_len)
        if self.dataset == 'correlated':
            train_dataset = dg.CorrelatedDataset(self.predict_col, self.n_datasamples)
        train_loader = DataLoader(dataset=train_dataset, batch_size=1)
        return train_loader

    @staticmethod
    def add_model_specific_args(parent_parser):
        # MODEL specific
        model_parser = ArgumentParser(parents=[parent_parser])
        model_parser.add_argument('--learning_rate', default=1e-2, type=float)
        model_parser.add_argument('--n_layers', default=1, type=int)
        model_parser.add_argument('--n_cells', default=5, type=int)
        model_parser.add_argument('--celltype', default='LSTM', type=str)

        # training specific (for this model)
        model_parser.add_argument('--epochs', default=500, type=int)
        model_parser.add_argument('--patience', default=5, type=int)
        model_parser.add_argument('--min_delta', default=0.1, type=float)
        model_parser.add_argument('--early_stopping', default='yes', type=str)

        # data specific
        model_parser.add_argument('--n_datasamples', default=1000, type=int)
        model_parser.add_argument('--seq_len', default=10, type=int)
        model_parser.add_argument('--dataset', default='rand', type=str)
        model_parser.add_argument('--predict_col', default=1, type=int)

        return model_parser
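
As a quick sanity check (not part of the original report), you can compare the epoch the trainer stopped at against the configured maximum after fit() returns: if early stopping fired, training ends before the last epoch. A minimal sketch extending the end of main() above, assuming the trainer exposes current_epoch as 0.7-era Lightning does (the exact off-by-one behavior may vary across versions):

    trainer.fit(model)

    # If early stopping triggered, training ends before the final epoch.
    stopped_early = trainer.current_epoch < hparams.epochs - 1
    print(f'stopped early: {stopped_early} '
          f'(finished at epoch {trainer.current_epoch} of {hparams.epochs})')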

Expected behavior

Early stopping should take effect again, as it does on 0.6.0.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
awaelchli commented, Mar 24, 2020

I also got to that point when I looked at it 2 days ago and will have more time to look at it soon. If I remember correctly, the tests didn't pass, and I was tracking down at which point the change was introduced to figure out the reason it is there.

1 reaction
awaelchli commented, Mar 21, 2020

Oh, my bad! Then I will have a closer look at this issue.
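
For anyone retracing the investigation described in the first comment above, one way to tell whether the regression is in the early-stopping check itself or in the trainer never invoking it is to subclass the callback with a print. A minimal debugging sketch, assuming the 0.7.x callback interface in which hooks receive the trainer and the LightningModule (adjust the hook name if your version differs):

from pytorch_lightning.callbacks import EarlyStopping

class VerboseEarlyStopping(EarlyStopping):
    # Debugging aid: print on every check so it is obvious whether
    # the trainer ever runs the early-stopping logic.
    def on_epoch_end(self, trainer, pl_module):
        print(f'early-stopping check at epoch {trainer.current_epoch}')
        return super().on_epoch_end(trainer, pl_module)

Passing an instance of this subclass as early_stop_callback then shows immediately whether the hook fires at all under 0.7.1.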

