
Early stopping not working on 0.7.1

See the original GitHub issue (#1201)

🐛 Bug

Early stopping no longer works. When I downgrade from 0.7.1 (or the current dev version) to 0.6.0, early stopping works again with the same code.
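
Until the regression is resolved, the downgrade the reporter describes can be made explicit with a version guard. A minimal sketch, assuming only that the installed package exposes __version__ (0.6.0 is the known-good release per this report):

import pytorch_lightning as pl

# Fail fast if the installed version is not the known-good release.
assert pl.__version__ == '0.6.0', f'expected 0.6.0, got {pl.__version__}'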

Code sample

import numpy as np
import torch
import torch.nn.functional as F
from argparse import ArgumentParser

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
from torch.utils.data import DataLoader

# Project-specific modules (src.settings, dg, RecurrentModel) come from the
# reporter's codebase and are not shown here.


def main(hparams):
    if hparams.early_stopping == 'yes':
        early_stopping = EarlyStopping(
            monitor='batch/mean_absolute_loss',
            min_delta=hparams.min_delta,
            patience=hparams.patience,
            mode='min'
        )
    else:
        early_stopping = False

    model = MemoryTest(hparams)
    trainer = pl.Trainer(
        val_percent_check=0,
        early_stop_callback=early_stopping,
        default_save_path=src.settings.LOG_DIR,
        max_epochs=hparams.epochs
    )

    trainer.fit(model)


class MemoryTest(pl.LightningModule):
    # Main Testing Unit for Experiments on Recurrent Cells
    def __init__(self, hp):
        super(MemoryTest, self).__init__()
        self.predict_col = hp.predict_col
        self.n_datasamples = hp.n_datasamples
        self.dataset = hp.dataset
        if self.dataset == 'rand':  # use '==', not 'is': 'is' compares identity, not string value
            self.seq_len = None
        else:
            self.seq_len = hp.seq_len
        self.hparams = hp
        self.learning_rate = hp.learning_rate
        self.training_losses = []
        self.final_loss = None

        self.model = RecurrentModel(1, hp.n_cells, hp.n_layers, celltype=hp.celltype)

    def forward(self, input, input_len):
        return self.model(input, input_len)

    def training_step(self, batch, batch_idx):
        x, y, input_len = batch
        features_y = self.forward(x, input_len)

        loss = F.mse_loss(features_y, y)
        mean_absolute_loss = F.l1_loss(features_y, y)

        self.training_losses.append(mean_absolute_loss.item())

        # The monitored key 'batch/mean_absolute_loss' is returned both in the
        # 'log' dict and at the top level, presumably so the early-stopping
        # callback can find it among the collected metrics.
        neptune_logs = {'batch/train_loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss}
        return {'loss': loss, 'batch/mean_absolute_loss': mean_absolute_loss, 'log': neptune_logs}

    def on_epoch_end(self):
        train_loss_mean = np.mean(self.training_losses)
        self.final_loss = train_loss_mean
        self.training_losses = []  # reset for next epoch

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)

    @pl.data_loader
    def train_dataloader(self):
        train_dataset = dg.RandomDataset(self.predict_col, self.n_datasamples)
        if self.dataset == 'rand_fix':
            train_dataset = dg.RandomDatasetFix(self.predict_col, self.n_datasamples, self.seq_len)
        if self.dataset == 'correlated':
            train_dataset = dg.CorrelatedDataset(self.predict_col, self.n_datasamples)
        train_loader = DataLoader(dataset=train_dataset, batch_size=1)
        return train_loader

    @staticmethod
    def add_model_specific_args(parent_parser):
        # MODEL specific
        model_parser = ArgumentParser(parents=[parent_parser])
        model_parser.add_argument('--learning_rate', default=1e-2, type=float)
        model_parser.add_argument('--n_layers', default=1, type=int)
        model_parser.add_argument('--n_cells', default=5, type=int)
        model_parser.add_argument('--celltype', default='LSTM', type=str)

        # training specific (for this model)
        model_parser.add_argument('--epochs', default=500, type=int)
        model_parser.add_argument('--patience', default=5, type=int)
        model_parser.add_argument('--min_delta', default=0.1, type=float)
        model_parser.add_argument('--early_stopping', default='yes', type=str)

        # data specific
        model_parser.add_argument('--n_datasamples', default=1000, type=int)
        model_parser.add_argument('--seq_len', default=10, type=int)
        model_parser.add_argument('--dataset', default='rand', type=str)
        model_parser.add_argument('--predict_col', default=1, type=int)

        return model_parser
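
As a quick sanity check (not part of the original report), you can compare the epoch the trainer stopped at against the configured maximum after fit() returns: if early stopping fired, training ends before the last epoch. A minimal sketch extending the end of main() above, assuming the trainer exposes current_epoch as 0.7-era Lightning does (the exact off-by-one behavior may vary across versions):

    trainer.fit(model)

    # If early stopping triggered, training ends before the final epoch.
    stopped_early = trainer.current_epoch < hparams.epochs - 1
    print(f'stopped early: {stopped_early} '
          f'(finished at epoch {trainer.current_epoch} of {hparams.epochs})')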

Expected behavior

Early stopping should take effect again, as it does on 0.6.0.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
awaelchli commented, Mar 24, 2020

I also got to that point when I looked at it 2 days ago and will have more time to look at it soon. If I remember correctly, the tests didn't pass, and I was tracking down at which point the change was introduced to figure out the reason it is there.

1 reaction
awaelchli commented, Mar 21, 2020

Oh, my bad! Then I will have a closer look at this issue.
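
For anyone retracing the investigation described in the first comment above, one way to tell whether the regression is in the early-stopping check itself or in the trainer never invoking it is to subclass the callback with a print. A minimal debugging sketch, assuming the 0.7.x callback interface in which hooks receive the trainer and the LightningModule (adjust the hook name if your version differs):

from pytorch_lightning.callbacks import EarlyStopping

class VerboseEarlyStopping(EarlyStopping):
    # Debugging aid: print on every check so it is obvious whether
    # the trainer ever runs the early-stopping logic.
    def on_epoch_end(self, trainer, pl_module):
        print(f'early-stopping check at epoch {trainer.current_epoch}')
        return super().on_epoch_end(trainer, pl_module)

Passing an instance of this subclass as early_stop_callback then shows immediately whether the hook fires at all under 0.7.1.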

