Reproduction of best trial when loading pre-trained models

I believe I encountered the same issue as the one mentioned in issue #975.

I’m running distributed experiments across servers to find the best hyperparameters. When I try to reproduce the best experiment, I copy all the parameters from the config file (I’m using Optuna with AllenNLP), including the seed, but for some reason I can’t reproduce the exact same results (I also run the experiment on the same server). Reproduction is fine when I don’t use the Optuna distributed trial, so I can replicate experiments when I don’t use the Optuna package. It also doesn’t happen when I use my own model, i.e., when I build my own PyTorch model and run it within the Optuna distributed trial. This happens only when I load a pre-trained model (typically from Hugging Face). Any ideas why this may happen?
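
For context, copying the seed from the config amounts to pinning the RNGs roughly as in the sketch below. This is only a sketch: the seed value is arbitrary, and the cuDNN flags are my assumption about what full PyTorch determinism needs rather than something taken from the config.

import random

import numpy as np
import torch


def set_full_determinism(seed=13370):  # arbitrary seed value
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for bit-exact cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False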

This is the related issue where it was discussed: https://github.com/optuna/optuna/issues/975#issuecomment-594482556

I’ll try to see if the model parameters are different at the beginning of each trial.

The code looks like this:

# Assumes fastai v1, transformers, and optuna; data_clas and
# custom_transformer_model (a pre-trained RoBERTa wrapped for fastai)
# are defined outside the objective.
from functools import partial

from fastai.basic_train import Learner
from fastai.layers import FlattenedLoss, LabelSmoothingCrossEntropy
from fastai.metrics import accuracy
from optuna.integration import FastAIPruningCallback
from transformers import AdamW


def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-6, 1e-3)
    pct_start = trial.suggest_uniform('pct_start', 0.05, 0.5)
    b1 = trial.suggest_uniform('b1', 0.7, 0.9)
    b2 = trial.suggest_uniform('b2', 0.6, 0.98)
    eps = trial.suggest_categorical('eps', [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
    wd = trial.suggest_loguniform('wd', 1e-8, 1e-3)

    # destroy old learner
    try:
        learn.destroy()
    except NameError:
        print('no learner created')

    learn = Learner(data_clas,
                    custom_transformer_model,
                    opt_func=lambda params: AdamW(params, correct_bias=False, eps=eps),
                    loss_func=FlattenedLoss(LabelSmoothingCrossEntropy, axis=-1),
                    metrics=[accuracy],
                    wd=wd,
                    callback_fns=[partial(FastAIPruningCallback, trial=trial, monitor='accuracy')])

    # For roberta-base: split into embeddings, the 12 encoder layers, and the pooler
    list_layers = [learn.model.transformer.roberta.embeddings,
                   *learn.model.transformer.roberta.encoder.layer,
                   learn.model.transformer.roberta.pooler]

    learn.split(list_layers)
    learn.load('initial')  # restore the saved initial weights
    learn.unfreeze()
    learn.to_fp16()
    learn.fit_one_cycle(1,
                        lr,
                        pct_start=pct_start,
                        moms=(b1, b2))

    return learn.validate()[-1].item()  # returns accuracy

Where custom_transformer_model loads a pre-trained RoBERTa model. I even made sure the fastai learner was being destroyed at the end of each trial. When I print the model weights before training in each trial, the values already differ between trials (screenshot in the original issue).

However, if I save the initial model weights beforehand and load them at the beginning of each trial with learn.load('initial'), the weights remain consistent at each trial (screenshot in the original issue).

However, this seems to be an issue only when using a Learner with a custom model; I have used Optuna with a tabular_learner without needing to reset weights between trials, so it could be an issue with the fastai library.

Edit: I did some further digging with the fastai Learner. It seems that the model weight updates persist even when you destroy the learner and create a new one, so for a custom model you need to either create a new model or reset the weights between trials.
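
Concretely, the "reset weights between trials" option is what learn.load('initial') above achieves via fastai; in plain PyTorch it amounts to something like the sketch below (the snapshot file name is arbitrary).

import torch

# Snapshot the freshly initialized weights once, before any trial runs
# ('initial_weights.pth' is an arbitrary name).
torch.save(custom_transformer_model.state_dict(), 'initial_weights.pth')


def reset_model(model):
    # Restore the snapshot so every trial starts from identical weights.
    model.load_state_dict(torch.load('initial_weights.pth'))
    return model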

Originally posted by @maxmatical in https://github.com/optuna/optuna/issues/975#issuecomment-594482556

Top GitHub Comments

ofersabo commented on Jul 7, 2020 (1 reaction)

I load the model with the allennlp evaluate command, which under the hood eventually calls PyTorch's model.load_state_dict().
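
One way to rule out the weights file itself is to compare two loads of the same checkpoint directly; a quick sketch, where best.th stands in for whatever file the training run wrote (the name is an assumption):

import torch

# Two independent loads of the same checkpoint (path is an assumption).
first = torch.load('best.th', map_location='cpu')
second = torch.load('best.th', map_location='cpu')

# If the tensors are bit-identical, any difference in metrics has to
# come from the evaluation side, not from loading the weights.
assert first.keys() == second.keys()
assert all(torch.equal(first[k], second[k]) for k in first)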

@toshihikoyanase the reproduction steps you mentioned are accurate. If I run step 4 twice I get the same results, which are different from the original results obtained from the Optuna trial.

toshihikoyanase commented on Jul 7, 2020 (1 reaction)

@ofersabo Thank you for your quick response.

Also, when I re-load the model and run evaluation on any dataset, I see differences in the results. These differences are larger than a point, so it isn't a floating-point issue.

May I ask how you re-load the model? I'd like to understand your hyperparameter tuning workflow. I guess it consists of the following steps:

  1. Create a Jsonnet file for AllenNLPExecutor
  2. Run AllenNLPExecutor
  3. Fill the best params into the Jsonnet file
  4. Reproduce the best trial using the allennlp train command

If so, I’d like to know if we can get the same value when we run step 4 twice.

Yes, I’m using the AllenNLP executor. To be precise, here is the code I’m using: optuna.integration.allennlp.AllenNLPExecutor
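
For reference, the executor is typically wired into an objective roughly like this. It is only a sketch: the config path, serialization directory, and metric key are placeholders, and the comment about how suggested values reach the Jsonnet config reflects my understanding of the integration rather than code from this issue.

import optuna
from optuna.integration.allennlp import AllenNLPExecutor


def objective(trial):
    # Suggested values are picked up by the executor and exposed to the
    # Jsonnet config (to my understanding, via std.extVar).
    trial.suggest_loguniform('lr', 1e-6, 1e-3)
    trial.suggest_uniform('dropout', 0.0, 0.5)

    executor = AllenNLPExecutor(
        trial,
        config_file='config.jsonnet',                    # placeholder path
        serialization_dir=f'result/trial_{trial.number}',
        metrics='best_validation_accuracy',              # assumed metric key
    )
    return executor.run()


study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)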

@himkt Do you have any ideas about this issue? For example, I’m curious if AllenNLPExecutor can somehow store information as global variables.
