Reproduction of best trial when loading pre-trained models
I believe I encountered the same issue as mentioned in #975.
I’m running distributed experiments across servers to find the best hyperparameters. When I try to reproduce the best experiment, I copy all the parameters from the config file (I’m using Optuna with AllenNLP), including the seed, but for some reason I can’t reproduce the exact same results (I also run the reproduction on the same server). Reproduction is fine when I don’t use the Optuna distributed trial, so I can replicate experiments as long as I don’t use the Optuna package. It also doesn’t happen when I use my own model, i.e. when I create my own PyTorch model and run it within the Optuna distributed trial. It happens only when I load a pre-trained model (typically from Hugging Face). Any ideas why this may happen?
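For reference, the usual calls for pinning down PyTorch randomness look like this (a generic sketch of common practice, not my exact setup, which is handled through the config file):

import random
import numpy as np
import torch

def set_seed(seed):
    # Seed Python, NumPy, and PyTorch (CPU and every GPU).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN can still pick nondeterministic kernels unless these are set.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False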
This is the related issue where it was discussed: https://github.com/optuna/optuna/issues/975#issuecomment-594482556
I’ll try to see if the model parameters are different at the beginning of each trial.
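One quick way to do that check is to fingerprint the parameters at the top of the objective; a minimal sketch (the helper name is mine, and the model object is whatever module gets passed to the trainer):

import hashlib
import torch

def params_fingerprint(model: torch.nn.Module) -> str:
    # Hash every parameter tensor so two trials can be compared with a single string.
    h = hashlib.md5()
    for name, p in sorted(model.named_parameters()):
        h.update(name.encode())
        h.update(p.detach().cpu().numpy().tobytes())
    return h.hexdigest()

# Print this at the start of each trial; identical strings mean identical starting weights.
# print(trial.number, params_fingerprint(model))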
The code looks like this:

# Imports assumed from the rest of the script (fastai v1, transformers, optuna):
from functools import partial
from fastai.basic_train import Learner
from fastai.layers import FlattenedLoss, LabelSmoothingCrossEntropy
from fastai.metrics import accuracy
from optuna.integration import FastAIPruningCallback
from transformers import AdamW

def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-6, 1e-3)
    pct_start = trial.suggest_uniform('pct_start', 0.05, 0.5)
    b1 = trial.suggest_uniform('b1', 0.7, 0.9)
    b2 = trial.suggest_uniform('b2', 0.6, 0.98)
    eps = trial.suggest_categorical('eps', [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
    wd = trial.suggest_loguniform('wd', 1e-8, 1e-3)

    # Destroy the learner left over from the previous trial, if any.
    try:
        learn.destroy()
    except:
        print('no learner created')

    learn = Learner(data_clas,
                    custom_transformer_model,
                    opt_func=lambda input: AdamW(input, correct_bias=False, eps=eps),
                    loss_func=FlattenedLoss(LabelSmoothingCrossEntropy, axis=-1),
                    metrics=[accuracy],
                    wd=wd,
                    callback_fns=[partial(FastAIPruningCallback, trial=trial, monitor='accuracy')])

    # Layer groups for roberta-base: embeddings, the 12 encoder layers, and the pooler.
    list_layers = [learn.model.transformer.roberta.embeddings,
                   learn.model.transformer.roberta.encoder.layer[0],
                   learn.model.transformer.roberta.encoder.layer[1],
                   learn.model.transformer.roberta.encoder.layer[2],
                   learn.model.transformer.roberta.encoder.layer[3],
                   learn.model.transformer.roberta.encoder.layer[4],
                   learn.model.transformer.roberta.encoder.layer[5],
                   learn.model.transformer.roberta.encoder.layer[6],
                   learn.model.transformer.roberta.encoder.layer[7],
                   learn.model.transformer.roberta.encoder.layer[8],
                   learn.model.transformer.roberta.encoder.layer[9],
                   learn.model.transformer.roberta.encoder.layer[10],
                   learn.model.transformer.roberta.encoder.layer[11],
                   learn.model.transformer.roberta.pooler]
    learn.split(list_layers)

    learn.load('initial')  # restore the saved initial weights
    learn.unfreeze()
    learn.to_fp16()
    learn.fit_one_cycle(1,
                        lr,
                        pct_start=pct_start,
                        moms=(b1, b2))
    return learn.validate()[-1].item()  # returns accuracy
Where custom_transformer_model is loading a pre-trained RoBERTa model. I even made sure to check that the fastai learner was being destroyed at the end of each trial. When I print out the model weights before training in each trial, they differ from trial to trial.
However, if I save the initial model weights beforehand and load them at the beginning of each trial with learn.load('initial'), the weights remain consistent across trials.
However, this seems to only be an issue when using a learner with a custom model: I have used Optuna with a tabular_learner without needing to reset weights between trials, so it could be an issue with the fastai library.
Edit: I did some further digging with the fastai learner. It seems that the model weight updates persist even when you destroy the learner and create a new one, so for a custom model you need to either create a new model or reset the weights between trials.
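For example, resetting the weights at the start of each trial can be done by snapshotting the freshly loaded model's state dict once and restoring it inside the objective; a sketch (custom_transformer_model is the same module used above):

import copy

# Snapshot the pristine weights once, right after the pre-trained model is built.
initial_state = copy.deepcopy(custom_transformer_model.state_dict())

def objective(trial):
    # Restore the pristine weights so this trial does not start from the
    # fine-tuned parameters left behind by the previous trial.
    custom_transformer_model.load_state_dict(initial_state)
    # ... build the Learner, fit, and return the validation metric as before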
_Originally posted by @maxmatical in https://github.com/optuna/optuna/issues/975#issuecomment-594482556_
Top GitHub Comments
I load the model with the allennlp evaluate command, which under the hood eventually uses PyTorch’s model.load_state_dict() implementation.
@toshihikoyanase the reproduction steps you mentioned are accurate. If I run step 4 twice, I get the same results, but they are different from the original results obtained from the Optuna trial.
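One way to confirm the two runs really diverge is to diff the serialized weights directly; a minimal sketch (the paths are placeholders for the weight files written by the two training runs):

import torch

def max_weight_diff(path_a, path_b):
    # Load two saved state dicts and report the largest absolute difference
    # over all parameters they share.
    sd_a = torch.load(path_a, map_location='cpu')
    sd_b = torch.load(path_b, map_location='cpu')
    return max((sd_a[k].float() - sd_b[k].float()).abs().max().item()
               for k in sd_a if k in sd_b)

# e.g. max_weight_diff('optuna_run/weights.th', 'retrained_run/weights.th')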
@ofersabo Thank you for your quick response.
May I ask how you re-load the model? I’d like to understand your hyperparameter-tuning workflow. I guess it consists of the following steps: optimizing hyperparameters with AllenNLPExecutor, re-training the best configuration with the allennlp train command, and then loading and evaluating the trained model. If so, I’d like to know whether we can get the same value when we run step 4 twice.
@himkt Do you have any ideas about this issue? For example, I’m curious whether AllenNLPExecutor can somehow store information as global variables.