Passing optimizer to Trainer constructor does not work
System Info
- `transformers` version: 4.20.1
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu102 (False)
- Tensorflow version (GPU?): 2.8.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: TPU (used on the platform, nothing specific in the script)
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the following script, which passes an optimizer object to `Trainer`.

```python
import numpy as np
import torch
from datasets import load_dataset, load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments, set_seed
from transformers.trainer_pt_utils import get_parameter_names
PASS_OPTIMIZER_TO_TRAINER = True
MODEL_NAME = 'albert-large-v2'
TASK = 'rte'
MAX_SEQ_LENGTH = 128
EPOCHS = 1
LEARNING_RATE = 2e-5
SEED = 10000
OPTIMIZER = 'adamw_torch'
OUTPUT_DIR = 'output'
train_args = TrainingArguments(num_train_epochs=EPOCHS,
                               learning_rate=LEARNING_RATE,
                               seed=SEED,
                               optim=OPTIMIZER,
                               output_dir=OUTPUT_DIR,
                               overwrite_output_dir=True,
                               evaluation_strategy='epoch',
                               do_eval=True,
                               full_determinism=True)
set_seed(SEED)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
raw_datasets = load_dataset("glue", TASK)
metric = load_metric("glue", TASK)
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)
def preprocess_function(examples):
    # Tokenize the texts
    args = (examples['sentence1'], examples['sentence2'])
    return tokenizer(*args, padding="max_length", max_length=MAX_SEQ_LENGTH, truncation=True)
raw_datasets = raw_datasets.map(
    preprocess_function,
    batched=True)
train_dataset = raw_datasets["train"]
eval_dataset = raw_datasets["validation"]
if not PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model,
        args=train_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer)
# Create adamw_torch optimizer manually
decay_parameters = get_parameter_names(model, [torch.nn.LayerNorm])
decay_parameters = [name for name in decay_parameters if "bias" not in name]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": train_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters,
                              lr=train_args.learning_rate,
                              betas=(train_args.adam_beta1, train_args.adam_beta2),
                              eps=train_args.adam_epsilon)
if PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model,
        args=train_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        optimizers=(optimizer, None))
else:
    # trainer.optimizer = optimizer
    pass
trainer.train()
```
With `PASS_OPTIMIZER_TO_TRAINER = True` as above, the model fails to train. The training pass also runs at about 2x the normal speed.
If the variable `PASS_OPTIMIZER_TO_TRAINER` is instead set to `False`, the `Trainer` creates its own optimizer based on `train_args`, which should be identical to the manually created one. In that configuration, training succeeds.
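
As a sanity check on the "should be identical" claim, one can ask `Trainer` which optimizer class and keyword arguments it derives from `train_args` and compare them against the manual construction. The sketch below assumes the static helper `Trainer.get_optimizer_cls_and_kwargs` (present in v4.20) and reuses the `train_args` and `optimizer` objects from the script above:

```python
# Sanity-check sketch: compare what Trainer would build from train_args with
# the manually constructed optimizer. Assumes Trainer.get_optimizer_cls_and_kwargs
# exists in your version (it does in 4.20).
from transformers import Trainer

opt_cls, opt_kwargs = Trainer.get_optimizer_cls_and_kwargs(train_args)
print(opt_cls)             # expected: torch.optim.AdamW for optim='adamw_torch'
print(opt_kwargs)          # expected: lr/betas/eps mirroring train_args
print(optimizer.defaults)  # hyperparameters of the manually built optimizer
```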
I’m guessing that after `model` is passed into the `Trainer` constructor it gets modified (presumably moved to the TPU device), and the `optimizer`'s parameter references are no longer valid. This is because in the script (with `PASS_OPTIMIZER_TO_TRAINER = False`, where the optimizer is built only after the `Trainer` constructor has run), uncommenting the `trainer.optimizer = optimizer` line near the end has no effect on the result, indicating that `optimizer` is now the same as `trainer.optimizer`.
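
This guess is consistent with a documented PyTorch caveat: the `torch.optim` docs warn that parameters of a model after `.to(device)` may be different objects than before the call, so optimizers should generally be constructed after moving the model. Below is a minimal sketch of the suspected mechanism; it assumes a TPU environment with `torch_xla` installed, matching the setup above:

```python
# Minimal sketch of the suspected mechanism (assumes torch_xla, i.e. the TPU
# setup this issue was reported on). Moving a model to an XLA device re-creates
# its Parameter objects, so an optimizer built beforehand keeps references to
# the orphaned CPU parameters, and optimizer.step() never touches the weights
# the model actually uses.
import torch
import torch_xla.core.xla_model as xm

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

model.to(xm.xla_device())  # Trainer.__init__ performs the equivalent move

opt_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}
model_ids = {id(p) for p in model.parameters()}
print(opt_ids.isdisjoint(model_ids))  # expected True: optimizer holds stale parameters
```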
Expected behavior
The script should work properly as is, with identical results whether `PASS_OPTIMIZER_TO_TRAINER` is `True` or `False`.
If my guess is correct, then I don’t see how the `optimizers` argument of `Trainer` can accept a pre-built optimizer object at all. But that creates issues for anyone wanting to use `Trainer` with a custom optimizer.
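
Until the root cause is addressed, a possible workaround (untested, following the analysis above) is to move the model to the XLA device before building the optimizer, so the parameter references stay valid when the `Trainer` constructor runs; assigning `trainer.optimizer = optimizer` after construction, which already works in the script, is another option:

```python
# Possible workaround sketch (untested): move the model to the TPU device
# *before* constructing the optimizer. Trainer's own move then finds the model
# already on the device, so the optimizer's parameter references stay valid.
import torch_xla.core.xla_model as xm

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.to(xm.xla_device())

# Simplified single parameter group for brevity; the grouped construction
# from the script above works the same way.
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    optimizers=(optimizer, None))
```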
Top GitHub Comments
That’d be up to @sgugger, who is OOF until the 30th on holiday 😄 I’ll make sure he sees this though when he’s back!
cc @muellerzr, would you like to take a look at this while Sylvain is on leave?