
Passing optimizer to Trainer constructor does not work

See original GitHub issue

System Info

  • transformers version: 4.20.1
  • Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.13
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.12.0+cu102 (False)
  • Tensorflow version (GPU?): 2.8.2 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: TPU (used on the platform, nothing specific in the script)
  • Using distributed or parallel set-up in script?: No

Who can help?

@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Run the following script, which passes an optimizer object to Trainer.

import numpy as np
import site
import torch
from datasets import load_dataset, load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments, set_seed
from transformers.trainer_pt_utils import get_parameter_names

PASS_OPTIMIZER_TO_TRAINER = True

MODEL_NAME = 'albert-large-v2'
TASK = 'rte'
MAX_SEQ_LENGTH = 128
EPOCHS = 1
LEARNING_RATE = 2e-5
SEED = 10000
OPTIMIZER = 'adamw_torch'
OUTPUT_DIR = 'output'

train_args = TrainingArguments(num_train_epochs=EPOCHS, 
                               learning_rate=LEARNING_RATE,
                               seed=SEED,
                               optim=OPTIMIZER,
                               output_dir=OUTPUT_DIR,
                               overwrite_output_dir=True,
                               evaluation_strategy='epoch',
                               do_eval=True,
                               full_determinism=True)

set_seed(SEED)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
raw_datasets = load_dataset("glue", TASK)
metric = load_metric("glue", TASK)


def compute_metrics(p):
    preds = p.predictions
    preds = np.argmax(p.predictions, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)
 
def preprocess_function(examples):
    # Tokenize the texts
    args = (
        (examples['sentence1'], examples['sentence2'])
    )
    return tokenizer(*args, padding="max_length", max_length=MAX_SEQ_LENGTH, truncation=True)

raw_datasets = raw_datasets.map(
    preprocess_function,
    batched=True)


train_dataset = raw_datasets["train"]
eval_dataset = raw_datasets["validation"]

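# PASS_OPTIMIZER_TO_TRAINER == False: construct the Trainer without an optimizer;
# it builds its own from train_args when training starts.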
if not PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model, 
        args=train_args, 
        train_dataset=train_dataset, 
        eval_dataset=eval_dataset, 
        compute_metrics=compute_metrics, 
        tokenizer=tokenizer) 

# Create adamw_torch optimizer manually
decay_parameters = get_parameter_names(model, [torch.nn.LayerNorm])
decay_parameters = [name for name in decay_parameters if "bias" not in name]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": train_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters,
                              lr=train_args.learning_rate,
                              betas=(train_args.adam_beta1, train_args.adam_beta2),
                              eps=train_args.adam_epsilon)

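# PASS_OPTIMIZER_TO_TRAINER == True: hand the manually built optimizer (and no
# scheduler) to the Trainer through the `optimizers` tuple.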
if PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model, 
        args=train_args, 
        train_dataset=train_dataset, 
        eval_dataset=eval_dataset, 
        compute_metrics=compute_metrics, 
        tokenizer=tokenizer,
        optimizers=(optimizer, None))
else:
    #trainer.optimizer = optimizer
    pass

trainer.train()

With this setting, the model fails to train. The training pass also runs at about 2x the normal speed.

If PASS_OPTIMIZER_TO_TRAINER is instead set to False, the Trainer creates its own optimizer from train_args, which should be identical to the manually created one. In that case, training succeeds.
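
For reference, a rough sketch of how the two configurations can be compared (reusing the objects from the script above, with PASS_OPTIMIZER_TO_TRAINER = False, run before trainer.train(); as far as I can tell, create_optimizer() populates trainer.optimizer from train_args):

trainer.create_optimizer()                        # builds trainer.optimizer from train_args
print(type(trainer.optimizer), type(optimizer))   # both should be torch.optim.AdamW
print(trainer.optimizer.defaults)                 # lr, betas, eps
print(optimizer.defaults)
print([g["weight_decay"] for g in trainer.optimizer.param_groups])
print([g["weight_decay"] for g in optimizer.param_groups])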

I’m guessing that after the model is passed into the Trainer constructor it gets modified, so the parameters held by the manually created optimizer are no longer valid. This is suggested by the script with PASS_OPTIMIZER_TO_TRAINER = False: uncommenting the trainer.optimizer = optimizer line near the end has no effect, indicating that an optimizer built after the Trainer has been constructed is effectively the same as trainer.optimizer.
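
A rough sketch of the check behind this guess (again reusing the objects from the script, with PASS_OPTIMIZER_TO_TRAINER = True, run right after the Trainer has been constructed and before trainer.train()):

# Are the tensors held by the manual optimizer still the parameters of
# trainer.model, and do they live on the same device?
model_params = {id(p): n for n, p in trainer.model.named_parameters()}
opt_params = [p for group in optimizer.param_groups for p in group["params"]]
stale = [p for p in opt_params if id(p) not in model_params]
devices = {p.device for p in opt_params} | {p.device for _, p in trainer.model.named_parameters()}
print(f"optimizer params not found in trainer.model: {len(stale)}")
print(f"parameter devices seen: {devices}")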

Expected behavior

The script should work properly as is, and it should produce identical results whether PASS_OPTIMIZER_TO_TRAINER is True or False.

If my guess is correct, then I don’t see how the optimizers argument of Trainer can ever accept a pre-built optimizer object. But that creates a problem for anyone wanting to use Trainer with a custom optimizer.
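
The only workaround I can think of for now (just a sketch, not verified to be the intended usage; the CustomOptimizerTrainer name is mine) is to defer building the custom optimizer until the Trainer asks for it, so it is created from the model after any device placement, by overriding create_optimizer:

import torch
from transformers import Trainer
from transformers.trainer_pt_utils import get_parameter_names

class CustomOptimizerTrainer(Trainer):
    # Build the optimizer lazily from self.model, so it only sees the parameters
    # after the Trainer has (possibly) moved the model to the target device.
    def create_optimizer(self):
        if self.optimizer is None:
            decay_parameters = get_parameter_names(self.model, [torch.nn.LayerNorm])
            decay_parameters = [n for n in decay_parameters if "bias" not in n]
            grouped_parameters = [
                {"params": [p for n, p in self.model.named_parameters() if n in decay_parameters],
                 "weight_decay": self.args.weight_decay},
                {"params": [p for n, p in self.model.named_parameters() if n not in decay_parameters],
                 "weight_decay": 0.0},
            ]
            self.optimizer = torch.optim.AdamW(grouped_parameters,
                                               lr=self.args.learning_rate,
                                               betas=(self.args.adam_beta1, self.args.adam_beta2),
                                               eps=self.args.adam_epsilon)
        return self.optimizer

# Used in place of Trainer in the script above, with no optimizers argument:
# trainer = CustomOptimizerTrainer(model=model, args=train_args,
#                                  train_dataset=train_dataset, eval_dataset=eval_dataset,
#                                  compute_metrics=compute_metrics, tokenizer=tokenizer)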

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
muellerzr commented on Aug 18, 2022

That’d be up to @sgugger, who is OOF until the 30th on holiday 😄 I’ll make sure he sees this though when he’s back!

2 reactions
LysandreJik commented on Aug 16, 2022

cc @muellerzr, would you like to take a look at this while Sylvain is on leave?
