
Passing optimizer to Trainer constructor does not work

See original GitHub issue

System Info

  • transformers version: 4.20.1
  • Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.13
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.12.0+cu102 (False)
  • Tensorflow version (GPU?): 2.8.2 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: TPU (used on the platform, nothing specific in the script)
  • Using distributed or parallel set-up in script?: No

Who can help?

@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Run the following script, which passes an optimizer object to Trainer.

import numpy as np
import site
import torch
from datasets import load_dataset, load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments, set_seed
from transformers.trainer_pt_utils import get_parameter_names

PASS_OPTIMIZER_TO_TRAINER = True

MODEL_NAME = 'albert-large-v2'
TASK = 'rte'
MAX_SEQ_LENGTH = 128
EPOCHS = 1
LEARNING_RATE = 2e-5
SEED = 10000
OPTIMIZER = 'adamw_torch'
OUTPUT_DIR = 'output'

train_args = TrainingArguments(num_train_epochs=EPOCHS, 
                               learning_rate=LEARNING_RATE,
                               seed=SEED,
                               optim=OPTIMIZER,
                               output_dir=OUTPUT_DIR,
                               overwrite_output_dir=True,
                               evaluation_strategy='epoch',
                               do_eval=True,
                               full_determinism=True)

set_seed(SEED)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
raw_datasets = load_dataset("glue", TASK)
metric = load_metric("glue", TASK)


def compute_metrics(p):
    preds = p.predictions
    preds = np.argmax(p.predictions, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)
 
def preprocess_function(examples):
    # Tokenize the texts
    args = (
        (examples['sentence1'], examples['sentence2'])
    )
    return tokenizer(*args, padding="max_length", max_length=MAX_SEQ_LENGTH, truncation=True)

raw_datasets = raw_datasets.map(
    preprocess_function,
    batched=True)


train_dataset = raw_datasets["train"]
eval_dataset = raw_datasets["validation"]

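# PASS_OPTIMIZER_TO_TRAINER == False: construct the Trainer without an optimizer;
# it builds its own from train_args when training starts.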
if not PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model, 
        args=train_args, 
        train_dataset=train_dataset, 
        eval_dataset=eval_dataset, 
        compute_metrics=compute_metrics, 
        tokenizer=tokenizer) 

# Create adamw_torch optimizer manually
decay_parameters = get_parameter_names(model, [torch.nn.LayerNorm])
decay_parameters = [name for name in decay_parameters if "bias" not in name]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if n in decay_parameters],
        "weight_decay": train_args.weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters,
                              lr=train_args.learning_rate,
                              betas=(train_args.adam_beta1, train_args.adam_beta2),
                              eps=train_args.adam_epsilon)

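# PASS_OPTIMIZER_TO_TRAINER == True: hand the manually built optimizer (and no
# scheduler) to the Trainer through the `optimizers` tuple.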
if PASS_OPTIMIZER_TO_TRAINER:
    trainer = Trainer(
        model=model, 
        args=train_args, 
        train_dataset=train_dataset, 
        eval_dataset=eval_dataset, 
        compute_metrics=compute_metrics, 
        tokenizer=tokenizer,
        optimizers=(optimizer, None))
else:
    #trainer.optimizer = optimizer
    pass

trainer.train()

With this setting, the model fails to train. The training pass also runs at about 2x the normal speed.

If PASS_OPTIMIZER_TO_TRAINER is instead set to False, the Trainer creates its own optimizer from train_args, which should be identical to the manually created one. In that case, training succeeds.
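
For reference, a rough sketch of how the two configurations can be compared (reusing the objects from the script above, with PASS_OPTIMIZER_TO_TRAINER = False, run before trainer.train(); as far as I can tell, create_optimizer() populates trainer.optimizer from train_args):

trainer.create_optimizer()                        # builds trainer.optimizer from train_args
print(type(trainer.optimizer), type(optimizer))   # both should be torch.optim.AdamW
print(trainer.optimizer.defaults)                 # lr, betas, eps
print(optimizer.defaults)
print([g["weight_decay"] for g in trainer.optimizer.param_groups])
print([g["weight_decay"] for g in optimizer.param_groups])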

I’m guessing that after the model is passed into the Trainer constructor it gets modified, so the parameters held by the manually created optimizer are no longer valid. This is suggested by the script with PASS_OPTIMIZER_TO_TRAINER = False: uncommenting the trainer.optimizer = optimizer line near the end has no effect, indicating that an optimizer built after the Trainer has been constructed is effectively the same as trainer.optimizer.
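
A rough sketch of the check behind this guess (again reusing the objects from the script, with PASS_OPTIMIZER_TO_TRAINER = True, run right after the Trainer has been constructed and before trainer.train()):

# Are the tensors held by the manual optimizer still the parameters of
# trainer.model, and do they live on the same device?
model_params = {id(p): n for n, p in trainer.model.named_parameters()}
opt_params = [p for group in optimizer.param_groups for p in group["params"]]
stale = [p for p in opt_params if id(p) not in model_params]
devices = {p.device for p in opt_params} | {p.device for _, p in trainer.model.named_parameters()}
print(f"optimizer params not found in trainer.model: {len(stale)}")
print(f"parameter devices seen: {devices}")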

Expected behavior

The script should work properly as is, and it should produce identical results whether PASS_OPTIMIZER_TO_TRAINER is True or False.

If my guess is correct, then I don’t see how the optimizers argument of Trainer can ever accept a pre-built optimizer object. But that creates a problem for anyone wanting to use Trainer with a custom optimizer.
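
The only workaround I can think of for now (just a sketch, not verified to be the intended usage; the CustomOptimizerTrainer name is mine) is to defer building the custom optimizer until the Trainer asks for it, so it is created from the model after any device placement, by overriding create_optimizer:

import torch
from transformers import Trainer
from transformers.trainer_pt_utils import get_parameter_names

class CustomOptimizerTrainer(Trainer):
    # Build the optimizer lazily from self.model, so it only sees the parameters
    # after the Trainer has (possibly) moved the model to the target device.
    def create_optimizer(self):
        if self.optimizer is None:
            decay_parameters = get_parameter_names(self.model, [torch.nn.LayerNorm])
            decay_parameters = [n for n in decay_parameters if "bias" not in n]
            grouped_parameters = [
                {"params": [p for n, p in self.model.named_parameters() if n in decay_parameters],
                 "weight_decay": self.args.weight_decay},
                {"params": [p for n, p in self.model.named_parameters() if n not in decay_parameters],
                 "weight_decay": 0.0},
            ]
            self.optimizer = torch.optim.AdamW(grouped_parameters,
                                               lr=self.args.learning_rate,
                                               betas=(self.args.adam_beta1, self.args.adam_beta2),
                                               eps=self.args.adam_epsilon)
        return self.optimizer

# Used in place of Trainer in the script above, with no optimizers argument:
# trainer = CustomOptimizerTrainer(model=model, args=train_args,
#                                  train_dataset=train_dataset, eval_dataset=eval_dataset,
#                                  compute_metrics=compute_metrics, tokenizer=tokenizer)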

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
muellerzr commented on Aug 18, 2022

That’d be up to @sgugger, who is OOF until the 30th on holiday 😄 I’ll make sure he sees this though when he’s back!

2 reactions
LysandreJik commented on Aug 16, 2022

cc @muellerzr, would you like to take a look at this while Sylvain is on leave?
