~7% performance drop observed for the Hugging Face GPT2 model
System Info
platform: ROCm (AMD device)
python version: 3.7.13
There is a ~7% performance drop for the Hugging Face GPT2 model after the IFU (https://github.com/ROCmSoftwarePlatform/transformers/pull/15) on the https://github.com/ROCmSoftwarePlatform/transformers repository.
@patil-suraj, @patrickvonplaten, could you please help me find the change in transformers that is responsible for the performance drop?
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Command used to run the model:
```bash
python3 -m torch.distributed.launch --nproc_per_node=8 \
  transformers/examples/pytorch/language-modeling/run_clm.py \
  --output_dir output --model_name_or_path gpt2 \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --do_train --do_eval --label_smoothing 0.1 \
  --logging_steps 1 --logging_dir log --fp16 \
  --dataloader_num_workers 1 --skip_memory_metrics \
  --per_device_train_batch_size=8 --overwrite_output_dir --max_steps 150
```
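To quantify the drop across builds, the throughput metrics that run_clm.py writes can be compared directly. A minimal sketch, assuming the script saves all_results.json into --output_dir (the upstream example scripts do this when --do_train is passed; verify the key names against your checkout):

```bash
# Read throughput from the metrics file written by run_clm.py.
python3 -c "
import json
m = json.load(open('output/all_results.json'))
print('train_runtime:', m.get('train_runtime'))
print('train_samples_per_second:', m.get('train_samples_per_second'))
"
```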
Expected behavior
I was expecting similar or better performance of the model after the IFU on Aug 9, 2022.
I also tried more recent commits (after Aug 9, 2022); those degrade performance even further.
Top GitHub Comments
@rraminen, I don't think upstream HF can help much here; this is on AMD to root-cause. Please close this ticket.
Let’s start by figuring out which commit caused the regression on ROCm, and track it internally.
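One way to root-cause this is to bisect the fork between the last known-good commit and the post-IFU HEAD, using the reproduction command as the test. A rough sketch, where bisect_check.sh is a hypothetical helper script and BASELINE is a placeholder throughput taken from a known-good run:

```bash
#!/usr/bin/env bash
# bisect_check.sh -- hypothetical helper for `git bisect run`.
# Rebuilds the current checkout, reruns the benchmark from the
# Reproduction section, and exits non-zero when throughput regresses.
pip install -e . --quiet || exit 125   # 125 tells bisect to skip commits that don't build
python3 -m torch.distributed.launch --nproc_per_node=8 \
  examples/pytorch/language-modeling/run_clm.py \
  --output_dir output --model_name_or_path gpt2 \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --do_train --fp16 --dataloader_num_workers 1 --skip_memory_metrics \
  --per_device_train_batch_size=8 --overwrite_output_dir --max_steps 150 \
  || exit 125                          # also skip commits where the run itself breaks
python3 - <<'EOF'
import json, sys
BASELINE = 100.0  # placeholder: train_samples_per_second from a pre-IFU run
tps = json.load(open("output/all_results.json"))["train_samples_per_second"]
sys.exit(0 if tps >= 0.95 * BASELINE else 1)  # non-zero marks the commit bad
EOF
```

With that in place, git can walk the history automatically (`<last-good-commit>` is a placeholder for the last fast commit before the IFU merge):

```bash
cd transformers
git bisect start
git bisect bad HEAD                 # slow after the IFU
git bisect good <last-good-commit>  # fast before the IFU
git bisect run ./bisect_check.sh
git bisect reset
```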
Just seconding what @LysandreJik said: if we can help in any way to improve support or performance of our software on AMD chips, we’d like to help.
Just ping us.