Can't Select Specific GPU by TrainingArguments
See original GitHub issue

Environment info
- transformers version: 4.8.2
- Platform: Jupyter Notebook on Ubuntu
- Python version: 3.7
- PyTorch version (GPU?): 1.8.0+cu111
- Using GPU in script?: No, via Jupyter Notebook
- Using distributed or parallel set-up in script?: It is distributed, but I don't want that
Who can help
- trainer: @sgugger
- found by git-blame: @philschmid
To reproduce
Using TrainingArguments, I want to restrict my compute device to only torch.device(type='cuda', index=1).
If I do not set local_rank when initializing TrainingArguments, it computes on both GPUs.
Steps to reproduce the behavior:
from transformers import TrainingArguments, Trainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=1e-4,
    num_train_epochs=6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
    local_rank=1,
)
Then you get:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

But after I set

import os
os.environ["RANK"] = "1"

I get:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set

These errors do not happen if I do not set local_rank when initializing TrainingArguments, even though I don't set any environment variables.
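For context on where these two errors come from: passing local_rank to TrainingArguments signals a distributed run, so the Trainer initializes torch.distributed via the env:// rendezvous, which expects the environment variables that a distributed launcher (e.g. torchrun or torch.distributed.launch) normally exports for each worker. A rough, illustrative sketch of that environment for a hypothetical 2-worker run on one machine (values here are examples, not taken from the issue):

```python
# Environment variables a distributed launcher exports per worker process.
# Setting local_rank manually without these is why the env:// rendezvous
# fails, first on RANK and then on WORLD_SIZE.
launcher_env = {
    "RANK": "0",                 # global rank of this worker
    "WORLD_SIZE": "2",           # total number of workers
    "LOCAL_RANK": "0",           # rank of this worker on this machine
    "MASTER_ADDR": "127.0.0.1",  # rendezvous address
    "MASTER_PORT": "29500",      # rendezvous port
}
```

Setting RANK by hand, as above, only moves the failure to the next missing variable; local_rank is meant to be consumed from a launcher, not set manually for single-GPU use.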
Expected behavior
I want to set my compute device to only torch.device(type='cuda', index=1).
Issue Analytics
- Created: 2 years ago
- Comments: 18 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments

You need to set the variable before launching the Jupyter notebook.

You should use the env variable CUDA_VISIBLE_DEVICES to set the GPUs you want to use. If you have multiple GPUs available, the Trainer will use all of them; that is expected and not a bug.
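Concretely, the suggestion above can be sketched like this: make only the desired physical GPU visible before anything initializes CUDA, i.e. at the very top of the notebook before importing torch or transformers (or before launching Jupyter itself):

```python
import os

# Hide every GPU except physical GPU 1. This must run before torch
# (or transformers) initializes CUDA -- the variable is read once
# at CUDA initialization, so setting it later has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# After this point, PyTorch sees a single device, exposed as cuda:0
# (device indices are remapped relative to the visible set), so the
# Trainer will use only that GPU -- no local_rank needed.
```

Equivalently, launch the notebook server with the variable already set, e.g. `CUDA_VISIBLE_DEVICES=1 jupyter notebook`, which is what the first comment means by setting it before launching.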