
Can't Select Specific GPU by TrainingArguments

See original GitHub issue

Environment info

  • transformers version: 4.8.2
  • Platform: Jupyter Notebook on Ubuntu
  • Python version: 3.7
  • PyTorch version (GPU?): 1.8.0+cu111
  • Using GPU in script?: No, via Jupyter notebook
  • Using distributed or parallel set-up in script?: It is distributed, but I don’t want that

Who can help

To reproduce

Using TrainingArguments, I want to restrict the compute device to torch.device(type='cuda', index=1).

If I do not set local_rank when initializing TrainingArguments, it computes on both GPUs.

Steps to reproduce the behavior:

from transformers import TrainingArguments, Trainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=1e-4,
    num_train_epochs=6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
    local_rank=1
)

Then you will get ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

But after I set

import os
os.environ["RANK"]="1"

I get ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set

These errors do not happen if I don’t set local_rank when initializing TrainingArguments, even though I set no environment variables.
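The error progression above is expected: torch.distributed’s env:// rendezvous requires a full set of environment variables, so setting RANK alone only moves the failure to the next missing one. A minimal sketch of what env:// reads (variable names as in PyTorch’s distributed documentation; the values assume a hypothetical single-process “world”):

```python
import os

# env:// rendezvous reads all four of these; RANK alone is not
# enough, which is why the error moved from RANK to WORLD_SIZE.
os.environ["RANK"] = "0"            # this process's rank
os.environ["WORLD_SIZE"] = "1"      # total number of processes
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
```

Note that even with all four set, passing local_rank=1 still puts the Trainer into distributed mode, which is not what is wanted here.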

Expected behavior

I want to restrict the compute device to torch.device(type='cuda', index=1).

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 18 (5 by maintainers)

Top GitHub Comments

4 reactions
sgugger commented, Oct 27, 2021

You need to set the variable before launching the Jupyter notebook:

CUDA_VISIBLE_DEVICES="0" jupyter notebook
4 reactions
sgugger commented, Jul 7, 2021

You should use the environment variable CUDA_VISIBLE_DEVICES to select the GPUs you want to use. If you have multiple GPUs available, the Trainer will use all of them; that is expected and not a bug.
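For a notebook that is already running, the same idea can be applied from Python, a sketch, assuming it runs before torch (or transformers) is first imported, since CUDA libraries read the variable only once at initialization:

```python
import os

# Must be set BEFORE torch is first imported; setting it afterwards
# has no effect on which devices CUDA exposes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# From this point on, `import torch` sees exactly one GPU, and it is
# renumbered: the physical GPU 1 appears as cuda:0.
```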
