
Can't Select Specific GPU by TrainingArguments

See original GitHub issue

Environment info

  • transformers version: 4.8.2
  • Platform: Jupyter Notebook on Ubuntu
  • Python version: 3.7
  • PyTorch version (GPU?): 1.8.0+cu111
  • Using GPU in script?: No, via Jupyter notebook
  • Using distributed or parallel set-up in script?: It is distributed, but I don’t want that

Who can help

To reproduce

Using TrainingArguments, I want to restrict the compute device to torch.device(type='cuda', index=1).

If I do not set local_rank when initializing TrainingArguments, it computes on both GPUs.

Steps to reproduce the behavior:

from transformers import TrainingArguments, Trainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=1e-4,
    num_train_epochs=6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
    local_rank=1
)

Then you will get ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

But after I set

import os
os.environ["RANK"]="1"

I get ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set

These errors do not happen if I don’t set local_rank when initializing TrainingArguments, even though I set no environment variables.
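The error progression above is expected: torch.distributed’s env:// rendezvous requires a full set of environment variables, so setting RANK alone only moves the failure to the next missing one. A minimal sketch of what env:// reads (variable names as in PyTorch’s distributed documentation; the values assume a hypothetical single-process “world”):

```python
import os

# env:// rendezvous reads all four of these; RANK alone is not
# enough, which is why the error moved from RANK to WORLD_SIZE.
os.environ["RANK"] = "0"            # this process's rank
os.environ["WORLD_SIZE"] = "1"      # total number of processes
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
```

Note that even with all four set, passing local_rank=1 still puts the Trainer into distributed mode, which is not what is wanted here.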

Expected behavior

I want to restrict the compute device to torch.device(type='cuda', index=1).

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 18 (5 by maintainers)

Top GitHub Comments

4 reactions
sgugger commented, Oct 27, 2021

You need to set the variable before launching the Jupyter notebook:

CUDA_VISIBLE_DEVICES="0" jupyter notebook
4 reactions
sgugger commented, Jul 7, 2021

You should use the environment variable CUDA_VISIBLE_DEVICES to select the GPUs you want to use. If you have multiple GPUs available, the Trainer will use all of them; that is expected and not a bug.
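For a notebook that is already running, the same idea can be applied from Python, a sketch, assuming it runs before torch (or transformers) is first imported, since CUDA libraries read the variable only once at initialization:

```python
import os

# Must be set BEFORE torch is first imported; setting it afterwards
# has no effect on which devices CUDA exposes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# From this point on, `import torch` sees exactly one GPU, and it is
# renumbered: the physical GPU 1 appears as cuda:0.
```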
