Multi-GPU CLI issue
Hi! Thanks for the great library, Sylvain!
The config file looks as follows:
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
fp16: true
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2
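(For context: this is the file written by `accelerate config`, which Accelerate reads from ~/.cache/huggingface/accelerate/default_config.yaml by default. The values Accelerate is actually picking up can be double-checked with:)

accelerate env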
The relevant part of the code is as follows:
# Imports reconstructed for completeness (the issue omits them); AdamW is
# assumed here to come from torch.optim.
from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import Wav2Vec2ForCTC

# `config`, `MAX_GPU_BATCH_SIZE`, `EVAL_BATCH_SIZE`, the datasets and
# `collate_fn` are defined elsewhere in the full script.
accelerator = Accelerator(fp16=config['fp16'], cpu=config['cpu'])
print(accelerator.device)

# Sample hyper-parameters for learning rate, batch size, seed and a few other HPs
lr = config["lr"]
num_epochs = int(config["num_epochs"])
seed = int(config["seed"])
batch_size = int(config["batch_size"])

# If the batch size is too big we use gradient accumulation
gradient_accumulation_steps = 1
if batch_size > MAX_GPU_BATCH_SIZE:
    gradient_accumulation_steps = batch_size // MAX_GPU_BATCH_SIZE
    batch_size = MAX_GPU_BATCH_SIZE
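# Worked example (illustrative numbers, not from the issue): with
# batch_size = 64 and MAX_GPU_BATCH_SIZE = 16, gradient_accumulation_steps
# becomes 64 // 16 = 4 and each forward pass uses a batch of 16, so the
# effective per-process batch size stays 64.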
# Instantiate dataloaders.
train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size
)
valid_dataloader = DataLoader(
    validation_dataset, shuffle=False, collate_fn=collate_fn, batch_size=EVAL_BATCH_SIZE
)
test_dataloader = DataLoader(
    test_dataset, shuffle=False, collate_fn=collate_fn, batch_size=EVAL_BATCH_SIZE
)
# Instantiate the model (we build the model here so that the seed also
# controls the initialization of new weights)
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Instantiate optimizer
optimizer = AdamW(params=model.parameters(), lr=lr)

# NOTE: `lr_scheduler` is stepped in the training loop below but is not
# defined in this snippet; it is presumably created elsewhere in the script.

# Prepare everything
# There is no specific order to remember, we just need to unpack the objects
# in the same order we gave them to the prepare method.
model, optimizer, train_dataloader, valid_dataloader, test_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, valid_dataloader, test_dataloader
)
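# For context (standard Accelerate behavior in a multi-GPU launch, not stated
# in the issue): `prepare` wraps the model in
# torch.nn.parallel.DistributedDataParallel and shards each dataloader so
# that every process only iterates over its own slice of the data.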
# Now we train the model
for epoch in range(num_epochs):
    model.train()
    for step, batch in enumerate(train_dataloader):
        # We could avoid this line since we set the accelerator with `device_placement=True`.
        # batch.to(accelerator.device)
        outputs = model(**batch)
        loss = outputs.loss
        loss = loss / gradient_accumulation_steps
        accelerator.backward(loss)
        if step % gradient_accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
The script only utilizes a single GPU, even though there are two GPUs available:

>>> torch.cuda.device_count()
2
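A quick way to see the raw CUDA visibility in a given process (a hypothetical debug snippet, not from the issue; all calls are standard PyTorch APIs):

import os
import torch

# Show which devices this process is allowed to see and what they are.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("device_count =", torch.cuda.device_count())
print("devices:", [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])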
Launching the script in the command line:

accelerate launch training.py
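If the config file is not being picked up for some reason, the process count can also be forced on the command line; `--multi_gpu` and `--num_processes` are standard `accelerate launch` flags, suggested here purely as a debugging step rather than a confirmed fix:

accelerate launch --multi_gpu --num_processes 2 training.py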
The print statement print(accelerator.device) returns the following (happy to add more debugging):

cuda
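For what it's worth (an observation about Accelerate's behavior, not something stated in the issue): in a proper multi-GPU launch each process reports an indexed device such as cuda:0 or cuda:1, so a bare cuda suggests the script is running as a single non-distributed process. A minimal sketch to confirm how many processes Accelerate actually started:

from accelerate import Accelerator

accelerator = Accelerator()
# process_index, num_processes and device are standard Accelerator attributes;
# with the config above this should print two lines, one per GPU.
print(f"process {accelerator.process_index}/{accelerator.num_processes} "
      f"-> {accelerator.device}")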
Any help is appreciated. Thank you!
Top GitHub Comments
This seems to be a false alarm; the process now sees both GPUs. Thank you for the quick turnaround. Can't wait to use the library more. Deniz
Closing the issue then, but feel free to reopen if you get the problem again!