Multi GPU does not work well
See original GitHub issue.
System Info
- `Accelerate` version: 0.10.0
- Platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.17
- Python version: 3.8.13
- Numpy version: 1.23.0
- PyTorch version (GPU?): 1.9.0+cu111 (True)
- `Accelerate` default config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: MULTI_GPU
  - mixed_precision: no
  - use_cpu: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - main_process_ip: None
  - main_process_port: None
  - main_training_function: main
  - deepspeed_config: {}
  - fsdp_config: {}
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- My own task or dataset (give details below)
Reproduction
import os, time
import torch
from torch.optim import Adam, SGD
from accelerate import Accelerator
from PIL import Image
from torchvision import transforms
import torchvision
from datasets import load_dataset
import datetime
from loguru import logger
LOGGER = logger
LOGGER.add('/root/workspace/sdhan/multi_gpu/log/model_log.txt')
def training_function():
    accelerator = Accelerator()
    device = accelerator.device

    model = torchvision.models.resnet34(pretrained=True).to(device)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    datasets = torchvision.datasets.CIFAR10(
        root='/root/workspace/sdhan/multi_gpu/datasets/',
        train=True,
        transform=preprocess,
        download=True,
    )
    train_loader = torch.utils.data.DataLoader(
        datasets,
        batch_size=50,
        shuffle=True,
        drop_last=True,
        num_workers=8,
    )
    optimizer = SGD(model.parameters(), lr=3e-7)
    criterion = torch.nn.CrossEntropyLoss()

    # Wrap model, optimizer and dataloader for (multi-)GPU training.
    model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

    epoch = 300
    i = 0
    model.train()
    start_time, end_time = None, None
    for epoch_num in range(epoch):
        start_epoch_time = datetime.datetime.now()
        if start_time is None:
            start_time = datetime.datetime.now()
        for image, target in train_loader:
            try:
                # Batches from the prepared dataloader are already on the right device.
                image = image.to(device)
                target = target.to(device)
                optimizer.zero_grad()
                output = model(image)
                loss = criterion(output, target)
                accelerator.backward(loss)
                optimizer.step()
                i += 1
                if i % 100 == 0:
                    end_time = datetime.datetime.now()
                    LOGGER.info(f'time : {end_time - start_time}')
                    start_time = end_time
            except Exception as e:
                print(e)
                break
        end_epoch_time = datetime.datetime.now()
        LOGGER.info(f'epoch time : {end_epoch_time - start_epoch_time}')
        LOGGER.info(f'epoch_loss : {loss.item()}')


def main():
    training_function()


if __name__ == "__main__":
    main()
Expected behavior
I think I did everything the README.md says for multi-GPU training.
When I ran complete_nlp_example.py from the examples folder, it worked: with 4 GPUs one epoch took around 3 seconds, compared with around 9 seconds on 1 GPU.
But with the code I pasted above, one epoch took around 9 seconds on 4 GPUs, compared with around 8 seconds on 1 GPU, whereas I expected it to take at most around 4 seconds with 4 GPUs.
Strangely, GPU utilization looks good with 4 GPUs as well as with 1 GPU.
What is the problem? It is not only the code above: when I tried training another model (a denoising diffusion probabilistic model), I got the same result.
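For reference, here is a minimal sketch of how the per-epoch timing could be made comparable between the 1-GPU and 4-GPU runs, reusing the accelerator, model, train_loader, criterion and optimizer from the script above. The wait_for_everyone() synchronization and the images/second calculation are illustrative additions, not part of the original script.

import datetime

def run_timed_epoch(accelerator, model, train_loader, criterion, optimizer):
    # Synchronize all processes before starting the clock, otherwise one
    # process may report a misleadingly short (or long) epoch time.
    accelerator.wait_for_everyone()
    start = datetime.datetime.now()
    n_samples = 0
    for image, target in train_loader:
        optimizer.zero_grad()
        output = model(image)
        loss = criterion(output, target)
        accelerator.backward(loss)
        optimizer.step()
        n_samples += image.shape[0]
    accelerator.wait_for_everyone()
    elapsed = (datetime.datetime.now() - start).total_seconds()
    if accelerator.is_main_process:
        # Each process sees roughly 1/num_processes of the data, so scale accordingly.
        total = n_samples * accelerator.num_processes
        print(f'epoch time: {elapsed:.1f}s, ~{total / elapsed:.0f} images/s')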
Top GitHub Comments
Hello @pacman100, thanks a lot for your help!
I really appreciate it.
I think this is really interesting, and I will close this issue.
Hello @Hans-digit, the loss depends on the learning-rate hyperparameter. With such a small learning rate, more steps are needed for the loss to decrease, which is what you get with the smaller effective batch when using a single GPU. Please tune it properly and you will see the difference disappear. Also, measure the loss over the whole epoch instead of printing the final loss of an epoch; please refer to the example scripts for how to do that. Using 1e-3 as the learning rate and measuring the epoch loss properly gives the results below.
2 GPU setup:
1 GPU setup:
your code with suggested changes:
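The screenshots attached to that comment are not reproduced here. As a rough sketch only (a reconstruction based on the comment above, not the actual code from the comment), the suggested changes, i.e. lr=1e-3 and an epoch-level loss averaged across all processes with accelerator.gather(), might look like this, reusing the names from the reproduction script:

# Assumed reconstruction of the suggested changes, not the original comment's code.
optimizer = SGD(model.parameters(), lr=1e-3)   # suggested learning rate instead of 3e-7
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

for epoch_num in range(epoch):
    model.train()
    batch_losses = []
    for image, target in train_loader:
        optimizer.zero_grad()
        output = model(image)
        loss = criterion(output, target)
        accelerator.backward(loss)
        optimizer.step()
        # Gather the per-batch loss from every process so the epoch loss
        # reflects all GPUs, not just the last batch on one process.
        batch_losses.append(accelerator.gather(loss.detach().unsqueeze(0)))
    epoch_loss = torch.cat(batch_losses).mean().item()
    if accelerator.is_main_process:
        LOGGER.info(f'epoch {epoch_num} mean loss: {epoch_loss:.4f}')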