
Global accelerator.rng_types is passed and modified when preparing


Hi,

Correct me if I’m wrong but:

The seed for the sampler generator is currently set with generator.manual_seed(int(torch.empty((), dtype=torch.int64).random_().item())) (ref). To my understanding, without setting an initial seed, the sampler generator differs across GPUs and should therefore be synchronized at every step (or at least the first time iter() is called). This holds provided rng_types contains ['generator'] when it is called the first time.
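For context, "synchronized" here just means that every process ends up with the same generator state before sampling. A minimal sketch of that idea (not accelerate's actual implementation, and assuming a CPU/gloo process group so the state tensor can be broadcast directly):

import torch
import torch.distributed as dist

def sync_generator(generator: torch.Generator):
    # broadcast rank 0's RNG state so every rank draws the same permutation;
    # assumes dist.init_process_group('gloo', ...) has already been called
    state = generator.get_state()        # ByteTensor holding the full RNG state
    dist.broadcast(state, src=0)
    generator.set_state(state)

generator = torch.Generator()
# seeded as in the accelerate source referenced above; this seed differs per process
generator.manual_seed(int(torch.empty((), dtype=torch.int64).random_().item()))
# without a broadcast like sync_generator(generator), shuffling diverges across GPUs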

However, when preparing a dataloader, the global accelerator.rng_types is passed around and may then be modified ('generator' removed from it) (ref). This happens when preparing a dataloader that doesn't shuffle (the eval loader).
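The modification is ordinary Python list aliasing: the prepare step receives a reference to the very same list object, so removing 'generator' inside it is visible through accelerator.rng_types afterwards. A toy illustration (the function name is hypothetical, not an accelerate internal):

def prepare_loader(rng_types):
    # an eval loader that doesn't shuffle has no generator to synchronize
    if 'generator' in rng_types:
        rng_types.remove('generator')   # mutates the shared list in place
    return rng_types

global_rng_types = ['generator']
prepare_loader(global_rng_types)
print(global_rng_types)                 # [] -- 'generator' is gone globally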

So I thought that after calling train_loader, eval_loader = accelerator.prepare(train_loader, eval_loader),

the rng_types list is now empty and the train_loader sampler generator is no longer synchronized.

Should this be an issue? If not, what is the logic?
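If the mutation is unintended, one possible remedy (a sketch only, not a claim about how accelerate handles or should handle it) is to give each prepared dataloader its own copy of the list so the accelerator-wide attribute stays intact:

def prepare_loader_with_copy(rng_types):
    rng_types = list(rng_types)          # defensive copy; the caller's list is untouched
    if 'generator' in rng_types:
        rng_types.remove('generator')
    return rng_types

global_rng_types = ['generator']
prepare_loader_with_copy(global_rng_types)
print(global_rng_types)                  # ['generator'] -- unchanged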

Note: if the same seed were required for all processes, then dropout wouldn't operate independently across GPUs (ref).

Example code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.utils.data import Dataset, DataLoader
import random
from accelerate import Accelerator

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

class CustomDataset(Dataset):
    def __init__(self, size=10) -> None:
        super().__init__()
        self.size = size  # was hard-coded to 10, ignoring the argument

    def __getitem__(self, i):
        return torch.Tensor([i]), torch.Tensor([1, i, 2])

    def __len__(self):
        return self.size

class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(1, 3)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.dropout(self.linear1(x))

def main():
    accelerator = Accelerator()
    # different base seed per process so e.g. dropout masks differ across GPUs
    set_seed(66 + accelerator.process_index)
    device = torch.device('cuda:7')  # unused; accelerator handles device placement

    dataset = CustomDataset(size=10)
    dataloader = DataLoader(dataset, batch_size=5, num_workers=0, shuffle=True)
    eval_dataloader = DataLoader(dataset, batch_size=5, num_workers=0, shuffle=False)

    model = CustomModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.5)
    model, dataloader, optimizer, eval_dataloader = accelerator.prepare(
        model, dataloader, optimizer, eval_dataloader
    )
    model.train()
    for input_, output_ in dataloader:
        res = model(input_)
        print(f'{accelerator.process_index}, {input_}')
    
if __name__ == '__main__':
    main()
-----
Output:
1, tensor([[2.],
        [8.],
        [4.],
        [7.],
        [6.]], device='cuda:1')
0, tensor([[9.],
        [8.],
        [0.],
        [2.],
        [3.]], device='cuda:0')
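For completeness, the reported behaviour could be checked by printing the attribute around the prepare call in main() above (assuming accelerator.rng_types is readable as in the referenced source; the exact contents depend on the accelerate version and config):

print(accelerator.rng_types)   # expected to contain 'generator' before prepare
model, dataloader, optimizer, eval_dataloader = accelerator.prepare(
    model, dataloader, optimizer, eval_dataloader
)
print(accelerator.rng_types)   # per the report above, comes back empty once the eval loader is prepared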

Thanks.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
sgugger commented, Jun 29, 2021

No, it is, as the name indicates, a BatchSampler. It only computes indices (specifically the indices of the elements we want in each batch), the access to the Dataset is done later inside the DataLoader.
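To make the distinction concrete, a BatchSampler only yields lists of indices and never touches the samples themselves; the DataLoader later uses those indices to pull items from the Dataset. A small standalone illustration:

import torch
from torch.utils.data import BatchSampler, RandomSampler

data = list(range(10))
generator = torch.Generator().manual_seed(0)
sampler = RandomSampler(data, generator=generator)            # permutes indices
batch_sampler = BatchSampler(sampler, batch_size=5, drop_last=False)

for indices in batch_sampler:
    print(indices)   # lists of 5 indices; no Dataset element is accessed here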

0 reactions
github-actions[bot] commented, May 24, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
