DataLoader with num_workers > 1 and Rand[Zoom/Rotate/Flip]d transforms
Describe the bug
When using a DataLoader with num_workers > 1 and a Rand[Zoom/Rotate/Flip]d transform, the random transforms in all of the worker processes share the same random state.
To Reproduce
With train_ds having some randomly parameterized transforms.
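For concreteness, train_ds might be built like the following hypothetical MONAI setup (the file list and transform parameters are illustrative; any dict-style dataset with random transforms reproduces the issue):

```python
from monai.data import Dataset
from monai.transforms import Compose, LoadImaged, RandFlipd, RandRotated, RandZoomd

# Hypothetical input list; in practice these are the raw data filenames.
train_files = [{"img": f"img_{i}.nii.gz"} for i in range(100)]

train_transforms = Compose([
    LoadImaged(keys="img"),
    RandRotated(keys="img", prob=1.0, range_x=0.4),
    RandFlipd(keys="img", prob=0.5, spatial_axis=0),
    RandZoomd(keys="img", prob=1.0, min_zoom=0.8, max_zoom=1.2),
])

train_ds = Dataset(data=train_files, transform=train_transforms)
```

The loader is then constructed as reported: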
```python
from monai.data import list_data_collate
from torch.utils.data import DataLoader

train_loader: DataLoader = DataLoader(
    train_ds,  # a dataset pairing the raw input filenames with the transform definitions
    batch_size=1,
    shuffle=True,
    num_workers=88,
    collate_fn=list_data_collate,
)
```
This is particularly disturbing when running on a machine with 40+ CPUs, where huge numbers of images end up with identical augmentation parameters.
Expected behavior
Each transform should draw its own random parameters, regardless of the number of workers chosen.
Screenshots
NOTE: The number of replicated rotation values is always equal to the num_workers specified.
```
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905
```
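This pattern can be reproduced without MONAI at all; a minimal self-contained sketch of the root cause (assuming the default fork start method on Linux, where each worker inherits a copy of the parent's numpy RNG state):

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset

class Probe(Dataset):
    """Returns one numpy random draw per item."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # Each forked worker starts from the same copied numpy state,
        # so each worker's first draw is identical, then each second
        # draw, and so on.
        return float(np.random.uniform(0, 20))

if __name__ == "__main__":
    # With num_workers=4, the printed values repeat in groups of four.
    for batch in DataLoader(Probe(), num_workers=4):
        print(batch.item())
```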
Issue Analytics
- Created: 3 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Hi @hjmjohnson ,
Thanks for your bug report. This is a known issue of “numpy + PyTorch multi-processing”, and you can easily fix it by adding the logic below to your DataLoader initialization:
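A minimal sketch of such a hook, following the standard recipe from the PyTorch documentation of reseeding numpy from each worker's distinct torch seed (the function name worker_init_fn is illustrative):

```python
import numpy as np
import torch

def worker_init_fn(worker_id):
    # torch assigns every DataLoader worker a distinct base seed; derive the
    # numpy seed from it so numpy-backed random transforms diverge per worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
```

Passing worker_init_fn=worker_init_fn to the DataLoader shown above then gives each worker its own stream of transform parameters.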
Thanks.
@Nic-Ma @tvercaut I can certainly help with the wiki stuff