
DataLoader with num_workers > 1, and Rand[Zoom/Rotate/Flip]d transforms

See original GitHub issue

Describe the bug
When using a DataLoader with num_workers > 1 and a Rand[Zoom/Rotate/Flip]d transform, all of the workers share the same random state, so they draw identical transform parameters.

To Reproduce

With train_ds being a dataset that applies some randomly parameterized transforms:

    from monai.data import list_data_collate
    from torch.utils.data import DataLoader

    train_loader: DataLoader = DataLoader(
        train_ds,  # <-- dataset of both the raw input filenames and the transform definitions
        batch_size=1,
        shuffle=True,
        num_workers=88,
        collate_fn=list_data_collate,
    )

This is particularly problematic when running on a machine with 40+ CPUs, where huge numbers of images end up with identical augmentation parameters.
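
The symptom can be reproduced outside of MONAI with a minimal sketch (not from the original report; RandAngleDataset is a hypothetical stand-in, and recent PyTorch releases seed NumPy per worker, so the duplicates mainly appear on the older versions this issue was filed against):

import numpy as np
from torch.utils.data import DataLoader, Dataset

class RandAngleDataset(Dataset):
    """Draws a random rotation angle from NumPy's global RNG, as the Rand*d transforms did."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # With the fork start method, every worker inherits an identical copy of
        # NumPy's global RNG state, so their draws coincide.
        return np.random.uniform(0, 20)

if __name__ == "__main__":
    loader = DataLoader(RandAngleDataset(), batch_size=1, num_workers=4)
    for angle in loader:
        print(f"Rotating by {angle.item()}")  # on affected setups, each value repeats num_workers times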

Expected behavior
Each transform should draw its own random parameters, regardless of the number of workers chosen.

Screenshots
NOTE: the number of replicated rotation values is always equal to the num_workers specified.

Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
Nic-Ma commented, May 18, 2020

Hi @hjmjohnson ,

Thanks for your bug report. This is a known issue with NumPy + PyTorch multi-processing, and you can fix it by adding the logic below to your DataLoader initialization:

import torch

def worker_init_fn(worker_id):
    # Give each worker's transforms a distinct seed derived from the
    # per-worker seed that PyTorch assigns.
    worker_info = torch.utils.data.get_worker_info()
    worker_info.dataset.transform.set_random_state(worker_info.seed % (2 ** 32 - 1))

dataloader = torch.utils.data.DataLoader(..., worker_init_fn=worker_init_fn)

Thanks.
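
The same idea generalizes to datasets that draw directly from NumPy's global RNG rather than through MONAI transforms; a sketch under that assumption (train_ds as defined above, names otherwise hypothetical):

import numpy as np
import torch
from torch.utils.data import DataLoader

def numpy_worker_init_fn(worker_id):
    # torch.initial_seed() already differs per worker (base_seed + worker_id),
    # so folding it into NumPy's 32-bit seed range gives each worker a distinct stream.
    np.random.seed(torch.initial_seed() % (2 ** 32))

loader = DataLoader(train_ds, num_workers=4, worker_init_fn=numpy_worker_init_fn)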

0 reactions
atbenmurray commented, May 19, 2020

@Nic-Ma @tvercaut I can certainly help with the wiki stuff

Read more comments on GitHub >

Top Results From Across the Web

In windows, DataLoader with num_workers > 0 is ... - GitHub
Step 1: create two loader, one with num_workers and one without. import torch.utils.data as Data train_loader = Data.DataLoader(dataset= ...
Read more >
Guidelines for assigning num_workers to DataLoader ...
I am trying to implement the distributed training from PyTorch examples with 4 GPUs (one sub-process for each GPU), but when I set...
Read more >
Complete Guide to the DataLoader Class in PyTorch
This post covers the PyTorch dataloader class. We'll show how to load built-in and custom datasets in PyTorch, plus how to transform and...
Read more >
DataLoaders - fastai
DataLoader helpers. fastai includes a replacement for Pytorch's DataLoader which is largely API-compatible, and adds a lot of useful functionality and ...
Read more >
How does the "number of workers" parameter in PyTorch ...
When num_workers > 0, only these workers will retrieve data; the main process won't. Well, our CPU can usually run...
Read more >
