Poor performance of SafeSampler compared to the default sequential sampler

Hi, I have been playing around with nonechucks a bit. I observed that if I use SafeDataset together with the standard DataLoader (using the default sequential sampler), my CPUs are fully loaded. However, when I use the DataLoader with SafeSampler, I usually see only one process running while the others sleep (probably waiting for synchronization). Could it be that the threads need to be synchronized in the SafeSampler __next__() method because of the while loop? The difference in performance between using and not using SafeSampler is really HUGE…

However, I understand that if I use DataLoader without SafeSampler, the same examples can be returned several times, which is not acceptable in my case.
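
One possible explanation, offered as a sketch and not checked against the nonechucks source: PyTorch's DataLoader iterates its sampler in the main process and only hands the resulting indices to the workers. If a sampler decides whether an index is valid by actually loading the sample, that loading runs serially in the main process, which would match the "one busy process, others sleeping" picture. The names SlowDataset and ValidatingSampler below are hypothetical stand-ins, not nonechucks classes, and the example uses only the standard library.

```python
import time


class SlowDataset:
    """Hypothetical dataset: loading is expensive and odd indices are 'corrupt'."""

    def __init__(self, size, delay=0.0):
        self.size = size
        self.delay = delay
        self.loads = 0  # number of times any sample was loaded

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.loads += 1
        time.sleep(self.delay)  # stands in for disk I/O or decoding
        if index % 2 == 1:
            raise ValueError(f"corrupt sample {index}")
        return index * 10


class ValidatingSampler:
    """Hypothetical sampler that yields only indices whose samples load cleanly.

    Assumption: validity is checked by loading the sample. Whatever process
    iterates the sampler therefore pays the full loading cost, one index at a time.
    """

    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        for index in range(len(self.dataset)):
            try:
                self.dataset[index]  # the load happens here, inside the sampler
            except Exception:
                continue  # skip samples that fail to load
            yield index


dataset = SlowDataset(size=8)
valid_indices = list(ValidatingSampler(dataset))
print(valid_indices)   # [0, 2, 4, 6]
print(dataset.loads)   # 8: every sample was loaded once just to produce the indices
```

If this assumption holds, raising the number of workers would not help much, because the workers can only start on an index after the sampler has already loaded that sample once on its own.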

_Originally posted by @brejchajan in https://github.com/msamogh/nonechucks/issues/5#issuecomment-461499282_

Issue Analytics

Created 5 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

nblauch commented, Mar 8, 2019 (1 reaction)

I can confirm that I am experiencing this issue as well. In my case, with 16 CPUs for the (safe) dataloader, the time to run an epoch quadruples when using SafeDataset/SafeSampler/SafeDataLoader.

I don’t think any particular code is needed to reproduce this. Just iterate over a big enough dataset with enough cores to notice the benefit of the non-safe loading machinery.
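
A minimal timing harness for that comparison could look like the sketch below. time_epoch is a hypothetical helper, not part of PyTorch or nonechucks; it accepts any iterable of batches, so a plain range stands in for a loader here and the example runs without PyTorch installed.

```python
import time


def time_epoch(loader):
    """Iterate once over any iterable of batches; return (seconds, batch_count)."""
    start = time.perf_counter()
    batches = 0
    for _ in loader:
        batches += 1
    return time.perf_counter() - start, batches


# Stand-in "loader" so the sketch is self-contained.
elapsed, batches = time_epoch(range(1000))
print(f"{batches} batches in {elapsed:.4f}s")
```

To reproduce the report, one would call time_epoch on two loaders built over the same dataset with the same batch size and worker count, one using the default sampler and one using SafeSampler, and compare the elapsed times.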

brejchajan commented, Feb 12, 2019 (1 reaction)

I cannot make my project public right now, but I will prepare a minimal example to reproduce this. Please give me a few days.
