
Poor performance of SafeSampler compared to the default sequential sampler

See original GitHub issue

Hi, I have been playing around with nonechucks a bit. I observed that if I use SafeDataset together with the standard DataLoader (using the default sequential sampler), my CPUs are fully loaded. However, when I use the DataLoader with SafeSampler, I usually see only one process running while the others are sleeping (probably waiting for synchronization). Could it be that the threads need to be synchronized in SafeSampler's __next__() method because of the while loop? There is a really HUGE difference in performance between using and not using SafeSampler…

However, I understand that if I use the DataLoader without SafeSampler, sampled examples can be returned several times, which is not usable in my case.

_Originally posted by @brejchajan in https://github.com/msamogh/nonechucks/issues/5#issuecomment-461499282_
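For context, the two configurations being compared would look roughly like the sketch below. This is a minimal illustration only: the placeholder dataset, batch size, worker count, and the exact nonechucks constructor calls (in particular SafeSampler's arguments) are assumptions rather than code from the issue.

```python
import torch
import torch.utils.data as data
import nonechucks as nc  # the third-party library discussed in the issue

# Hypothetical stand-in for the user's dataset; in practice some samples may
# be unusable, which is why SafeDataset is being used in the first place.
plain_dataset = data.TensorDataset(torch.randn(10_000, 3, 64, 64))
safe_dataset = nc.SafeDataset(plain_dataset)

# Configuration A (reported as fast): SafeDataset with the DataLoader's
# default sequential sampler and several worker processes.
loader_default = data.DataLoader(safe_dataset, batch_size=32, num_workers=8)

# Configuration B (reported as much slower): the same SafeDataset driven by a
# SafeSampler. The constructor call below is an assumed usage; check the
# nonechucks documentation for the exact signature.
safe_sampler = nc.SafeSampler(safe_dataset)
loader_safe_sampler = data.DataLoader(safe_dataset, batch_size=32,
                                      num_workers=8, sampler=safe_sampler)
```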

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
nblauch commented on Mar 8, 2019

I can confirm that I am experiencing this issue as well. In my case, using 16 CPUs for the (safe) dataloader, the time to run an epoch is quadrupled when using the Safe Dataset/Sampler/DataLoader.

I don’t think any particular code is needed to reproduce it. Just iterate over a big enough dataset with enough cores to notice the benefit of the non-safe loading machinery.
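As a rough illustration of that reproduction recipe, a timing comparison might look like the following. This is a hypothetical sketch, not code from the thread; the synthetic dataset, sizes, and the SafeSampler call are assumptions about the nonechucks API.

```python
import time
import torch
import torch.utils.data as data
import nonechucks as nc

# Synthetic dataset large enough that multi-worker loading matters.
dataset = data.TensorDataset(torch.randn(50_000, 3, 64, 64))

def time_epoch(loader):
    """Iterate once over the loader and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start

# Plain DataLoader with many workers: the fast baseline described above.
plain_loader = data.DataLoader(dataset, batch_size=64, num_workers=16)

# Safe variant: SafeDataset plus SafeSampler, where the slowdown was observed.
# The SafeSampler call is an assumed usage of the nonechucks API.
safe_dataset = nc.SafeDataset(dataset)
safe_loader = data.DataLoader(safe_dataset, batch_size=64, num_workers=16,
                              sampler=nc.SafeSampler(safe_dataset))

print("plain DataLoader:", time_epoch(plain_loader), "s")
print("safe  DataLoader:", time_epoch(safe_loader), "s")
```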

1 reaction
brejchajan commented on Feb 12, 2019

I have a project that I cannot make public right now, but I will prepare a minimal piece of code to reproduce this. Please give me a few days.


Top Results From Across the Web

But what are PyTorch DataLoaders really?
So we've seen that every DataLoader has a sampler internally, which is either SequentialSampler or RandomSampler depending on the value of ...

pytorch/sampler.py at master - GitHub
r"""Samples elements sequentially, always in the same order. Args: data_source (Dataset): dataset to sample from.

torch.utils.data — PyTorch 1.13 documentation
A sequential or shuffled sampler will be automatically constructed based on the shuffle argument to a DataLoader. Alternatively, users may use the...

PyTorch Dataset, DataLoader, Sampler and the collate_fn
It would generate a sequence of indices for the whole dataset, ... batch data being grouped differently compared to the default collate function.
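To make the DataLoader/sampler behavior described in the results above concrete (this is standard PyTorch behavior, independent of nonechucks), the sampler a DataLoader constructs can be inspected directly:

```python
import torch
from torch.utils.data import (DataLoader, TensorDataset,
                              SequentialSampler, RandomSampler)

dataset = TensorDataset(torch.arange(10, dtype=torch.float32))

# shuffle=False (the default) makes the DataLoader build a SequentialSampler...
assert isinstance(DataLoader(dataset).sampler, SequentialSampler)

# ...while shuffle=True makes it build a RandomSampler instead.
assert isinstance(DataLoader(dataset, shuffle=True).sampler, RandomSampler)
```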
