Add multi-threads support to samples crop

Is your feature request related to a problem? Please describe.

Currently, there are 4 crop transforms that can generate a list of samples, and we crop the images in a for loop. After some testing, I found that running the crops in multiple threads can be much faster. So it would be useful to add num_workers support to these transforms, similar to CacheDataset.
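The request can be illustrated with a minimal sketch that replaces the for loop with a thread pool. The names `random_crop`, `crop_samples`, and `base_seed` are hypothetical and are not part of the library's API. The crop works on a plain nested list to keep the sketch dependency-free; any real speedup would depend on the crop running in compiled code that releases the GIL.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_crop(image, size, seed):
    """Crop one size x size patch from a 2D nested-list image (hypothetical stand-in)."""
    rng = random.Random(seed)  # per-call RNG, so threads never share random state
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def crop_samples(image, size, num_samples, num_workers=0, base_seed=0):
    """Generate num_samples crops, serially or in a thread pool."""
    seeds = [base_seed + i for i in range(num_samples)]
    if num_workers <= 0:
        # Serial path: the plain for loop described in the issue.
        return [random_crop(image, size, seed) for seed in seeds]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves input order, so both paths return identical lists.
        return list(pool.map(lambda seed: random_crop(image, size, seed), seeds))


image = [[row * 10 + col for col in range(10)] for row in range(10)]
serial = crop_samples(image, size=4, num_samples=4)
threaded = crop_samples(image, size=4, num_samples=4, num_workers=4)
```

Seeding each crop separately is a design choice of this sketch: it makes the threaded result reproducible and independent of thread scheduling.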

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 31 (31 by maintainers)

Top GitHub Comments

1 reaction
wyli commented, Aug 19, 2021

Thanks for the update, it looks great. I think we can open a separate ticket to enhance the thread-based loader.

1 reaction
ericspod commented, Aug 19, 2021

I’m a bit late to the party, but here are a few observations. Multi-threading has a number of problems relating to the GIL, as we know, but we can often route around them by using compiled functions in NumPy, SciPy, PyTorch, etc. Mixing threads and processes will be inefficient regardless, because we typically create as many processes as we have CPU cores (virtual and physical). If multiple threads in each of these subprocesses run a transform pipeline, there will be more threads than CPU cores, and efficiency will be lost to contention. I don’t think the advantages of accessing memory efficiently in threads would overcome that. I would generally suggest using either threads or processes for parallelism, not a mix of both.
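The contention argument comes down to simple arithmetic, sketched below. The helper `oversubscription` and the figures in the example are hypothetical, chosen only to illustrate the point.

```python
import os


def oversubscription(num_processes, threads_per_process, cpu_cores=None):
    """Return (total threads, threads per core) for a process/thread mix."""
    cores = cpu_cores or os.cpu_count() or 1
    total_threads = num_processes * threads_per_process
    return total_threads, total_threads / cores


# Assumed example: 8 worker processes on an 8-core machine, each running
# 4 threads inside its transform pipeline.
total, per_core = oversubscription(8, 4, cpu_cores=8)
```

With these assumed numbers there are 32 runnable threads for 8 cores, i.e. 4 per core, which is the situation the comment warns about.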

The problem here with RandCropByPosNegLabeld is that it is a one-to-many transform, where generating the many outputs with multiple threads may be faster. That might hold if the transform is used on its own, but if it is used with multiple processes you could create too many threads. I would think it would be faster to change the transform to be one-to-one, so that you get one cropped image for each input, and to give each transform call the same input, i.e. as if you had a batch of duplicate images. This might work for particular use cases that would expect one-to-many, but I’m personally not sure how this class is used now, so I can’t say for sure that this makes sense.
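The one-to-one alternative could look roughly like the sketch below. `RepeatedDataset` and `crop_one` are hypothetical names rather than existing classes, and the crop is a plain random crop on nested lists rather than the label-aware cropping of RandCropByPosNegLabeld.

```python
import random


class RepeatedDataset:
    """Presents each image `repeats` times, so one index yields one crop."""

    def __init__(self, images, repeats):
        self.images = images
        self.repeats = repeats

    def __len__(self):
        return len(self.images) * self.repeats

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Consecutive indices map to the same underlying image.
        return self.images[index // self.repeats]


def crop_one(image, size, rng):
    """One-to-one transform: one input image in, one random crop out."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


images = [[[0] * 6 for _ in range(6)], [[1] * 6 for _ in range(6)]]
dataset = RepeatedDataset(images, repeats=3)
rng = random.Random(0)
crops = [crop_one(dataset[i], 2, rng) for i in range(len(dataset))]
```

Because every index now produces exactly one sample, whatever parallelism the surrounding data loader already uses is spread across the crops, and no extra threads are created inside the transform.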

ThreadDataLoader typically doesn’t benefit from a buffer size larger than 1 unless the sizes of the buffered objects vary wildly and so take varying amounts of time to generate. I left the option to change the buffer size in the original implementation to allow experimentation. The idea of ThreadDataLoader is to let a separate thread read from a data loader that it has exclusive access to, so that thread-safety isn’t a problem.
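The buffering idea described above can be sketched with the standard library alone. This is not the ThreadDataLoader implementation; the function `buffered` and its `buffer_size` parameter are hypothetical names used for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the source iterable


def buffered(source, buffer_size=1):
    """Yield items from `source`, which is read by one background thread."""
    items = queue.Queue(maxsize=buffer_size)

    def produce():
        try:
            # Only this thread touches `source`, so the source itself
            # does not need to be thread-safe.
            for item in source:
                items.put(item)  # blocks while the buffer is full
        finally:
            items.put(_DONE)  # always unblock the consumer

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = items.get()
        if item is _DONE:
            return
        yield item


batches = list(buffered(iter(range(5)), buffer_size=1))
```

In this sketch an exception in the source ends the iteration early instead of being re-raised in the consumer, and a consumer that stops early leaves the daemon producer blocked. A fuller version would need to handle both cases.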

One idea I’ve been meaning to try is a DataLoader that uses threads instead of processes. It would lack the interprocess communication overhead the current implementation must have, but it would rely on using a lot of compiled functions to avoid getting bogged down by the GIL, and on the source DataSet being thread-safe. This might be entirely unrelated to the problem at hand with the transforms being one-to-many.
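A thread-based loader of the kind described above might be sketched as follows. `threaded_batches` is a hypothetical helper, not an existing DataLoader. It assumes the dataset supports `len()` and integer indexing and that `__getitem__` is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_batches(dataset, batch_size, num_workers=4):
    """Yield lists of dataset items, loading each batch with worker threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for start in range(0, len(dataset), batch_size):
            indices = range(start, min(start + batch_size, len(dataset)))
            # Items stay in this process's memory, so nothing is pickled or
            # copied between processes; map() keeps the index order.
            yield list(pool.map(dataset.__getitem__, indices))


dataset = [i * i for i in range(5)]  # stand-in for a real dataset
loaded = list(threaded_batches(dataset, batch_size=2))
```

As the comment notes, this only pays off when loading and transforming an item spends most of its time in compiled code that releases the GIL.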

I think @Nic-Ma has written something on results that make all this less meaningful, but it’s here for us to consider later.
