Multiprocessing with any DataPipe writing to local file
🐛 Describe the bug
We need to take extra care with any DataPipe that writes to the file system when DataLoader2 triggers multiprocessing. If the file name on the local file system is the same across multiple processes, there is a race condition.

This was found when the TorchText team used on_disk_cache to cache files.

DataLoader needs to know that such a DataPipe must be sharded across worker processes, or it must enforce that the DataPipe runs in a single process.

As a workaround, users have to download the file to the local file system ahead of time, so that no writing happens inside the DataPipe.
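A rough sketch of the problematic pattern is below. The URL, cache path, and worker count are placeholders, but the on_disk_cache / end_caching / MultiProcessingReadingService APIs are the torchdata ones referenced in this issue; the point is only to show how every worker ends up writing the same local file:

```python
# Sketch only: shows how the same cache file can be written by several
# worker processes at once. URL and path are hypothetical placeholders.
from torchdata.datapipes.iter import IterableWrapper, HttpReader
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

url_dp = IterableWrapper(["https://example.com/data.tar.gz"])

# on_disk_cache checks whether the target file already exists...
cache_dp = url_dp.on_disk_cache(
    filepath_fn=lambda url: "/tmp/cache/data.tar.gz",  # same path in every process
)
cache_dp = HttpReader(cache_dp)
# ...and end_caching writes the downloaded bytes to that path.
cache_dp = cache_dp.end_caching(mode="wb", same_filepath_fn=True)

# Without sharding (or a lock), each of the 4 workers replicates the whole
# pipe and may write /tmp/cache/data.tar.gz concurrently -- a race condition.
rs = MultiProcessingReadingService(num_workers=4)
dl = DataLoader2(cache_dp, reading_service=rs)
for path in dl:
    pass
```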
Versions
main branch
I think if we can incorporate that into the IoPathSaver DataPipe, it should be a viable cross-platform solution, but it would mean users have to install iopath and portalocker if they wish to lock files across processes.

Thanks for the ping Parmeet. I haven’t encountered this issue thus far, because torchvision datasets do not write anything to disk.