Windows performance orders of magnitude lower than Linux's
Hello, everyone
I usually develop and work on Linux, but my users are mostly on Windows. So, I tried my code on 3 different Windows machines today (just in case), and quickly saw that the training performance on Linux is orders of magnitude better than on Windows.
After some checks, I realised that:
- PyTorch says that it uses the GPU, and it indeed seems to do so
- the task manager shows that the GPU memory is indeed filling up
- the GPU has short bursts of utilisation for a second or two that correspond to the actual training, and then idles for almost a minute.
 
I profiled the code and realised that across two training epochs, more than 90 seconds (!) are spent inside the function CloudPickler.dump(). On Linux, two training epochs with the same data (and the same code) take around 3 seconds, including validation.
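For reference, a minimal sketch of how a hotspot like this can be surfaced with Python's built-in profiler; train_one_epoch is a placeholder for the actual training loop:

```python
import cProfile
import pstats

def train_one_epoch():
    """Placeholder for the actual training loop (one epoch)."""
    pass

# Profile two training epochs and print the 20 most expensive calls by
# cumulative time; in the scenario described above, CloudPickler.dump()
# dominated the output on Windows.
with cProfile.Profile() as profiler:
    for _ in range(2):
        train_one_epoch()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```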
I have seen this (or a similar) problem on Windows when using multiprocessing explicitly: in contrast to Linux, where the data is accessible in shared memory, on Windows the data is passed to the worker processes via pickle, and this is extremely slow.
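To illustrate the platform difference (this is generic multiprocessing behaviour, not specific to this project): DataLoader workers are ordinary multiprocessing processes, and the default start method differs between the two systems.

```python
import multiprocessing as mp

# Linux typically defaults to "fork": worker processes inherit the parent's
# memory, so the dataset does not have to be serialised. Windows only
# supports "spawn": each worker is a fresh interpreter, and the dataset
# (plus anything it references) is pickled and sent to every worker.
print(mp.get_start_method())  # usually "fork" on Linux, always "spawn" on Windows
```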
Are you guys aware of a solution or a workaround for this?
Thanks! Aaron
Top Related StackOverflow Question
The persistent_workers=True argument seems to work well (it requires PyTorch >= 1.7.0, though). However, I had to remove the pin_memory=True argument to prevent a crash (see https://github.com/pytorch/pytorch/issues/48370). Without persistent_workers=True, pin_memory works fine.
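A minimal sketch of the combination described above; the dataset, batch size, and worker count are placeholders, and pin_memory is left out because of the crash mentioned:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; replace with the real one.
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

if __name__ == "__main__":  # required on Windows whenever num_workers > 0
    loader = DataLoader(
        dataset,
        batch_size=32,
        num_workers=4,
        persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7.0)
        # pin_memory=True omitted: it crashed in combination with persistent_workers here
    )
    for epoch in range(2):
        for batch, labels in loader:
            pass  # training step goes here
```

With persistent workers, the per-worker pickling of the dataset happens once at the start of training rather than at the start of every epoch, which is what removes the long per-epoch stall described above.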