Windows performance orders of magnitude lower than Linux's
Hello, everyone
I usually develop and work in Linux, but my users are mostly on Windows. So, I tried my code on 3 different Windows machines today (just in case), and quickly saw that the training performance in Linux is orders of magnitude better than in Windows.
After some checks, I realised that:
- PyTorch says that it uses the GPU, and it indeed seems to do so
- the task manager shows that the GPU memory is indeed filling up
- the GPU has short bursts of utilisation for a second or two that correspond to the actual training, and then idles for almost a minute.
I profiled the code and realised that, over two training epochs, more than 90 seconds (!) are spent in the function `CloudPickler.dump()`. Two training epochs in Linux with the same data (and same code) take around 3 seconds, including validation.
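For reference, a minimal profiling sketch of how that hot spot shows up; `train_one_epoch`, `model` and `loader` are placeholders for the actual training code, not part of the original report:

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
for _ in range(2):                      # two training epochs, as measured above
    train_one_epoch(model, loader)      # placeholder for the actual training step
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```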
I have seen this (or a similar) problem on Windows when explicitly using `multiprocessing`: in contrast to Linux, where worker processes are forked and the data stays accessible in shared memory, Windows spawns fresh processes and the data has to be passed to them via `pickle`, which is extremely slow. A rough sketch of the setup that triggers this follows (the `MyDataset` class is a hypothetical example, not my actual code): on Windows, DataLoader workers are started with the spawn method, so the whole Dataset object is pickled and sent to every worker each time the workers are created, which by default happens at the start of every epoch.
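```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    """Hypothetical dataset with a large in-memory member (expensive to pickle)."""
    def __init__(self):
        self.data = torch.randn(100_000, 32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":  # required on Windows whenever num_workers > 0
    loader = DataLoader(MyDataset(), batch_size=64, num_workers=4)
    for epoch in range(2):
        # Without persistent_workers=True, the worker processes (and the
        # pickling of the dataset) are re-created for every epoch.
        for batch in loader:
            pass
```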
Are you guys aware of a solution or a workaround for this?
Thanks! Aaron
Top GitHub Comments
The `persistent_workers=True` argument seems to work well (it requires PyTorch >= 1.7.0, though). However, I had to remove the `pin_memory=True` argument to prevent a crash with the following error:

Without `persistent_workers=True`, `pin_memory` works fine.

https://github.com/pytorch/pytorch/issues/48370
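A sketch of that workaround under the constraints described above; `dataset`, the batch size and the worker count are placeholders, not values from the thread:

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,                  # placeholder for the actual Dataset instance
    batch_size=64,
    num_workers=4,
    persistent_workers=True,  # keep workers alive across epochs (PyTorch >= 1.7.0)
    # pin_memory=True,        # left out: combined with persistent_workers it crashed here
)
```

With the workers kept alive, the dataset is pickled to them only once instead of at the start of every epoch, which is what removes the per-epoch `CloudPickler.dump()` cost on Windows.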