Windows performance orders of magnitude lower than Linux's
Hello, everyone
I usually develop and work in Linux, but my users are mostly on Windows. So, I tried my code on 3 different Windows machines today (just in case), and quickly saw that the training performance in Linux is orders of magnitude better than in Windows.
After some checks, I realised that:
- PyTorch says that it uses the GPU, and it indeed seems to do so
- the task manager shows that the GPU memory is indeed filling up
- the GPU has short bursts of utilisation for a second or two that correspond to the actual training, and then idles for almost a minute.
I profiled the code and realised that, over two training epochs, more than 90 seconds (!) are spent in the function `CloudPickler.dump()`. Two training epochs in Linux with the same data (and same code) take around 3 seconds, including validation.
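For reference, a minimal profiling sketch of how that hot spot shows up; `train_one_epoch`, `model` and `loader` are placeholders for the actual training code, not part of the original report:

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
for _ in range(2):                      # two training epochs, as measured above
    train_one_epoch(model, loader)      # placeholder for the actual training step
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```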
I have seen this (or a similar) problem on Windows when explicitly using `multiprocessing`: in contrast to Linux, where worker processes are forked and the data stays accessible in shared memory, Windows spawns fresh processes and the data has to be passed to them via `pickle`, which is extremely slow. A rough sketch of the setup that triggers this follows (the `MyDataset` class is a hypothetical example, not my actual code): on Windows, DataLoader workers are started with the spawn method, so the whole Dataset object is pickled and sent to every worker each time the workers are created, which by default happens at the start of every epoch.
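```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    """Hypothetical dataset with a large in-memory member (expensive to pickle)."""
    def __init__(self):
        self.data = torch.randn(100_000, 32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":  # required on Windows whenever num_workers > 0
    loader = DataLoader(MyDataset(), batch_size=64, num_workers=4)
    for epoch in range(2):
        # Without persistent_workers=True, the worker processes (and the
        # pickling of the dataset) are re-created for every epoch.
        for batch in loader:
            pass
```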
Are you guys aware of a solution or a workaround for this?
Thanks! Aaron
Top GitHub Comments
The `persistent_workers=True` argument seems to work well (it requires PyTorch >= 1.7.0, though). However, I had to remove the `pin_memory=True` argument to prevent a crash with the following error:

Without `persistent_workers=True`, `pin_memory` works fine.

https://github.com/pytorch/pytorch/issues/48370
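A sketch of that workaround under the constraints described above; `dataset`, the batch size and the worker count are placeholders, not values from the thread:

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,                  # placeholder for the actual Dataset instance
    batch_size=64,
    num_workers=4,
    persistent_workers=True,  # keep workers alive across epochs (PyTorch >= 1.7.0)
    # pin_memory=True,        # left out: combined with persistent_workers it crashed here
)
```

With the workers kept alive, the dataset is pickled to them only once instead of at the start of every epoch, which is what removes the per-epoch `CloudPickler.dump()` cost on Windows.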