Tensors left behind on CPU in DataParallel Implementation
See original GitHub issue

I am encountering an issue with tensors being left behind on the CPU, which triggers the following assertion error: "AssertionError: Gather function not implemented for CPU tensors". This happens inside the process_sample function.
I can confirm that this is strictly associated with DataParallel, since passing only a single CUDA device suppresses the error.
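A minimal sketch of the failure mode, assuming the issue's process_sample builds an output tensor with a bare constructor (the module body below is hypothetical, not the reporter's actual code): DataParallel replicates the module across GPUs and then gathers the replica outputs, so any output allocated on the CPU by default breaks the gather step. Allocating on the input's device avoids this.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def forward(self, x):
        # BUG: torch.zeros(x.size(0)) defaults to the CPU; under
        # DataParallel the gather step then fails with
        # "Gather function not implemented for CPU tensors".
        # FIX: allocate on the same device as this replica's input.
        out = torch.zeros(x.size(0), device=x.device)
        return out + x.sum(dim=1)

model = Model()
# model = nn.DataParallel(model).cuda()  # the multi-GPU path from the issue
x = torch.ones(4, 3)
print(model(x))  # output stays on x's device
```

With a single device the CPU default goes unnoticed, which matches the report that passing only one CUDA device suppresses the error.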
Issue Analytics
- Created: 2 years ago
- Comments: 5
Top Results From Across the Web

DataParallel does not work with tensors of dimension 0 #9811
Issue description: I have a network that returns a single value, which is a dimensionless tensor as of PyTorch 0.4.0.

Optional: Data Parallelism — PyTorch Tutorials 1.13.1+cu117
In this tutorial, we will learn how to use multiple GPUs using DataParallel. It's very easy to use GPUs with PyTorch...

Distributed data parallel training in Pytorch
The easiest way to speed up neural network training is to use a GPU, which provides large speedups over CPUs on the types...

PyTorch 101, Part 4: Memory Management and Using Multiple ...
Moving tensors around CPU / GPUs: every Tensor in PyTorch has a to() member function. Its job is to put the tensor...

PyTorch Distributed: Experiences on Accelerating Data ...
...design, implementation, and evaluation of the PyTorch distributed data parallel module... CPU input tensors to eliminate the overhead of copying...
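The to() member function mentioned in the results above is the general device-movement API; a quick illustration that also runs on CPU-only machines:

```python
import torch

t = torch.arange(4)
# to() returns a tensor on the requested device (a no-op if it is
# already there); pick CUDA when available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)
t_cpu = t_dev.to("cpu")  # move back for CPU-side consumption
print(t_cpu.device)      # prints: cpu
```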
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@yyeboah I created PR #62 to address empty tensors potentially being created on the CPU, which could be the cause of the issue you're seeing. If you could grab the changes from that PR and test them, that would be great, as I don't have a multi-GPU setup readily available; it works, of course, on my single-GPU machine.
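The fix described in this comment amounts to giving empty tensors an explicit device so DataParallel's gather never sees a CPU tensor. A sketch of that pattern, assuming a hypothetical helper (make_empty_like is illustrative, not code from PR #62):

```python
import torch

def make_empty_like(reference: torch.Tensor, n: int) -> torch.Tensor:
    # A bare constructor such as torch.empty(0) lands on the CPU and
    # breaks the gather step under DataParallel. Inheriting the
    # reference tensor's device and dtype keeps each replica's output
    # on that replica's own GPU.
    return torch.empty(n, dtype=reference.dtype, device=reference.device)

x = torch.ones(2, 2)
e = make_empty_like(x, 0)
print(e.device, e.shape)  # same device as x, zero-length shape
```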
@anmatako I have pulled PR #62 and, after testing, I can confirm that the issue has indeed been resolved. Cheers, and have a Happy New Year!