Strange behavior using PyTorch DDP
So far I have been able to use the loss with DDP on a single GPU, and it behaves more or less as expected.
But when I use more than 1 device, the following happens:
- GPU-0: loss is calculated properly
- GPU-1: loss is close to zero for each batch
I checked the input tensors, devices, tensor values, etc.; so far everything seems to be identical between GPU-0 and the other GPUs.
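For anyone trying to reproduce this, here is a minimal sketch (assuming the process group is already initialized and `loss` lives on the current rank's device; the helper name `compare_rank_losses` is made up for illustration) that gathers each rank's loss so the per-GPU values can be compared directly:

```python
import torch
import torch.distributed as dist

def compare_rank_losses(loss: torch.Tensor) -> None:
    # Gather every rank's scalar loss so rank 0 can print them side by side.
    if not dist.is_initialized():
        print(f"single process: loss = {loss.item():.6f}")
        return
    world_size = dist.get_world_size()
    local = loss.detach().reshape(1)
    gathered = [torch.zeros_like(local) for _ in range(world_size)]
    dist.all_gather(gathered, local)
    if dist.get_rank() == 0:
        for rank, value in enumerate(gathered):
            print(f"rank {rank}: loss = {value.item():.6f}")
```

Calling this once per training step should make the GPU-0 vs. GPU-1 discrepancy described above visible in a single log.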
Top GitHub Comments
Yes, this means that the logits/target length tensors do not match the logits/target tensors themselves; for instance, if a logits length is longer than the time dimension of your logits tensor.
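Following that explanation, a hedged sanity check you could run on every rank before computing the loss (the function name `check_ctc_inputs` is an assumption; the `(T, N, C)` layout follows `torch.nn.CTCLoss` conventions, which this issue appears to be about):

```python
import torch

def check_ctc_inputs(log_probs, targets, input_lengths, target_lengths):
    """Sanity-check CTC inputs: log_probs is (T, N, C), targets is (N, S)."""
    T, N, _ = log_probs.shape
    assert input_lengths.numel() == N and target_lengths.numel() == N
    # Every input length must fit inside the time dimension of log_probs...
    assert int(input_lengths.max()) <= T, "input_lengths exceed logits time dim"
    # ...and every target length inside the padded targets tensor.
    assert int(target_lengths.max()) <= targets.shape[1], "target_lengths exceed targets"
    # CTC also needs input_lengths >= target_lengths to produce a finite loss.
    assert bool((input_lengths >= target_lengths).all()), "targets longer than inputs"
```

Running this on each rank can surface a shard whose length tensors were built from the full, unsharded batch, which would line up with the near-zero loss seen only on GPU-1.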