
Data Loading/sending to GPU error?

See the original GitHub issue (#1967)

❓ Questions & Help

This doesn’t seem to be a bug in PyTorch Geometric itself (though I may be mistaken), but I am hitting it while using the PyTorch Geometric DataLoader. When I do

pbar = tqdm(train_loader)   # progress bar around the PyTorch Geometric DataLoader
for data in pbar:
    data = data.to(device)  # move the batch to the GPU

where data comes from a DataLoader (train_loader = DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)), I am getting the following error:

0%|          | 0/5081 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/n/app/python/3.7.4/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/n/app/python/3.7.4/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/vym1/nn2/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 134, in reduce_tensor
    raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries.  If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).

I’ve never seen this before, and when running on a different dataset with a different model, it works fine. Do you know why this is occurring?
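
The error message points at a non-leaf tensor that requires gradients being sent across processes. Here is a minimal, hypothetical sketch of how a loader can hit this (it uses the plain torch.utils.data.DataLoader and an invented BadDataset, not the actual dataset or model from this issue): if the items a dataset hands out are non-leaf tensors that still require gradients, the worker processes cannot pickle them when sending batches back to the main process.

import torch
from torch.utils.data import Dataset, DataLoader

class BadDataset(Dataset):
    """Hypothetical dataset whose items are non-leaf tensors that require grad."""
    def __init__(self):
        base = torch.randn(10, 4, requires_grad=True)  # leaf tensor
        self.items = base * 2.0  # result of an op: non-leaf, but still requires grad

    def __len__(self):
        return self.items.size(0)

    def __getitem__(self, idx):
        return self.items[idx]

loader = DataLoader(BadDataset(), batch_size=2, num_workers=2)
for batch in loader:  # a worker tries to serialize a non-leaf tensor -> RuntimeError
    pass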

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
rusty1s commented, Jan 7, 2021

I’m really not sure, but I believe setting num_workers=0 should fix this.
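
A minimal sketch of that workaround, reusing the names from the snippet in the question (train_dataset and args are assumed to exist there):

train_loader = DataLoader(
    train_dataset,
    batch_size=args.batch_size,
    shuffle=True,
    num_workers=0,  # load batches in the main process; nothing crosses process boundaries
)

This gives up parallel data loading, so it is a workaround rather than a root-cause fix.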

1 reaction
rusty1s commented, Jan 10, 2021
“Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).”

num_workers=0 uses the main process to load data, so no tensors have to cross process boundaries and the serialization error cannot occur.
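
The error message itself suggests the other option: detach the offending tensors before they leave the dataset, so worker processes only ever see leaf tensors. A sketch building on the hypothetical BadDataset above:

class FixedDataset(BadDataset):
    """Hypothetical fix: hand workers detached (leaf) tensors."""
    def __getitem__(self, idx):
        # detach() drops the autograd history, so the tensor can be pickled and
        # sent across process boundaries; call requires_grad_() later if the
        # gradients are actually needed on the training side.
        return super().__getitem__(idx).detach()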
