InfoGraph example fails on GPU
🐛 Bug
Running the InfoGraph example on GPU fails.
return th.repeat_interleave(input, repeats, dim) # PyTorch 1.1
RuntimeError: repeats must have the same size as input along dim
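For context, this is the PyTorch contract that trips here: when `repeats` is a tensor, `torch.repeat_interleave` requires one repeat count per slice of `input` along `dim`. A minimal, self-contained sketch (not taken from the example):

```python
import torch as th

feat = th.randn(3, 4)             # 3 rows of per-graph features
repeats = th.tensor([2, 5, 1])    # one repeat count per row -> valid
out = th.repeat_interleave(feat, repeats, dim=0)
print(out.shape)                  # torch.Size([8, 4])

bad_repeats = th.tensor([2, 5])   # only 2 counts for 3 rows -> invalid
# th.repeat_interleave(feat, bad_repeats, dim=0)
# RuntimeError: repeats must have the same size as input along dim
```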
All I did was run:
python infograph/semisupervised.py --gpu 0 --target mu
To Reproduce
Steps to reproduce the behavior:
- Go to the DGL examples folder
- Run the semisupervised example
Traceback (most recent call last):
  File "semisupervised.py", line 217, in <module>
    for sup_data, unsup_data in zip(train_loader, unsup_loader):
  File "/home/neo/wellth-wrk/env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/neo/wellth-wrk/env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/neo/wellth-wrk/env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "semisupervised.py", line 116, in collate
    graph_id = dgl.broadcast_nodes(batched_graph, graph_id)
  File "/home/neo/wellth-wrk/env/lib/python3.8/site-packages/dgl/readout.py", line 418, in broadcast_nodes
    return F.repeat(graph_feat, graph.batch_num_nodes(ntype), dim=0)
  File "/home/neo/wellth-wrk/env/lib/python3.8/site-packages/dgl/backend/pytorch/tensor.py", line 189, in repeat
    return th.repeat_interleave(input, repeats, dim)  # PyTorch 1.1
RuntimeError: repeats must have the same size as input along dim
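The failing call is `dgl.broadcast_nodes(batched_graph, graph_id)`, which copies row i of a graph-level tensor to every node of the i-th graph in the batch, so the tensor must have exactly `batched_graph.batch_size` leading entries. A minimal sketch of the expected shapes (variable names are illustrative, not copied from the example):

```python
import dgl
import torch as th

g1 = dgl.graph(([0, 1], [1, 2]))        # 3 nodes
g2 = dgl.graph(([0], [1]))              # 2 nodes
batched_graph = dgl.batch([g1, g2])     # batch_size == 2

graph_id = th.arange(batched_graph.batch_size)       # one id per graph, shape (2,)
node_ids = dgl.broadcast_nodes(batched_graph, graph_id)
print(node_ids)                                       # tensor([0, 0, 0, 1, 1])
```

The error therefore indicates that the tensor handed to `broadcast_nodes` in the example's collate function no longer lines up with the number of graphs in the batch.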
Expected behavior
Code runs and finishes training.
Environment
- DGL Version (e.g., 1.0): 0.6.1
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.11.0
- OS (e.g., Linux): Ubuntu
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.8
- CUDA/cuDNN version (if applicable): 11.4
- GPU models and configuration (e.g. V100): Titan RTX
- Any other relevant information:
Additional context
Comments: 11 (1 by maintainers)
Sure. I will take a look.
The root cause of the crash is this PR: https://github.com/dmlc/dgl/pull/3351/files