`CUDA error: unspecified launch failure`, similar to #3802
🐛 Bug
I am seeing the same issue that was reported as fixed in #3841 in the latest 0.9.0 (and in everything down to releases below 0.8.0). See #3802 for more context. As previously reported by @wsjeon, I am seeing this issue when using DGL with PyTorch Lightning, though I haven't tried to reproduce the problem without that package.
Tagging @BarclayII and @nv-dlasalle who previously investigated this.
To Reproduce
Steps to reproduce the behavior:
- Set up the environment:
conda config --env --add channels dglteam
conda config --env --add channels pytorch
conda install dgl-cuda11.3 pytorch-lightning cudatoolkit=11.3 pytorch=1.12.1
- Run this:
import torch
import dgl
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 10)

    def training_step(self, batch, batch_nb):
        return torch.tensor(2)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

class MyDataset(torch.utils.data.Dataset):
    def __init__(self):
        super().__init__()

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        g = dgl.graph(data=([0, 1], [1, 0]), num_nodes=2)
        return g, torch.tensor([0])

def collate_graphs(samples):
    graphs = [x[0] for x in samples]
    batched_graph = dgl.batch(graphs)
    targets = torch.cat([x[1] for x in samples])
    return batched_graph, targets

loader = torch.utils.data.DataLoader(
    dataset=MyDataset(), batch_size=2, num_workers=2, collate_fn=collate_graphs
)

model = MyModel()
trainer = pl.Trainer(
    strategy='ddp',
    accelerator='gpu',
    devices=[0],
    fast_dev_run=True,
)
trainer.fit(model, loader)
Stack trace:
Epoch 0: 0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 171, in advance
batch = next(data_fetcher)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
return self.fetching_function()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 269, in fetching_function
return self.move_to_device(batch)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 284, in move_to_device
batch = self.batch_to_device(batch)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
output = fn(*args, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 230, in batch_to_device
return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/core/lightning.py", line 291, in _apply_batch_transfer_handler
batch = hook(batch, device, dataloader_idx)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/core/hooks.py", line 713, in transfer_batch_to_device
return move_data_to_device(batch, device)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 354, in move_data_to_device
return apply_to_collection(batch, dtype=dtype, function=batch_to)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 121, in apply_to_collection
v = apply_to_collection(
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 347, in batch_to
data_output = data.to(device, **kwargs)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/heterograph.py", line 5448, in to
ret._graph = self._graph.copy_to(utils.to_dgl_context(device))
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/heterograph_index.py", line 236, in copy_to
return _CAPI_DGLHeteroCopyTo(self, ctx.device_type, ctx.device_id)
File "dgl/_ffi/_cython/./function.pxi", line 293, in dgl._ffi._cy3.core.FunctionBase.__call__
File "dgl/_ffi/_cython/./function.pxi", line 225, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./function.pxi", line 215, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [12:47:27] /opt/dgl/src/runtime/cuda/cuda_device_api.cc:114: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: unspecified launch failure
Stack trace:
[bt] (0) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f51d135fd6f]
[bt] (1) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dgl::runtime::CUDADeviceAPI::AllocDataSpace(DLContext, unsigned long, unsigned long, DLDataType)+0x108) [0x7f51d183a4a8]
[bt] (2) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dgl::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DLDataType, DLContext)+0x361) [0x7f51d16ac5d1]
[bt] (3) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dgl::runtime::NDArray::CopyTo(DLContext const&, void* const&) const+0xc7) [0x7f51d16e8bb7]
[bt] (4) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dgl::UnitGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DLContext const&, void* const&)+0x317) [0x7f51d17f9db7]
[bt] (5) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(dgl::HeteroGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DLContext const&, void* const&)+0x109) [0x7f51d16fa939]
[bt] (6) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(+0x73b9c9) [0x7f51d17079c9]
[bt] (7) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f51d168a928]
[bt] (8) miniconda3/envs/dgl-test/lib/python3.10/site-packages/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so(+0x16143) [0x7f51f4995143]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "gnn-tagger/GNNJetTagger/gnn_tagger/training/minimal.py", line 43, in <module>
trainer.fit(model, loader)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 738, in _call_and_handle_interrupt
self._teardown()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1300, in _teardown
self.strategy.teardown()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 482, in teardown
self.lightning_module.cpu()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147, in cpu
return super().cpu()
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 738, in cpu
return self._apply(lambda t: t.cpu())
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "miniconda3/envs/dgl-test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 738, in <lambda>
return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: unspecified launch failure
Expected behavior
Environment
- DGL Version (e.g., 1.0): dgl-cuda11.3 0.9.0 py310_0
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): conda
- Build command you used (if compiling from source): NA
- Python version: 3.10
- CUDA/cuDNN version (if applicable): 11.3
- GPU models and configuration (e.g. V100): GeForce RTX 2080
- Any other relevant information:

@BarclayII Cannot repro with the GraphSAGE example and dgl 0.9.0. Multi-worker CPU sampling with a CUDA dataloader device should now be covered by the unit test: https://github.com/dmlc/dgl/blob/5ba5106acab6a642e9b790e5331ee519112a5623/tests/pytorch/test_dataloader.py#L185-L187
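For reference, a minimal sketch of the pattern that test exercises (CPU graph, multi-worker sampling, CUDA output device), assuming the dgl.dataloading.DataLoader API available since 0.8; the random graph, sampler fanouts, and batch size below are made up for illustration:

import torch
import dgl

g = dgl.rand_graph(1000, 5000)                     # random homogeneous graph, kept on CPU
sampler = dgl.dataloading.NeighborSampler([5, 5])  # 2-layer neighbor sampling
loader = dgl.dataloading.DataLoader(
    g,
    torch.arange(g.num_nodes()),
    sampler,
    device='cuda:0',   # sampled blocks are copied to the GPU
    batch_size=64,
    shuffle=True,
    num_workers=2,     # sampling runs in CPU worker processes
)
for input_nodes, output_nodes, blocks in loader:
    assert blocks[0].device == torch.device('cuda:0')
    break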
@samvanstroud Are you using PyTorch 1.12.1? I don't think DGL has released PyTorch 1.12.1 support. Can you try PyTorch 1.12.0?
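For example, something like this in the same environment (the exact package spec is an assumption and may need adjusting; the pytorch channel is already configured above):
conda install pytorch=1.12.0 cudatoolkit=11.3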
@mufeili I can reproduce this issue with PyTorch 1.12.1, but haven't found the root cause. Judging from the error message, it does not seem related to the tensoradapter, so I'm not sure what change in PyTorch 1.12.1 breaks it. I'll try building from source with PyTorch 1.12.1 and see if the error goes away.
Update: The error disappears when building DGL from source against PyTorch 1.12.1.
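For anyone who wants to try the same workaround, the documented source build goes roughly like this (a sketch assuming a CUDA build; flags, paths, and the -j value may need adjusting for your machine):

git clone --recursive https://github.com/dmlc/dgl.git
cd dgl
mkdir build && cd build
cmake -DUSE_CUDA=ON ..    # enable the CUDA backend
make -j4
cd ../python
python setup.py install   # install the Python package against the freshly built libdgl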