CUDA error: device-side assert triggered

Instructions To Reproduce the Issue

I have this metadata from the dataset that I modified and loaded using

register_coco_instances("mis", {}, "./Set1/missouri_camera_traps_set1.json", "./")

It is in the COCO Camera Traps dataset format, so I loaded it that way. Calling MetadataCatalog.get("mis") returns the following Metadata object:

Metadata(evaluator_type='coco', image_root='./', json_file='./Set1/missouri_camera_traps_set1.json', name='mis', thing_classes=['empty', 'agouti', 'collared_peccary', 'paca', 'red_brocket_deer', 'white-nosed_coati', 'spiny_rat', 'ocelot', 'red_squirrel', 'common_opossum', 'bird_spec', 'great_tinamou', 'white_tailed_deer', 'mouflon', 'red_deer', 'roe_deer', 'wild_boar', 'red_fox', 'european_hare', 'wood_mouse', 'coiban_agouti'], thing_dataset_id_to_contiguous_id={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20})

When I try to run the following block, as in the balloon tutorial, I get the error in the title of this post:

import os  # needed for os.makedirs below

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mis",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 100    # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64   # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

The full error is like this:

WARNING [12/16 08:47:28 d2.config.compat]: Config './detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-332-6eeee973ba83> in <module>()
     22 
     23 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
---> 24 trainer = DefaultTrainer(cfg)
     25 trainer.resume_or_load(resume=False)
     26 trainer.train()

3 frames
/content/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py in __init__(self, cfg)
     40         assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD)
     41         num_channels = len(cfg.MODEL.PIXEL_MEAN)
---> 42         pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(num_channels, 1, 1)
     43         pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(self.device).view(num_channels, 1, 1)
     44         self.normalizer = lambda x: (x - pixel_mean) / pixel_std

RuntimeError: CUDA error: device-side assert triggered

I don’t know why it won’t train. I’ve read that the number of declared classes can cause this, but I’ve tried several values from 19 to 21 for cfg.MODEL.ROI_HEADS.NUM_CLASSES.
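One thing worth noting about the traceback above: it stops at a plain tensor copy (torch.Tensor(...).to(self.device)) during model construction, which is not an operation that would normally fail an assertion on its own. CUDA kernels are launched asynchronously, so a device-side assert is often reported at a later, unrelated CUDA call, and once it has fired, further CUDA calls in the same process keep failing until the process (here, the Colab runtime) is restarted. A minimal sketch for getting a more trustworthy stack trace, assuming a freshly restarted runtime, is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized. The helper name enable_blocking_launches is made up for illustration and is not part of PyTorch or detectron2.

```python
import os
import sys


def enable_blocking_launches() -> bool:
    """Ask CUDA to run kernels synchronously so errors surface at the failing call.

    Returns True if torch has not been imported yet, which is the simplest way
    to be sure the variable is set before CUDA is initialized.
    (Hypothetical helper, for illustration only.)
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return "torch" not in sys.modules


fresh = enable_blocking_launches()
if not fresh:
    # The variable may be ignored if CUDA was already initialized in this process.
    print("torch is already imported; restart the runtime and run this cell first.")
```

Run this as the first cell after restarting the runtime, then re-run the training cell. Synchronous launches are slower, so this is meant for debugging only.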

Environment

I am running on Google Colab

------------------------  --------------------------------------------------
sys.platform              linux
Python                    3.6.9 (default, Nov  7 2019, 10:44:02) [GCC 8.3.0]
Numpy                     1.17.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.0
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.1
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0                     Tesla K80
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.0, V10.0.130
Pillow                    6.2.1
cv2                       4.1.2
------------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

Thank you for your time.

Issue Analytics

Created 4 years ago
Comments: 13 (4 by maintainers)

Top GitHub Comments

11 reactions

wangg12 commented, Jan 22, 2020

For me, it was because of the wrong NUM_CLASSES in my config for a new dataset.
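A rough sketch of how the class count could be checked against a COCO-style annotation file rather than guessed. It only assumes the standard COCO layout (a "categories" list with "id" fields and an "annotations" list with "category_id" fields); the function name summarize_coco_categories and the tiny example dictionary are invented for illustration.

```python
import json
from collections import Counter


def summarize_coco_categories(coco: dict) -> dict:
    """Report declared categories and how the annotations use them."""
    declared = sorted(c["id"] for c in coco["categories"])
    used = Counter(a["category_id"] for a in coco["annotations"])
    return {
        # Candidate value to compare with cfg.MODEL.ROI_HEADS.NUM_CLASSES.
        "num_classes": len(declared),
        # Ids referenced by annotations but never declared in "categories".
        "unknown_category_ids": sorted(set(used) - set(declared)),
        # Declared ids that no annotation uses.
        "unused_category_ids": sorted(set(declared) - set(used)),
    }


# Tiny in-memory example; for a real file, load it first:
#   with open("path/to/annotations.json") as f:
#       coco = json.load(f)
example = {
    "categories": [{"id": 0, "name": "empty"}, {"id": 1, "name": "agouti"}],
    "annotations": [{"category_id": 1}, {"category_id": 5}],
}
report = summarize_coco_categories(example)
print(report)
```

If the reported count differs from the configured NUM_CLASSES, or any unknown ids show up, the annotation file and the config are worth reconciling before training.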

3 reactions

morganaribeiro commented, Mar 4, 2021

@wangg12 How did you determine the number of classes in code? Could you help me? I did my labeling in Labelme, and some labels are repeated.

Here is the code I used to show the labels:

classes = MetadataCatalog.get("fishesSegmentation_train").thing_classes
print("Classes:", classes)

Output:

Classes: ['anal fin Atlantic  Mackerel', 'anal fin Atlantic Mackerel', 'anal fin Lane Snapper', 'anal fin Mutton Snapper', 'black spot Lane Snapper', 'black spot Mutton Snapper', 'body', 'body Atlantic Mackerel', 'body Lane Snapper', 'body Mutton Snapper', 'caudal Lane Snapper', 'caudal fin Atlantic Mackerel', 'caudal fin Lane Snapper', 'caudal fin Mutton Snapper', 'dorsal', 'dorsal fin Atlantic Mackerel', 'dorsal fin Lane Snapper', 'dorsal fin Mutton Snapper', 'eye', 'eye Atlantic Mackerel', 'eye Lane Snapper', 'eye Mutton Snapper', 'mouth', 'mouth Lane Snapper', 'mouth Mutton Snapper', 'pectoral Mutton Snapper', 'pectoral fin Atlantic Mackerel', 'pectoral fin Lane Snapper', 'pectoral fin Mutton Snapper', 'pelvic', 'pelvic fin Atlantic Mackerel', 'pelvic fin Lane Snapper', 'pelvic fin Mutton Snapper', 'snout', 'snout Atlantic Mackerel', 'snout Lane Snapper', 'snout Mutton Snapper', 'spines Atlantic Mackerel']

I applied len() to count the labels created in Labelme, like this: print("Classes:", len(classes)), which outputs 38.

Question: should I count the repeated ones as unique classes?

I tested the snippet with cfg.MODEL.ROI_HEADS.NUM_CLASSES = 38, but I still get the error: CUDA error: device-side assert triggered!
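On the question of repeated labels: len() counts distinct strings, and in the list above 'anal fin Atlantic  Mackerel' (two spaces) and 'anal fin Atlantic Mackerel' are different strings, so they are counted as two classes. A small sketch, assuming that labels differing only in whitespace or letter case are meant to be the same class, for spotting such near-duplicates before training. The function name group_by_normalized_name is hypothetical, and whether to merge the groups is a labeling decision.

```python
from collections import defaultdict


def group_by_normalized_name(labels):
    """Group labels that are identical after collapsing whitespace and lowercasing."""
    groups = defaultdict(list)
    for label in labels:
        key = " ".join(label.split()).lower()
        groups[key].append(label)
    return dict(groups)


# A few labels taken from the list printed above.
labels = [
    "anal fin Atlantic  Mackerel",
    "anal fin Atlantic Mackerel",
    "body",
]
groups = group_by_normalized_name(labels)
duplicates = {key: names for key, names in groups.items() if len(names) > 1}
print("distinct after normalizing:", len(groups))
print("near-duplicates:", duplicates)
```

Here len(labels) is 3 but only 2 names remain after normalizing, so the class count used for NUM_CLASSES depends on whether the duplicates are fixed in the annotations first.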
