CUDA error: device-side assert triggered
Instructions To Reproduce the Issue
- I have this metadata for a dataset that I modified and registered using
register_coco_instances("mis", {}, "./Set1/missouri_camera_traps_set1.json", "./")
It is in COCO Camera Traps format, so I registered it as a COCO dataset. MetadataCatalog.get("mis") returns the following metadata:
Metadata(evaluator_type='coco', image_root='./', json_file='./Set1/missouri_camera_traps_set1.json', name='mis', thing_classes=['empty', 'agouti', 'collared_peccary', 'paca', 'red_brocket_deer', 'white-nosed_coati', 'spiny_rat', 'ocelot', 'red_squirrel', 'common_opossum', 'bird_spec', 'great_tinamou', 'white_tailed_deer', 'mouflon', 'red_deer', 'roe_deer', 'wild_boar', 'red_fox', 'european_hare', 'wood_mouse', 'coiban_agouti'], thing_dataset_id_to_contiguous_id={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20})
- When I run the following block, adapted from the balloon tutorial, I get the error in the title of this post:
import os
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mis",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 100 # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64 # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
- The full error is:
WARNING [12/16 08:47:28 d2.config.compat]: Config './detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-332-6eeee973ba83> in <module>()
22
23 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
---> 24 trainer = DefaultTrainer(cfg)
25 trainer.resume_or_load(resume=False)
26 trainer.train()
3 frames
/content/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py in __init__(self, cfg)
40 assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD)
41 num_channels = len(cfg.MODEL.PIXEL_MEAN)
---> 42 pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(num_channels, 1, 1)
43 pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(self.device).view(num_channels, 1, 1)
44 self.normalizer = lambda x: (x - pixel_mean) / pixel_std
RuntimeError: CUDA error: device-side assert triggered
- I don't know why it won't train. I've read that the declared number of classes might be the problem, but I've tried several values (19 through 21) for cfg.MODEL.ROI_HEADS.NUM_CLASSES.
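The metadata above maps 21 dataset ids (0 to 20) to contiguous ids. In detectron2, NUM_CLASSES counts foreground classes only, with no background entry, so 21 already matches the category list. It is still worth confirming that the annotations themselves agree with that configuration, because a label/class-count mismatch is the usual suspect named in the comments on this page. Below is a minimal stdlib-only sketch. check_coco_labels is a hypothetical helper, not part of detectron2. It flags three things:

* a category count that differs from NUM_CLASSES,
* category_id values missing from "categories",
* annotations without a "bbox". Boxes are optional in COCO Camera Traps files, but a box detector needs them, so such annotations may fail later in the data pipeline.

```python
import json
import os
from typing import Any, Dict, List


def check_coco_labels(coco: Dict[str, Any], num_classes: int) -> List[str]:
    """Compare a COCO-style dataset dict with the NUM_CLASSES you plan to use.

    Hypothetical helper, not part of detectron2. Returns a list of
    human-readable problems; an empty list means nothing was flagged.
    """
    problems: List[str] = []
    cat_ids = [c["id"] for c in coco.get("categories", [])]
    known = set(cat_ids)
    if len(known) != len(cat_ids):
        problems.append("duplicate ids in 'categories'")
    # NUM_CLASSES counts foreground classes only (no background entry).
    if len(known) != num_classes:
        problems.append(
            f"JSON defines {len(known)} categories but NUM_CLASSES = {num_classes}"
        )
    for ann in coco.get("annotations", []):
        cid = ann.get("category_id")
        if cid not in known:
            problems.append(f"annotation {ann.get('id')}: unknown category_id {cid}")
        # Boxes are optional in COCO Camera Traps files but needed for Faster R-CNN.
        if "bbox" not in ann:
            problems.append(f"annotation {ann.get('id')}: no 'bbox'")
    return problems


if __name__ == "__main__":
    # Path taken from the register_coco_instances call in the question.
    path = "./Set1/missouri_camera_traps_set1.json"
    if os.path.exists(path):
        with open(path) as f:
            for problem in check_coco_labels(json.load(f), num_classes=21):
                print(problem)
```

An empty printout means the JSON and NUM_CLASSES = 21 agree. In that case the class count is probably not what is triggering the assert.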
Environment
I am running on Google Colab
------------------------ --------------------------------------------------
sys.platform linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
Numpy 1.17.4
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.1
PyTorch Debug Build False
torchvision 0.4.2
CUDA available True
GPU 0 Tesla K80
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.2.1
cv2 4.1.2
------------------------ --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
Thank you for your time.
Issue Analytics
- Created 4 years ago
- Comments:13 (4 by maintainers)
Top Results From Across the Web
CUDA runtime error (59) : device-side assert triggered
One way to raise the "CUDA error: device-side assert triggered" RuntimeError, is by indexing into a GPU torch.Tensor using a list having ......
How to fix "CUDA error: device-side assert triggered" error?
CUDA operations are executed asynchronously, so the stack trace might point to the wrong line of code. Rerun your script via ...
RuntimeError: CUDA error: device-side assert triggered · Issue ...
When I try running tutorial 2 on Colab I run into this error message: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors...
[HELP] RuntimeError: CUDA error: device-side assert triggered
I get this error: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the ......
How to fix 'Cuda error: Device-side assert triggered?
You should first check to see if the number of classes you've assigned to your dataset matches the number of output units you...
Top GitHub Comments
For me, it was caused by a wrong NUM_CLASSES value in my config for a new dataset.
@wangg12 How did you determine the number of classes in code? Could you help me? I did my labeling in Labelme, and some labels are repeated.
print("Classes:", len(classes))
- Output: 38
Question: should the repeated labels be counted only once, as unique classes?
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 38
But I still get the error: CUDA error: device-side assert triggered!
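On the question of repeated labels: NUM_CLASSES should be the number of distinct class names, so a label that appears on many shapes counts only once. Below is a minimal sketch. It assumes standard Labelme JSON files, where each object is an entry in "shapes" with a "label" string. Both helper names are hypothetical.

```python
import glob
import json
import os
from typing import Iterable, List


def unique_labelme_labels(annotations: Iterable[dict]) -> List[str]:
    """Return the sorted distinct labels across Labelme annotation dicts.

    Each Labelme JSON stores its objects under "shapes", and each shape has
    a "label" string. A label used on many shapes is still one class.
    """
    labels = set()
    for ann in annotations:
        for shape in ann.get("shapes", []):
            labels.add(shape["label"])
    return sorted(labels)


def load_labelme_dir(folder: str) -> List[dict]:
    """Load every *.json file in folder (assumed to be Labelme output)."""
    loaded = []
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path) as f:
            loaded.append(json.load(f))
    return loaded
```

For example, with a hypothetical annotations/ folder:

* classes = unique_labelme_labels(load_labelme_dir("annotations/"))
* cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(classes)

Labels that differ only in capitalization or stray spaces (for example "Dog" and "dog ") are counted separately here, so it is worth printing the list before trusting the count.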