Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

cudaCheckError() failed : an illegal memory access was encountered

See original GitHub issue

Hi, thanks for your code! I used it for training and that worked, but when it comes to testing I run into a weird error:

CUDA_VISIBLE_DEVICES=0,1,2,3 python test_net.py exp_name --cascade --cuda --mGPUs

(The command first prints a long run of scipy RuntimeWarnings of the form "numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88" — a numpy/scipy binary mismatch that is unrelated to the crash, so they are omitted here.)

Called with args: Namespace(batch_size=1, cascade=True, cfg_file='cfgs/res101.yml', checkepoch=7, checkpoint=6310, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', exp_name='exp_name', large_scale=False, load_dir='models', mGPUs=True, net='detnet59', parallel_type=0, set_cfgs=None, soft_nms=False, vis=False)

Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2],
 'ANCHOR_SCALES': [4, 8, 16, 32],
 'CROP_RESIZE_WITH_MAX_POOL': False,
 'CUDA': False,
 'DATA_DIR': '/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/data',
 'DEDUP_BOXES': 0.0625,
 'DETNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'EPS': 1e-14,
 'EXP_DIR': 'res101',
 'FEAT_STRIDE': [16],
 'FPN_ANCHOR_SCALES': [32, 64, 128, 256, 512],
 'FPN_ANCHOR_STRIDE': 1,
 'FPN_FEAT_STRIDES': [4, 8, 16, 16, 16],
 'GPU_ID': 0,
 'HAS_MASK': True,
 'MATLAB': 'matlab',
 'MAX_NUM_GT_BOXES': 20,
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0, 'FIXED_LAYERS': 5, 'REGU_DEPTH': False, 'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[0.485, 0.456, 0.406]]]),
 'PIXEL_STDS': array([[[0.229, 0.224, 0.225]]]),
 'POOLING_MODE': 'align',
 'POOLING_SIZE': 14,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/DATACENTER2/qyj/cascade-rcnn_Pytorch-master',
 'TEST': {'BBOX_REG': True, 'HAS_RPN': True, 'MAX_SIZE': 1000, 'MODE': 'nms', 'NMS': 0.3, 'PROPOSAL_METHOD': 'gt', 'RPN_MIN_SIZE': 16, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 300, 'RPN_PRE_NMS_TOP_N': 6000, 'RPN_TOP_N': 5000, 'SCALES': [600], 'SOFT_NMS_METHOD': 1, 'SVM': False},
 'TRAIN': {'ASPECT_CROPPING': False, 'ASPECT_GROUPING': False, 'BATCH_SIZE': 128, 'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0], 'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2], 'BBOX_NORMALIZE_TARGETS': True, 'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True, 'BBOX_REG': True, 'BBOX_THRESH': 0.5, 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'BIAS_DECAY': False, 'BN_TRAIN': False, 'DISPLAY': 20, 'DOUBLE_BIAS': False, 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'FG_THRESH_2ND': 0.6, 'FG_THRESH_3RD': 0.7, 'GAMMA': 0.1, 'HAS_RPN': True, 'IMS_PER_BATCH': 1, 'LEARNING_RATE': 0.001, 'MAX_SIZE': 1000, 'MOMENTUM': 0.9, 'PROPOSAL_METHOD': 'gt', 'RPN_BATCHSIZE': 256, 'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'RPN_CLOBBER_POSITIVES': False, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 8, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POSITIVE_WEIGHT': -1.0, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SCALES': [600], 'SNAPSHOT_ITERS': 5000, 'SNAPSHOT_KEPT': 3, 'SNAPSHOT_PREFIX': 'res101_faster_rcnn', 'STEPSIZE': [30000], 'SUMMARY_INTERVAL': 180, 'TRIM_HEIGHT': 600, 'TRIM_WIDTH': 600, 'TRUNCATED': False, 'USE_ALL_GT': True, 'USE_FLIPPED': True, 'USE_GT': False, 'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}

Loaded dataset voc_2007_test for training
Set proposal method: gt
Preparing training data…
voc_2007_test gt roidb loaded from /DATACENTER2/qyj/cascade-rcnn_Pytorch-master/data/cache/voc_2007_test_gt_roidb.pkl
done
3462 roidb entries
load checkpoint models/detnet59/pascal_voc/exp_name/fpn_1_7_6310.pth
load model successfully!
cudaCheckError() failed : an illegal memory access was encountered

That is the output after setting os.environ['CUDA_LAUNCH_BLOCKING'] = '1' to pin down the call that actually triggers the cudaCheckError(). Without it, the error is:

3462 roidb entries
load checkpoint models/detnet59/pascal_voc/exp_name/fpn_1_7_6310.pth
load model successfully!
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generated/…/THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_net.py", line 246, in <module>
    ret = fpn(im_data, im_info, gt_boxes, num_boxes)
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/lib/model/fpn/cascade/fpn.py", line 316, in forward
    roi_pool_feat = self._PyramidRoI_Feat(mrcnn_feature_maps, rois, im_info)
  File "/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/lib/model/fpn/cascade/fpn.py", line 135, in _PyramidRoI_Feat
    if (roi_level == l).sum() == 0:
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generated/…/THCReduceAll.cuh:339
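One note on CUDA_LAUNCH_BLOCKING: it is read when the CUDA context is created, so it only works if it is in the environment before the first CUDA call — either export it on the command line (CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python test_net.py ...) or set it at the very top of the script. A minimal sketch of the in-script variant (nothing here is specific to this repo):

    # Force synchronous kernel launches so the Python line in the traceback is
    # the one whose kernel actually faulted. The variable is read when CUDA
    # initializes, so set it before importing torch (or export it in the shell).
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch  # import only after the environment variable is in place

    # ... run the rest of the script as usual; kernel faults now surface at the
    # call that triggered them rather than at a later synchronization point.

While debugging, running on a single GPU (dropping --mGPUs) also tends to make the resulting trace easier to follow.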

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 7
  • Comments: 12

Top GitHub Comments

6 reactions
hcx1231 commented, Sep 13, 2018

I also have the same problem.

1 reaction
huihuiustc commented, May 26, 2019

It happens when testing
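Because launches are asynchronous, the line the traceback blames — if (roi_level == l).sum() == 0: at fpn.py:135 — may just be the first synchronization point after the real fault; the cudaCheckError() message looks like it comes from a compiled CUDA extension (NMS / ROI pooling), though the log does not prove that. A purely illustrative sanity check one could drop in right before that line, written against the current PyTorch tensor API (on the 0.3 build shown in the log the accessor names differ slightly):

    # Hypothetical debugging snippet for _PyramidRoI_Feat; the names roi_level
    # and rois are taken from the traceback above, nothing else is assumed.
    import torch

    torch.cuda.synchronize()  # if this line raises, the fault happened in an
                              # earlier kernel, not in the comparison below
    print("roi_level:", roi_level.device, roi_level.dtype,
          roi_level.min().item(), roi_level.max().item())
    print("rois:", rois.device, tuple(rois.shape))

If the synchronize() call is what fails, the comparison on line 135 is innocent and the search should move to whatever kernel ran just before it.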

Read more comments on GitHub >

Top Results From Across the Web

RuntimeError: CUDA error: an illegal memory access was ...
Hi, everyone! I met a strange illegal memory access error. It happens randomly without any regular pattern. The code is really simple.
Read more >
PyTorch CUDA error: an illegal memory access was ...
It was partially addressed by the OP's own answer, but the problem under the hood with illegal memory access is that the...
Read more >
CUDA error: an illegal memory access was encountered with ...
Try to use the latest PyTorch (1.10). The error indicates an out of bound memory access similar to a segfault on the CPU,...
Read more >
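Following up on the "use the latest PyTorch" advice: the conda build id in the traceback above (pytorch_1513363039688) corresponds to a December 2017 build, and pairing an old PyTorch with custom compiled CUDA ops is a frequent source of exactly this error. A quick, repo-agnostic way to see what is actually installed and what the card supports:

    import torch

    print(torch.__version__)                    # PyTorch build in use
    print(torch.version.cuda)                   # CUDA toolkit it was built against
    print(torch.cuda.get_device_name(0))        # e.g. "TITAN X (Pascal)"
    print(torch.cuda.get_device_capability(0))  # e.g. (6, 1) for a Pascal TITAN X

If the reported compute capability does not match the architecture the repository's extensions were compiled for, rebuilding them is usually the first thing to try (this assumes the repo ships compiled ops, which the cudaCheckError() output suggests but does not confirm).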
CUDA error 700 - an illegal memory access was encountered
This could be because you're running out of memory or accessing an illegal address via a pointer. The following similar issue may help...
Read more >
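On the "out of memory or accessing an illegal address" point: because kernel launches are asynchronous, the exception is raised at the next synchronization rather than at the launch that went wrong — which is why the traceback above can blame an innocent-looking comparison. A tiny illustration (depending on the PyTorch version the message is a device-side assert or an illegal memory access; the point is where it is raised, not the exact wording):

    import torch

    x = torch.zeros(10, device="cuda")
    bad_idx = torch.tensor([123], device="cuda")  # deliberately out of range

    y = x[bad_idx]             # the gather kernel is launched asynchronously;
                               # no Python exception is raised here
    torch.cuda.synchronize()   # the CUDA error surfaces here, at the sync point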
CUDA error: an illegal memory access was encountered - Part ...
When I run the following code on Gradient, it works fine but throws an error after running for a few seconds...
Read more >
