cudaCheckError() failed : an illegal memory access was encountered
Hi, thanks for your code!
I used your code for training and it succeeded. However, when it comes to testing, I run into a weird error:
CUDA_VISIBLE_DEVICES=0,1,2,3 python test_net.py exp_name --cascade --cuda --mGPUs
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/lil.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from . import _csparsetools
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:167: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._shortest_path import shortest_path, floyd_warshall, dijkstra,
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/csgraph/_validation.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._tools import csgraph_to_dense, csgraph_from_dense,
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:169: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._traversal import breadth_first_order, depth_first_order,
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:171: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._min_spanning_tree import minimum_spanning_tree
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/sparse/csgraph/__init__.py:172: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._reordering import reverse_cuthill_mckee, maximum_bipartite_matching,
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/linalg/basic.py:17: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._solve_toeplitz import levinson
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/linalg/__init__.py:191: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._decomp_update import *
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/special/__init__.py:640: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._ufuncs import *
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/special/_ellip_harm.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._ellip_harm_2 import _ellipsoid, _ellipsoid_norm
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/optimize/_numdiff.py:8: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._group_columns import group_dense, group_sparse
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/interpolate/_bsplines.py:9: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from . import _bspl
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/spatial/__init__.py:94: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from .ckdtree import *
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/spatial/__init__.py:95: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from .qhull import *
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/spatial/_spherical_voronoi.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from . import _voronoi
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/spatial/distance.py:121: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from . import _hausdorff
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio4.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from .mio_utils import squeeze_element, chars_to_strings
/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.py:98: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from .mio5_utils import VarReader5
Called with args:
Namespace(batch_size=1, cascade=True, cfg_file='cfgs/res101.yml', checkepoch=7, checkpoint=6310, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', exp_name='exp_name', large_scale=False, load_dir='models', mGPUs=True, net='detnet59', parallel_type=0, set_cfgs=None, soft_nms=False, vis=False)
Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2],
 'ANCHOR_SCALES': [4, 8, 16, 32],
 'CROP_RESIZE_WITH_MAX_POOL': False,
 'CUDA': False,
 'DATA_DIR': '/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/data',
 'DEDUP_BOXES': 0.0625,
 'DETNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'EPS': 1e-14,
 'EXP_DIR': 'res101',
 'FEAT_STRIDE': [16],
 'FPN_ANCHOR_SCALES': [32, 64, 128, 256, 512],
 'FPN_ANCHOR_STRIDE': 1,
 'FPN_FEAT_STRIDES': [4, 8, 16, 16, 16],
 'GPU_ID': 0,
 'HAS_MASK': True,
 'MATLAB': 'matlab',
 'MAX_NUM_GT_BOXES': 20,
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[0.485, 0.456, 0.406]]]),
 'PIXEL_STDS': array([[[0.229, 0.224, 0.225]]]),
 'POOLING_MODE': 'align',
 'POOLING_SIZE': 14,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/DATACENTER2/qyj/cascade-rcnn_Pytorch-master',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'gt',
          'RPN_MIN_SIZE': 16,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SOFT_NMS_METHOD': 1,
          'SVM': False},
 'TRAIN': {'ASPECT_CROPPING': False,
           'ASPECT_GROUPING': False,
           'BATCH_SIZE': 128,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'BN_TRAIN': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FG_THRESH_2ND': 0.6,
           'FG_THRESH_3RD': 0.7,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 8,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
           'STEPSIZE': [30000],
           'SUMMARY_INTERVAL': 180,
           'TRIM_HEIGHT': 600,
           'TRIM_WIDTH': 600,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Loaded dataset voc_2007_test for training
Set proposal method: gt
Preparing training data…
voc_2007_test gt roidb loaded from /DATACENTER2/qyj/cascade-rcnn_Pytorch-master/data/cache/voc_2007_test_gt_roidb.pkl
done
3462 roidb entries
load checkpoint models/detnet59/pascal_voc/exp_name/fpn_1_7_6310.pth
load model successfully!
cudaCheckError() failed : an illegal memory access was encountered
That is the report after setting os.environ['CUDA_LAUNCH_BLOCKING'] = '1' to locate the real place that triggered the cudaCheckError(). Without it, the error is:
3462 roidb entries
load checkpoint models/detnet59/pascal_voc/exp_name/fpn_1_7_6310.pth
load model successfully!
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generated/…/THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_net.py", line 246, in <module>
    ret = fpn(im_data, im_info, gt_boxes, num_boxes)
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/lib/model/fpn/cascade/fpn.py", line 316, in forward
    roi_pool_feat = self._PyramidRoI_Feat(mrcnn_feature_maps, rois, im_info)
  File "/DATACENTER2/qyj/cascade-rcnn_Pytorch-master/lib/model/fpn/cascade/fpn.py", line 135, in _PyramidRoI_Feat
    if (roi_level == l).sum() == 0:
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generated/…/THCReduceAll.cuh:339
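For anyone trying to reproduce this, here is a minimal sketch of how the CUDA_LAUNCH_BLOCKING setting mentioned above could be applied at the very top of the test script. The helper name enable_blocking_launches is made up for illustration and is not part of the repository. The variable is read when the CUDA runtime initializes, so the sketch assumes it is called before the first CUDA call; doing it before torch is imported is the simplest way to be sure.

```python
import os
import sys


def enable_blocking_launches():
    """Request synchronous CUDA kernel launches so errors surface at the failing call.

    Hypothetical helper, not part of the repository. Returns True if torch had
    not yet been imported when the variable was set, False otherwise.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    set_before_torch = "torch" not in sys.modules
    if not set_before_torch:
        # If CUDA was already initialized, the setting may have no effect.
        sys.stderr.write(
            "warning: torch was imported before CUDA_LAUNCH_BLOCKING was set\n"
        )
    return set_before_torch


# Call this first, then import torch and run the model as usual.
set_early = enable_blocking_launches()
```

Prefixing the command line with CUDA_LAUNCH_BLOCKING=1 should have the same effect without touching the script. Because kernels then run synchronously, the traceback should point at the operation that actually failed rather than at a later call such as the .sum() shown above.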
I also have the same problem.
It happens when testing.