RuntimeError: copy_if failed to synchronize: device-side assert triggered
2020-03-22 16:34:58,582 INFO Start logging
2020-03-22 16:34:58,584 INFO CUDA_VISIBLE_DEVICES=ALL
2020-03-22 16:34:58,584 INFO cfg_file cfgs/pointpillar.yaml
2020-03-22 16:34:58,585 INFO data_dir None
2020-03-22 16:34:58,586 INFO batch_size 4
2020-03-22 16:34:58,586 INFO epochs 80
2020-03-22 16:34:58,587 INFO workers 4
2020-03-22 16:34:58,588 INFO extra_tag default
2020-03-22 16:34:58,588 INFO ckpt pointpillar.pth
2020-03-22 16:34:58,589 INFO mgpus False
2020-03-22 16:34:58,589 INFO launcher none
2020-03-22 16:34:58,590 INFO tcp_port 18888
2020-03-22 16:34:58,591 INFO local_rank 0
2020-03-22 16:34:58,592 INFO set_cfgs None
2020-03-22 16:34:58,592 INFO max_waiting_mins 30
2020-03-22 16:34:58,593 INFO start_epoch 0
2020-03-22 16:34:58,593 INFO eval_tag default
2020-03-22 16:34:58,594 INFO eval_all False
2020-03-22 16:34:58,595 INFO ckpt_dir None
2020-03-22 16:34:58,596 INFO save_to_file False
2020-03-22 16:34:58,596 INFO cfg.ROOT_DIR: /media/buaa/My Passport/PCDet
2020-03-22 16:34:58,597 INFO cfg.LOCAL_RANK: 0
2020-03-22 16:34:58,598 INFO cfg.CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']
2020-03-22 16:34:58,599 INFO
cfg.DATA_CONFIG = edict()
2020-03-22 16:34:58,600 INFO cfg.DATA_CONFIG.DATASET: KittiDataset
2020-03-22 16:34:58,601 INFO cfg.DATA_CONFIG.DATA_DIR: data/kitti
2020-03-22 16:34:58,602 INFO cfg.DATA_CONFIG.FOV_POINTS_ONLY: True
2020-03-22 16:34:58,602 INFO
cfg.DATA_CONFIG.NUM_POINT_FEATURES = edict()
2020-03-22 16:34:58,603 INFO cfg.DATA_CONFIG.NUM_POINT_FEATURES.total: 4
2020-03-22 16:34:58,604 INFO cfg.DATA_CONFIG.NUM_POINT_FEATURES.use: 4
2020-03-22 16:34:58,605 INFO cfg.DATA_CONFIG.POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
2020-03-22 16:34:58,605 INFO cfg.DATA_CONFIG.MASK_POINTS_BY_RANGE: True
2020-03-22 16:34:58,606 INFO
cfg.DATA_CONFIG.TRAIN = edict()
2020-03-22 16:34:58,606 INFO cfg.DATA_CONFIG.TRAIN.INFO_PATH: ['data/kitti/kitti_infos_train.pkl']
2020-03-22 16:34:58,607 INFO cfg.DATA_CONFIG.TRAIN.SHUFFLE_POINTS: True
2020-03-22 16:34:58,608 INFO cfg.DATA_CONFIG.TRAIN.MAX_NUMBER_OF_VOXELS: 16000
2020-03-22 16:34:58,608 INFO
cfg.DATA_CONFIG.TEST = edict()
2020-03-22 16:34:58,609 INFO cfg.DATA_CONFIG.TEST.INFO_PATH: ['data/kitti/kitti_infos_val.pkl']
2020-03-22 16:34:58,610 INFO cfg.DATA_CONFIG.TEST.SHUFFLE_POINTS: False
2020-03-22 16:34:58,610 INFO cfg.DATA_CONFIG.TEST.MAX_NUMBER_OF_VOXELS: 40000
2020-03-22 16:34:58,611 INFO
cfg.DATA_CONFIG.AUGMENTATION = edict()
2020-03-22 16:34:58,612 INFO
cfg.DATA_CONFIG.AUGMENTATION.NOISE_PER_OBJECT = edict()
2020-03-22 16:34:58,613 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_PER_OBJECT.ENABLED: True
2020-03-22 16:34:58,613 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_PER_OBJECT.GT_LOC_NOISE_STD: [1.0, 1.0, 0.1]
2020-03-22 16:34:58,614 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_PER_OBJECT.GT_ROT_UNIFORM_NOISE: [-0.78539816, 0.78539816]
2020-03-22 16:34:58,615 INFO
cfg.DATA_CONFIG.AUGMENTATION.NOISE_GLOBAL_SCENE = edict()
2020-03-22 16:34:58,615 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_GLOBAL_SCENE.ENABLED: True
2020-03-22 16:34:58,616 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_GLOBAL_SCENE.GLOBAL_ROT_UNIFORM_NOISE: [-0.78539816, 0.78539816]
2020-03-22 16:34:58,616 INFO cfg.DATA_CONFIG.AUGMENTATION.NOISE_GLOBAL_SCENE.GLOBAL_SCALING_UNIFORM_NOISE: [0.95, 1.05]
2020-03-22 16:34:58,617 INFO
cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER = edict()
2020-03-22 16:34:58,618 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.ENABLED: True
2020-03-22 16:34:58,619 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.DB_INFO_PATH: ['data/kitti/kitti_dbinfos_train.pkl']
2020-03-22 16:34:58,620 INFO
cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.PREPARE = edict()
2020-03-22 16:34:58,621 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.PREPARE.filter_by_difficulty: [-1]
2020-03-22 16:34:58,621 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.PREPARE.filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5']
2020-03-22 16:34:58,622 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.RATE: 1.0
2020-03-22 16:34:58,623 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.SAMPLE_GROUPS: ['Car:15', 'Pedestrian:10', 'Cyclist:10']
2020-03-22 16:34:58,624 INFO cfg.DATA_CONFIG.AUGMENTATION.DB_SAMPLER.USE_ROAD_PLANE: True
2020-03-22 16:34:58,624 INFO
cfg.DATA_CONFIG.VOXEL_GENERATOR = edict()
2020-03-22 16:34:58,625 INFO cfg.DATA_CONFIG.VOXEL_GENERATOR.MAX_POINTS_PER_VOXEL: 32
2020-03-22 16:34:58,625 INFO cfg.DATA_CONFIG.VOXEL_GENERATOR.VOXEL_SIZE: [0.16, 0.16, 4]
2020-03-22 16:34:58,626 INFO
cfg.MODEL = edict()
2020-03-22 16:34:58,627 INFO cfg.MODEL.NAME: PointPillar
2020-03-22 16:34:58,627 INFO
cfg.MODEL.VFE = edict()
2020-03-22 16:34:58,628 INFO cfg.MODEL.VFE.NAME: PillarFeatureNetOld2
2020-03-22 16:34:58,628 INFO
cfg.MODEL.VFE.ARGS = edict()
2020-03-22 16:34:58,629 INFO cfg.MODEL.VFE.ARGS.use_norm: True
2020-03-22 16:34:58,630 INFO cfg.MODEL.VFE.ARGS.num_filters: [64]
2020-03-22 16:34:58,630 INFO cfg.MODEL.VFE.ARGS.with_distance: False
2020-03-22 16:34:58,631 INFO
cfg.MODEL.RPN = edict()
2020-03-22 16:34:58,631 INFO cfg.MODEL.RPN.PARAMS_FIXED: False
2020-03-22 16:34:58,632 INFO
cfg.MODEL.RPN.BACKBONE = edict()
2020-03-22 16:34:58,632 INFO cfg.MODEL.RPN.BACKBONE.NAME: PointPillarsScatter
2020-03-22 16:34:58,633 INFO
cfg.MODEL.RPN.BACKBONE.ARGS = edict()
2020-03-22 16:34:58,633 INFO
cfg.MODEL.RPN.RPN_HEAD = edict()
2020-03-22 16:34:58,634 INFO cfg.MODEL.RPN.RPN_HEAD.NAME: RPNV2
2020-03-22 16:34:58,634 INFO cfg.MODEL.RPN.RPN_HEAD.DOWNSAMPLE_FACTOR: 8
2020-03-22 16:34:58,635 INFO
cfg.MODEL.RPN.RPN_HEAD.ARGS = edict()
2020-03-22 16:34:58,635 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.use_norm: True
2020-03-22 16:34:58,636 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.concat_input: False
2020-03-22 16:34:58,636 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.num_input_features: 64
2020-03-22 16:34:58,637 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.layer_nums: [3, 5, 5]
2020-03-22 16:34:58,637 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.layer_strides: [2, 2, 2]
2020-03-22 16:34:58,638 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.num_filters: [64, 128, 256]
2020-03-22 16:34:58,638 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.upsample_strides: [1, 2, 4]
2020-03-22 16:34:58,639 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.num_upsample_filters: [128, 128, 128]
2020-03-22 16:34:58,639 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.encode_background_as_zeros: True
2020-03-22 16:34:58,640 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.use_direction_classifier: True
2020-03-22 16:34:58,640 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.num_direction_bins: 2
2020-03-22 16:34:58,641 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.dir_offset: 0.78539
2020-03-22 16:34:58,641 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.dir_limit_offset: 0.0
2020-03-22 16:34:58,642 INFO cfg.MODEL.RPN.RPN_HEAD.ARGS.use_binary_dir_classifier: False
2020-03-22 16:34:58,643 INFO
cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG = edict()
2020-03-22 16:34:58,643 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.DOWNSAMPLED_FACTOR: 2
2020-03-22 16:34:58,644 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.BOX_CODER: ResidualCoder
2020-03-22 16:34:58,644 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.REGION_SIMILARITY_FN: nearest_iou_similarity
2020-03-22 16:34:58,645 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.SAMPLE_POS_FRACTION: -1.0
2020-03-22 16:34:58,645 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.SAMPLE_SIZE: 512
2020-03-22 16:34:58,646 INFO cfg.MODEL.RPN.RPN_HEAD.TARGET_CONFIG.ANCHOR_GENERATOR: [{'anchor_range': [0, -40.0, -1.78, 70.4, 40.0, -1.78], 'sizes': [[1.6, 3.9, 1.56]], 'rotations': [0, 1.57], 'matched_threshold': 0.6, 'unmatched_threshold': 0.45, 'class_name': 'Car'}, {'anchor_range': [0, -40, -0.6, 70.4, 40, -0.6], 'sizes': [[0.6, 0.8, 1.73]], 'rotations': [0, 1.57], 'matched_threshold': 0.5, 'unmatched_threshold': 0.35, 'class_name': 'Pedestrian'}, {'anchor_range': [0, -40, -0.6, 70.4, 40, -0.6], 'sizes': [[0.6, 1.76, 1.73]], 'rotations': [0, 1.57], 'matched_threshold': 0.5, 'unmatched_threshold': 0.35, 'class_name': 'Cyclist'}]
2020-03-22 16:34:58,646 INFO
cfg.MODEL.RCNN = edict()
2020-03-22 16:34:58,647 INFO cfg.MODEL.RCNN.ENABLED: False
2020-03-22 16:34:58,648 INFO
cfg.MODEL.LOSSES = edict()
2020-03-22 16:34:58,648 INFO cfg.MODEL.LOSSES.RPN_REG_LOSS: smooth-l1
2020-03-22 16:34:58,649 INFO
cfg.MODEL.LOSSES.LOSS_WEIGHTS = edict()
2020-03-22 16:34:58,650 INFO cfg.MODEL.LOSSES.LOSS_WEIGHTS.rpn_cls_weight: 1.0
2020-03-22 16:34:58,651 INFO cfg.MODEL.LOSSES.LOSS_WEIGHTS.rpn_loc_weight: 2.0
2020-03-22 16:34:58,652 INFO cfg.MODEL.LOSSES.LOSS_WEIGHTS.rpn_dir_weight: 0.2
2020-03-22 16:34:58,652 INFO cfg.MODEL.LOSSES.LOSS_WEIGHTS.code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2020-03-22 16:34:58,653 INFO
cfg.MODEL.TRAIN = edict()
2020-03-22 16:34:58,653 INFO cfg.MODEL.TRAIN.SPLIT: train
2020-03-22 16:34:58,654 INFO
cfg.MODEL.TRAIN.OPTIMIZATION = edict()
2020-03-22 16:34:58,655 INFO cfg.MODEL.TRAIN.OPTIMIZATION.OPTIMIZER: adam_onecycle
2020-03-22 16:34:58,655 INFO cfg.MODEL.TRAIN.OPTIMIZATION.LR: 0.003
2020-03-22 16:34:58,656 INFO cfg.MODEL.TRAIN.OPTIMIZATION.WEIGHT_DECAY: 0.01
2020-03-22 16:34:58,656 INFO cfg.MODEL.TRAIN.OPTIMIZATION.MOMENTUM: 0.9
2020-03-22 16:34:58,657 INFO cfg.MODEL.TRAIN.OPTIMIZATION.MOMS: [0.95, 0.85]
2020-03-22 16:34:58,658 INFO cfg.MODEL.TRAIN.OPTIMIZATION.PCT_START: 0.4
2020-03-22 16:34:58,659 INFO cfg.MODEL.TRAIN.OPTIMIZATION.DIV_FACTOR: 10
2020-03-22 16:34:58,659 INFO cfg.MODEL.TRAIN.OPTIMIZATION.DECAY_STEP_LIST: [35, 45]
2020-03-22 16:34:58,660 INFO cfg.MODEL.TRAIN.OPTIMIZATION.LR_DECAY: 0.1
2020-03-22 16:34:58,661 INFO cfg.MODEL.TRAIN.OPTIMIZATION.LR_CLIP: 1e-07
2020-03-22 16:34:58,661 INFO cfg.MODEL.TRAIN.OPTIMIZATION.LR_WARMUP: False
2020-03-22 16:34:58,662 INFO cfg.MODEL.TRAIN.OPTIMIZATION.WARMUP_EPOCH: 1
2020-03-22 16:34:58,662 INFO cfg.MODEL.TRAIN.OPTIMIZATION.GRAD_NORM_CLIP: 10
2020-03-22 16:34:58,663 INFO
cfg.MODEL.TEST = edict()
2020-03-22 16:34:58,664 INFO cfg.MODEL.TEST.SPLIT: val
2020-03-22 16:34:58,664 INFO cfg.MODEL.TEST.NMS_TYPE: nms_gpu
2020-03-22 16:34:58,665 INFO cfg.MODEL.TEST.MULTI_CLASSES_NMS: False
2020-03-22 16:34:58,665 INFO cfg.MODEL.TEST.NMS_THRESH: 0.01
2020-03-22 16:34:58,666 INFO cfg.MODEL.TEST.SCORE_THRESH: 0.1
2020-03-22 16:34:58,666 INFO cfg.MODEL.TEST.USE_RAW_SCORE: True
2020-03-22 16:34:58,667 INFO cfg.MODEL.TEST.NMS_PRE_MAXSIZE_LAST: 4096
2020-03-22 16:34:58,668 INFO cfg.MODEL.TEST.NMS_POST_MAXSIZE_LAST: 500
2020-03-22 16:34:58,668 INFO cfg.MODEL.TEST.RECALL_THRESH_LIST: [0.5, 0.7]
2020-03-22 16:34:58,669 INFO cfg.MODEL.TEST.EVAL_METRIC: kitti
2020-03-22 16:34:58,669 INFO
cfg.MODEL.TEST.BOX_FILTER = edict()
2020-03-22 16:34:58,670 INFO cfg.MODEL.TEST.BOX_FILTER.USE_IMAGE_AREA_FILTER: True
2020-03-22 16:34:58,670 INFO cfg.MODEL.TEST.BOX_FILTER.LIMIT_RANGE: [0, -40, -3.0, 70.4, 40, 3.0]
2020-03-22 16:34:58,671 INFO cfg.TAG: pointpillar
2020-03-22 16:34:58,688 INFO Loading KITTI dataset
2020-03-22 16:34:59,300 INFO Total samples for KITTI dataset: 3769
2020-03-22 16:35:14,877 INFO ==> Loading parameters from checkpoint pointpillar.pth to GPU
2020-03-22 16:35:15,316 INFO ==> Done (loaded 127/127)
2020-03-22 16:35:15,407 INFO *************** EPOCH no_number EVALUATION *****************
eval: 0%| | 0/943 [00:00<?, ?it/s]
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [64,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [65,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [66,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [67,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [68,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [69,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/media/nvidia/WD_BLUE_2.5_1TB/pytorch/20191015/pytorch-v1.3.0/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [213,0,0], thread: [70,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
…
…
Traceback (most recent call last):
  File "test.py", line 181, in <module>
    main()
  File "test.py", line 177, in main
    eval_single_ckpt(model, test_loader, args, eval_output_dir, logger, epoch_id)
  File "test.py", line 59, in eval_single_ckpt
    model, test_loader, epoch_id, logger, result_dir=eval_output_dir, save_to_file=args.save_to_file
  File "/media/buaa/My Passport/PCDet/tools/eval_utils/eval_utils.py", line 46, in eval_one_epoch
    pred_dicts, ret_dict = model(input_dict)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/buaa/My Passport/PCDet/pcdet/models/detectors/pointpillar.py", line 34, in forward
    rpn_ret_dict = self.forward_rpn(**input_dict)
  File "/media/buaa/My Passport/PCDet/pcdet/models/detectors/pointpillar.py", line 18, in forward_rpn
    output_shape=self.grid_size[::-1]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/buaa/My Passport/PCDet/pcdet/models/rpn/pillar_scatter.py", line 32, in forward
    this_coords = coords[batch_mask, :]
RuntimeError: copy_if failed to synchronize: device-side assert triggered
eval: 0%|
When I run python3 test.py --cfg_file cfgs/pointpillar.yaml --batch_size 4 --ckpt pointpillar.pth
I get the error above. How can I solve this problem? Thanks.
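The assertion in the log is an "index out of bounds" check, so at some point an index fell outside the tensor it addressed. As a rough aid for narrowing this down, here is a minimal sketch in plain Python of the voxel grid implied by the logged POINT_CLOUD_RANGE and VOXEL_SIZE, plus a check for coordinates outside it. The helper names grid_size and out_of_range are hypothetical and not part of PCDet, and the (z, y, x) coordinate order is an assumption.

```python
def grid_size(pc_range, voxel_size):
    """Voxel count along x, y, z for a range [x0, y0, z0, x1, y1, z1]."""
    return tuple(
        round((pc_range[i + 3] - pc_range[i]) / voxel_size[i]) for i in range(3)
    )


def out_of_range(coords, grid):
    """Return the (z, y, x) voxel coordinates that fall outside the grid."""
    nx, ny, nz = grid
    return [
        (z, y, x)
        for (z, y, x) in coords
        if not (0 <= x < nx and 0 <= y < ny and 0 <= z < nz)
    ]


# Values copied from the logged config.
POINT_CLOUD_RANGE = [0, -39.68, -3, 69.12, 39.68, 1]
VOXEL_SIZE = [0.16, 0.16, 4]

grid = grid_size(POINT_CLOUD_RANGE, VOXEL_SIZE)  # (nx, ny, nz)

# Made-up coordinates: the second has x == nx, the third a negative y.
bad = out_of_range([(0, 10, 20), (0, 10, 432), (0, -1, 5)], grid)
```

Running a check like this on the voxel coordinates on the CPU, before they reach the scatter step, would raise a readable Python-side report instead of a device-side assert.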
I had the same problem and figured out that this error comes from pcdet/models/rpn/pillar_scatter.py.
The error occurs in lines 32 and 33:
[32] indices = this_coords[:, 1] * nz + this_coords[:, 2] * nx + this_coords[:, 3]
[33] indices = indices.type(torch.long)
This cast produces negative indices, which leads to the error. My fix looks like this:
indices = this_coords[:, 1].type(torch.long) * nz + this_coords[:, 2].type(torch.long) * nx + this_coords[:, 3].type(torch.long)
However, I don’t understand why calling
this_coords.type(torch.long)
before the mentioned lines doesn’t fix the error.
OK, I will try to reinstall. BTW, I used spconv v1.1 because I saw that the author had updated to spconv 1.1. Thank you for your advice, @Crowbar97.
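On the open question about this_coords.type(torch.long): one possible explanation, assuming the call was written as a bare statement exactly as quoted, is that Tensor.type returns a new tensor rather than converting in place, so the result is discarded unless it is assigned. A minimal sketch with made-up coordinate values (the column layout of batch index, z, y, x is an assumption):

```python
import torch

# Stand-in for the float coordinates reaching pillar_scatter.py;
# the values are invented for illustration.
this_coords = torch.tensor([[0.0, 0.0, 10.0, 20.0],
                            [0.0, 0.0, 11.0, 21.0]])

# A bare call builds a converted copy and throws it away.
this_coords.type(torch.long)
dtype_after_bare_call = this_coords.dtype  # dtype is unchanged

# Assigning the result is what changes the dtype actually used afterwards.
this_coords_long = this_coords.type(torch.long)
dtype_after_assignment = this_coords_long.dtype
```

If the call was in fact assigned back to this_coords, this explanation does not apply and the cause lies elsewhere.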