Error in training process: TypeError: only integer tensors of a single element can be converted to an index
When I run bash train.sh, the error occurs as follows. How can I solve it?
train: [1][483/484]|Tot: 0:02:25 |ETA: 0:00:01 |tot 11.9668 |hm 1.2029 |wh 1.9665 |reg 0.2171 |dep 2.4322 |dep_sec 2.4956 |dim 0.2450 |rot 1.6286 |rot_sec 1.6110 |amodel_offset 1.1789 |nuscenes_att 0.1946 |velocity 0.5642 |Data 0.002s(0.005s) |Net 0.300s ddd/centerfusion
Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main(opt)
  File "main.py", line 97, in main
    log_dict_val, preds = trainer.val(epoch, val_loader)
  File "/home/wz/wz-research/CenterFusion/src/lib/trainer.py", line 403, in val
    return self.run_epoch('val', epoch, data_loader)
  File "/home/wz/wz-research/CenterFusion/src/lib/trainer.py", line 178, in run_epoch
    output, loss, loss_stats = model_with_loss(batch, phase)
  File "/home/caslx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wz/wz-research/CenterFusion/src/lib/trainer.py", line 123, in forward
    outputs = self.model(batch['image'], pc_hm=pc_hm, pc_dep=pc_dep, calib=calib)
  File "/home/caslx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wz/wz-research/CenterFusion/src/lib/model/networks/base_model.py", line 110, in forward
    pc_hm = generate_pc_hm(z, pc_dep, calib, self.opt)
  File "/home/wz/wz-research/CenterFusion/src/lib/utils/pointcloud.py", line 273, in generate_pc_hm
    pc_dep_to_hm_torch(pc_hm[i], pc_dep_b, depth, bbox, dist_thresh, opt)
  File "/home/wz/wz-research/CenterFusion/src/lib/utils/pointcloud.py", line 282, in pc_dep_to_hm_torch
    bbox_int = torch.tensor([torch.floor(bbox[0]),
TypeError: only integer tensors of a single element can be converted to an index
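The failing line is in pc_dep_to_hm_torch (src/lib/utils/pointcloud.py), where bbox_int is built from a list of 0-dim float tensors and later used for indexing. As a minimal sketch of one possible workaround, assuming bbox holds four 0-dim float tensors (x1, y1, x2, y2) as the traceback suggests (this is not an official patch), casting the coordinates to plain Python ints before building the tensor avoids the index-conversion error on newer PyTorch builds:

import torch

def make_bbox_int(bbox):
    # bbox: four 0-dim float tensors (x1, y1, x2, y2), as suggested by the traceback.
    # Converting to Python ints first keeps torch.tensor(...) and any later indexing
    # from choking on non-integer 0-dim tensors.
    return torch.tensor([int(torch.floor(bbox[0])),
                         int(torch.floor(bbox[1])),
                         int(torch.ceil(bbox[2])),
                         int(torch.ceil(bbox[3]))], dtype=torch.int32)

Whether this is sufficient depends on how bbox_int is used further down in that function; the replies below instead resolve the issue by changing the PyTorch/CUDA environment.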
The parameters in train.sh are as follows:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=0,1
cd ../src
# train
python main.py \
ddd \
--exp_id centerfusion \
--shuffle_train \
--train_split mini_train \
--val_split val \
--val_intervals 1 \
--run_dataset_eval \
--nuscenes_att \
--velocity \
--batch_size 4 \
--lr 2.5e-4 \
--num_epochs 60 \
--lr_step 50 \
--save_point 20,40,50 \
--gpus 0,1 \
--not_rand_crop \
--flip 0.5 \
--shift 0.1 \
--pointcloud \
--radar_sweeps 6 \
--pc_z_offset 0.0 \
--pillar_dims 1.0,0.2,0.2 \
--max_pc_dist 60.0 \
#--load_model ../models/centerfusion_e60.pth \
#--load_model ../models/centernet_baseline_e170.pth \
# --freeze_backbone \
# --resume \
cd ..
Thanks!!! I switched to running this demo on another device (a Titan Xp) with CUDA 10.0 and PyTorch 1.2, and it works. Since the RTX 3080 requires CUDA >= 11.1, and PyTorch 1.2 is not compatible with CUDA 11.1, have you considered updating the code to support a newer PyTorch version such as 1.7? If I want to run on the RTX 3080, do you have any suggestions?
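For reference, a minimal check of which PyTorch build, CUDA toolkit, and GPU compute capability an environment actually provides (an RTX 3080 reports compute capability (8, 6), which CUDA 10.x builds cannot target):

import torch

# Report the installed PyTorch version, the CUDA toolkit it was built against,
# and the detected GPU; an RTX 3080 needs a CUDA >= 11.1 build of PyTorch.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_capability(0))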
Hi @Bosszhe,
I successfully ran it on an RTX 3060 with CUDA 11.1 and PyTorch 1.8 by using https://github.com/jinfagang/DCNv2_latest instead of the recommended DCNv2.
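A quick way to confirm that the replacement DCNv2 build actually works under the newer PyTorch is to run a single forward pass through it; the import path and constructor arguments below follow the usual DCNv2 packaging and are assumptions, so adjust them to your checkout:

import torch
from dcn_v2 import DCN  # module/class names assumed from typical DCNv2 builds

# One tiny forward pass through a deformable convolution layer to verify that the
# compiled extension loads and runs on the GPU with the installed PyTorch/CUDA pair.
layer = DCN(64, 64, kernel_size=3, stride=1, padding=1, deformable_groups=1).cuda()
x = torch.randn(2, 64, 32, 32, device='cuda')
print(layer(x).shape)  # expected: torch.Size([2, 64, 32, 32])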