Some code questions


Hello @pangsu0613,

a) Could you please explain in words the idea behind this algorithm for finding the overlaps between the projected 3D boxes (called just ‘boxes’ in the code below) and the 2D boxes (called ‘query_boxes’ in the code)?

import numba


# pang added to build the tensor for the second stage of training
@numba.jit(nopython=True, parallel=True)
def build_stage2_training(boxes, query_boxes, criterion, scores_3d, scores_2d, dis_to_lidar_3d, overlaps, tensor_index):
    # boxes: projected 3D detections as axis-aligned 2D boxes [N, 4] (x1, y1, x2, y2)
    # query_boxes: boxes from the 2D detector [K, 4]
    N = boxes.shape[0]  # e.g. 70400 anchors
    K = query_boxes.shape[0]  # e.g. 30 2D detections

    max_num = 900000
    ind=0
    ind_max = ind
    for k in range(K):
        qbox_area = ((query_boxes[k, 2] - query_boxes[k, 0]) *
                     (query_boxes[k, 3] - query_boxes[k, 1]))
        for n in range(N):

            iw = (min(boxes[n, 2], query_boxes[k, 2]) -
                  max(boxes[n, 0], query_boxes[k, 0]))
            if iw > 0:
                ih = (min(boxes[n, 3], query_boxes[k, 3]) -
                      max(boxes[n, 1], query_boxes[k, 1]))
                if ih > 0:
                    if criterion == -1:
                        ua = (
                            (boxes[n, 2] - boxes[n, 0]) *
                            (boxes[n, 3] - boxes[n, 1]) + qbox_area - iw * ih)
                    elif criterion == 0:
                        ua = ((boxes[n, 2] - boxes[n, 0]) *
                              (boxes[n, 3] - boxes[n, 1]))
                    elif criterion == 1:
                        ua = qbox_area
                    else:
                        ua = 1.0

                    # record one fused-input row: [IoU, 3D score, 2D score, normalized distance to LiDAR]
                    overlaps[ind,0] = iw * ih / ua
                    overlaps[ind,1] = scores_3d[n,0]
                    overlaps[ind,2] = scores_2d[k,0]
                    overlaps[ind,3] = dis_to_lidar_3d[n,0]
                    tensor_index[ind,0] = k
                    tensor_index[ind,1] = n
                    ind = ind+1

                # y-overlap failed and k is the last 2D box: write a -10 placeholder row
                elif k==K-1:
                    overlaps[ind,0] = -10
                    overlaps[ind,1] = scores_3d[n,0]
                    overlaps[ind,2] = -10
                    overlaps[ind,3] = dis_to_lidar_3d[n,0]
                    tensor_index[ind,0] = k
                    tensor_index[ind,1] = n
                    ind = ind+1
            # x-overlap failed and k is the last 2D box: write a -10 placeholder row
            elif k==K-1:
                overlaps[ind,0] = -10
                overlaps[ind,1] = scores_3d[n,0]
                overlaps[ind,2] = -10
                overlaps[ind,3] = dis_to_lidar_3d[n,0]
                tensor_index[ind,0] = k
                tensor_index[ind,1] = n
                ind = ind+1
    if ind > ind_max:
        ind_max = ind
    return overlaps, tensor_index, ind
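
For reference, here is a minimal usage sketch of this function with hypothetical toy arrays (the real caller in the repo passes the projected 3D boxes, the 2D detector output, and pre-allocated 900000-row buffers):

import numpy as np

# Toy shapes only: 3 projected 3D boxes vs. 2 image detections, boxes as (x1, y1, x2, y2)
boxes       = np.array([[ 10,  10,  50,  50],
                        [ 60,  60,  90,  90],
                        [200, 200, 220, 220]], dtype=np.float32)   # [N, 4]
query_boxes = np.array([[ 12,  12,  48,  48],
                        [300, 300, 320, 320]], dtype=np.float32)   # [K, 4]

scores_3d       = np.random.rand(boxes.shape[0], 1).astype(np.float32)
scores_2d       = np.random.rand(query_boxes.shape[0], 1).astype(np.float32)
dis_to_lidar_3d = np.random.rand(boxes.shape[0], 1).astype(np.float32)

overlaps     = np.full((900000, 4), -1, dtype=np.float32)   # pre-allocated output buffers
tensor_index = np.full((900000, 2), -1, dtype=np.float32)

overlaps, tensor_index, ind = build_stage2_training(
    boxes, query_boxes, -1,                  # criterion = -1 -> standard IoU
    scores_3d, scores_2d, dis_to_lidar_3d,
    overlaps, tensor_index)

# The first `ind` rows hold [IoU, 3D score, 2D score, normalized distance]
# (or the -10 placeholders); tensor_index pairs each row with its
# (k, n) = (2D box, 3D box) indices.
print(overlaps[:ind], tensor_index[:ind])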

b) Here, when you calculate the feature ‘distance_to_the_lidar’, why do you divide by 82.0? https://github.com/pangsu0613/CLOCs/blob/b2f0e23b0deb0e192121bbd563691f49c85d3bbc/second/pytorch/models/voxelnet.py#L497

c) Also, I don’t understand why the output scores of the fusion network ‘cls_pred’ are in raw log format even though the input 3D and 2D scores were in sigmoid format. Could you please tell me the reason?

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 21 (9 by maintainers)

Top GitHub Comments

4 reactions
pangsu0613 commented, May 15, 2021

Hello @xavidzo

(a) The 3D box → 2D box projection is not done here; it is done in voxelnet.py. The inputs to this function are the projected 3D boxes (after projection there are 8 corner points for each box, but we take the min and max x-y to form an axis-aligned 2D box, which is the same format the 2D detector outputs), the 2D boxes from the 2D detector, and the related corresponding information. The purpose of this function is to build the input tensor for fusion: since we only care about overlapping projected 3D and 2D detections, we calculate the IoU between them. The main idea for computing the IoU is to first check whether the two boxes overlap along the x axis; if yes, check the y axis; if both overlap, the boxes intersect, so we compute the overlapped region and go on from there.

(b) Because in SECOND the detection field of view in LiDAR coordinates is set to 0 < x < 70.4, -40 < y < 40, -3 < z < 1, the longest distance in the x-y plane is sqrt(70.4^2 + 40^2), which is around 81. I use 82 to keep the normalized value smaller than 1; this value does not have a big impact on the final results, and 81 or 80 also work fine.

(c) Because the fusion layers are a few CNN layers and the final output layer does not have a ReLU nonlinearity, the scores come out as raw logits. This is similar to most of the existing detection and classification heads used in other works.
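
To make (a) and (b) concrete, here is a minimal PyTorch sketch; it mirrors the prepare_input_tensor code further down in this thread, and corners_to_axis_aligned_2d / normalized_distance_to_lidar are just hypothetical helper names:

import torch

# (a) box_corners_in_image: [N, 8, 2] -- the 8 corners of each 3D box after projection
# into the image plane. Taking the per-box min/max over the corners gives an
# axis-aligned (x1, y1, x2, y2) box in the same format as the 2D detector output.
def corners_to_axis_aligned_2d(box_corners_in_image, img_width, img_height):
    minxy = torch.min(box_corners_in_image, dim=1)[0]
    maxxy = torch.max(box_corners_in_image, dim=1)[0]
    minxy[:, 0].clamp_(0, img_width)
    minxy[:, 1].clamp_(0, img_height)
    maxxy[:, 0].clamp_(0, img_width)
    maxxy[:, 1].clamp_(0, img_height)
    return torch.cat([minxy, maxxy], dim=1)   # [N, 4]

# (b) SECOND's field of view is 0 < x < 70.4, -40 < y < 40, so the largest x-y
# distance is sqrt(70.4**2 + 40**2) ~= 81; dividing by 82 keeps the feature below 1.
def normalized_distance_to_lidar(final_box_preds):
    return torch.norm(final_box_preds[:, :2], p=2, dim=1, keepdim=True) / 82.0

For (c): the final conv layer of the fusion block (e.g. the Conv2d(36, 1, 1) in the fusion_layer.py posted below) has no activation after it, so cls_pred comes out as raw logits; a sigmoid is only applied afterwards, in the loss and in predict.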

3 reactions
xavidzo commented, Jun 14, 2021

Hello @pangsu0613, so I trained one CenterPoint model for multi-class fusion; basically, as you suggested, I used one fusion layer for every class. These are my results:

CenterPoint alone, three-head (one head for each class), 75 epochs
2021-06-09 03:46:14,777 - INFO - Evaluation official:
car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:89.05, 86.80, 81.62
bev  AP:87.21, 82.00, 80.29
3d   AP:75.79, 67.43, 64.99
aos  AP:88.97, 86.50, 81.30
car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:89.05, 86.80, 81.62
bev  AP:89.75, 88.99, 88.38
3d   AP:89.59, 88.67, 87.80
aos  AP:88.97, 86.50, 81.30
pedestrian AP(Average Precision)@0.50, 0.50, 0.50:
bbox AP:65.31, 63.51, 61.29
bev  AP:63.43, 59.86, 55.84
3d   AP:57.10, 52.62, 48.05
aos  AP:58.37, 56.13, 53.71
pedestrian AP(Average Precision)@0.50, 0.25, 0.25:
bbox AP:65.31, 63.51, 61.29
bev  AP:75.96, 74.29, 71.91
3d   AP:76.03, 74.15, 71.74
aos  AP:58.37, 56.13, 53.71
cyclist AP(Average Precision)@0.50, 0.50, 0.50:
bbox AP:86.35, 70.69, 67.92
bev  AP:82.81, 65.80, 62.30
3d   AP:82.19, 62.94, 59.45
aos  AP:85.89, 67.23, 64.68
cyclist AP(Average Precision)@0.50, 0.25, 0.25:
bbox AP:86.35, 70.69, 67.92
bev  AP:88.04, 71.55, 67.91
3d   AP:88.04, 71.25, 67.58
aos  AP:85.89, 67.23, 64.68

CenterPoint three-head, 75 epochs + CLOCs 1 epoch, 2D detection score threshold > 0.1
2021-06-13 23:12:42,213 - INFO - Evaluation official:
car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:89.84, 87.66, 84.55
bev  AP:87.93, 83.18, 83.13
3d   AP:79.82, 71.08, 66.19
aos  AP:89.76, 87.31, 84.14
car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:89.84, 87.66, 84.55
bev  AP:90.22, 89.49, 88.85
3d   AP:90.14, 89.24, 88.42
aos  AP:89.76, 87.31, 84.14
pedestrian AP(Average Precision)@0.50, 0.50, 0.50:
bbox AP:70.75, 70.08, 64.08
bev  AP:68.71, 66.30, 60.96
3d   AP:62.79, 58.04, 51.80
aos  AP:63.25, 62.28, 56.59
pedestrian AP(Average Precision)@0.50, 0.25, 0.25:
bbox AP:70.75, 70.08, 64.08
bev  AP:83.95, 82.92, 80.49
3d   AP:83.87, 82.83, 80.33
aos  AP:63.25, 62.28, 56.59
cyclist AP(Average Precision)@0.50, 0.50, 0.50:
bbox AP:88.39, 77.12, 69.63
bev  AP:86.18, 67.84, 66.45
3d   AP:85.43, 66.04, 59.00
aos  AP:87.91, 73.84, 67.01
cyclist AP(Average Precision)@0.50, 0.25, 0.25:
bbox AP:88.39, 77.12, 69.63
bev  AP:95.10, 76.54, 69.22
3d   AP:95.09, 76.47, 69.20
aos  AP:87.91, 73.84, 67.01 

I trained the CLOCs fusion layers with the fast focal loss of CenterPoint; it converges really fast, in one or two epochs, and more epochs make the results worse. For the case where there is no IoU between the projected 3D bbox and the 2D bbox, I changed the default fill value from -10 to -1000, which gave me the best results. The results look good in terms of increasing the mAP; however, the inference speed is still an issue for me. I measured that running the 3 CLOCs heads takes ~3-4 ms and building the three input tensors (one for each class) takes another 3-4 ms, but the processing functions before them also take time, so in total the delay introduced is around 20 ms.

Could you take a quick look at my code and perhaps spot some ways it could be further optimized?

This is my fusion_utils.py script:

import time
import pathlib

import numpy as np
import torch

# kitti, box_torch_ops and eval (for build_stage2_training) are assumed to be
# importable from the surrounding CLOCs / CenterPoint code base.


def prepare_fusion_inputs(_3d_detector, example, _2d_stored_detections_path_car, _2d_stored_detections_path_ped_cyc):

    t_init = time.time()

    img_idx = example['image'][0]['image_idx']
    
    detection_2d_result_path_car = pathlib.Path(_2d_stored_detections_path_car)
    detection_2d_file_name_car = f"{detection_2d_result_path_car}/{kitti.get_image_index_str(img_idx)}.txt"

    detection_2d_result_path_ped_cyc = pathlib.Path(_2d_stored_detections_path_ped_cyc)
    detection_2d_file_name_ped_cyc = f"{detection_2d_result_path_ped_cyc}/{kitti.get_image_index_str(img_idx)}.txt"

    with open(detection_2d_file_name_car, 'r') as f:
        lines = f.readlines()

    with open(detection_2d_file_name_ped_cyc, 'r') as f:
        lines2 = f.readlines()

    lines = lines + lines2
    
    content = [line.strip().split(' ') for line in lines]
    predicted_class = np.array([x[0] for x in content],dtype='object')
    detection_result = np.array([[float(info) for info in x[4:8]] for x in content]).reshape(-1, 4)
    score = np.array([float(x[15]) for x in content])  
    f_detection_result=np.hstack((predicted_class.reshape(-1,1), detection_result))
    f_detection_result=np.append(f_detection_result,score.reshape(-1,1),1)
    middle_predictions=f_detection_result.reshape(-1,6)
    middle_predictions[np.where(middle_predictions[:,0]!='Car'), 5] = middle_predictions[np.where(middle_predictions[:,0]!='Car'),5]/1000 
    top_predictions=middle_predictions[np.where(middle_predictions[:,5]>=0.1)]


    top_predictions_car = top_predictions[np.where(top_predictions[:,0]=='Car')]

    top_predictions_ped = top_predictions[np.where(top_predictions[:,0]=='Pedestrian')]
    top_predictions_cyc = top_predictions[np.where(top_predictions[:,0]=='Cyclist')]


    top_predictions = [top_predictions_car, top_predictions_ped, top_predictions_cyc]

    t_fin = time.time()

    print("time before 3d prediction in prepare fusion inputs: ", t_fin - t_init)

    pred_3d_start_time = time.time()

    torch.cuda.synchronize()
    with torch.no_grad():
    
        output = _3d_detector(example, return_loss=False, pre_fusion=True)
        
    pred_3d_finish_time = time.time()

    print("time of 3d prediction in prepare_fusion_inputs: ", pred_3d_finish_time - pred_3d_start_time)

    preds_dict = output[0]

    prepare_input_tensor_start_time = time.time()

    iou_test, tensor_index = prepare_input_tensor(example, preds_dict, top_predictions)

    prepare_input_tensor_finish_time = time.time()

    print("time spent in prepare_input_tensor: ", prepare_input_tensor_finish_time - prepare_input_tensor_start_time) 

    return preds_dict, iou_test, tensor_index




def prepare_input_tensor(example, preds_dict, top_predictions):
    batch_size = len(example["num_voxels"])

    print('example[calib] = ', example['calib'])
    rect = example["calib"]["rect"]
    Trv2c = example["calib"]["Trv2c"]
    P2 = example["calib"]["P2"]

    image_shape = example["image"][0]["image_shape"]
    final_box_preds = preds_dict["box3d_lidar"]

    final_scores = preds_dict["scores"]

    final_box_preds = final_box_preds.float()
    rect = rect.squeeze().float()
    Trv2c = Trv2c.squeeze().float()
    P2 = P2.squeeze().float()
    t3 = time.time()

    final_box_preds_camera = box_torch_ops.box_lidar_to_camera(
        final_box_preds, rect, Trv2c)
    locs = final_box_preds_camera[:, :3]
    dims = final_box_preds_camera[:, 3:6]
    angles = final_box_preds_camera[:, 6]
    camera_box_origin = [0.5, 1.0, 0.5]
    box_corners = box_torch_ops.center_to_corner_box3d(
        locs, dims, angles, camera_box_origin, axis=1)

    box_corners_in_image = box_torch_ops.project_to_image(
        box_corners, P2)


    # box_corners_in_image: [N, 8, 2]
    minxy = torch.min(box_corners_in_image, dim=1)[0]
    maxxy = torch.max(box_corners_in_image, dim=1)[0]
    img_height = image_shape[0]
    img_width = image_shape[1]


    minxy[:,0] = torch.clamp(minxy[:,0],min = 0,max = img_width)
    minxy[:,1] = torch.clamp(minxy[:,1],min = 0,max = img_height)
    maxxy[:,0] = torch.clamp(maxxy[:,0],min = 0,max = img_width)
    maxxy[:,1] = torch.clamp(maxxy[:,1],min = 0,max = img_height)
    box_2d_preds = torch.cat([minxy, maxxy], dim=1)

    t4 = time.time()

    print("partial time 1 in prepare_input_tensor: ", t4-t3)

    dis_to_lidar = torch.norm(final_box_preds[:,:2],p=2,dim=1,keepdim=True)/82.0


    boxes_2d_detector = [np.zeros((np.maximum(1, top_predictions[0].shape[0]), 4)),
                         np.zeros((np.maximum(1, top_predictions[1].shape[0]), 4)),
                         np.zeros((np.maximum(1, top_predictions[2].shape[0]), 4))]
    boxes_2d_scores = [np.zeros((boxes_2d_detector[0].shape[0], 1)),
                       np.zeros((boxes_2d_detector[1].shape[0], 1)),
                       np.zeros((boxes_2d_detector[2].shape[0], 1))]

    boxes_2d_detector[0][0:top_predictions[0].shape[0],:]=top_predictions[0][0:top_predictions[0].shape[0],1:5]
    boxes_2d_detector[1][0:top_predictions[1].shape[0],:]=top_predictions[1][0:top_predictions[1].shape[0],1:5]
    boxes_2d_detector[2][0:top_predictions[2].shape[0],:]=top_predictions[2][0:top_predictions[2].shape[0],1:5]

    boxes_2d_scores[0][0:top_predictions[0].shape[0],:]=top_predictions[0][0:top_predictions[0].shape[0],5].reshape(-1,1)
    boxes_2d_scores[1][0:top_predictions[1].shape[0],:]=top_predictions[1][0:top_predictions[1].shape[0],5].reshape(-1,1)
    boxes_2d_scores[2][0:top_predictions[2].shape[0],:]=top_predictions[2][0:top_predictions[2].shape[0],5].reshape(-1,1)




    time_gpu_to_cpu_start = time.time()
    box_2d_preds_numpy = box_2d_preds.detach().cpu().numpy()

    print("box_2d_preds_numpy shape: ", box_2d_preds_numpy.shape)
    final_scores_numpy = final_scores.detach().cpu().numpy()
    dis_to_lidar_numpy = dis_to_lidar.detach().cpu().numpy()
    time_gpu_to_cpu_end = time.time()

    print("time of transfer from gpu to cpu: ", time_gpu_to_cpu_end - time_gpu_to_cpu_start)

    overlaps1 = np.zeros((900000,4),dtype=np.float32)
    overlaps2 = np.zeros((900000,4),dtype=np.float32)
    overlaps3 = np.zeros((900000,4),dtype=np.float32)
    tensor_index1 = np.zeros((900000,2),dtype=np.float32)
    tensor_index2 = np.zeros((900000,2),dtype=np.float32)
    tensor_index3 = np.zeros((900000,2),dtype=np.float32)
    overlaps1[:,:] = -1
    overlaps2[:,:] = -1
    overlaps3[:,:] = -1
    tensor_index1[:,:] = -1
    tensor_index2[:,:] = -1
    tensor_index3[:,:] = -1

    overlaps = [overlaps1, overlaps2, overlaps3]
    tensor_indices = [tensor_index1, tensor_index2, tensor_index3]


    non_empty_iou_test_tensor_list = []

    non_empty_tensor_index_tensor_list = []




    time_iou_build_start=time.time()


    # 13392 = 124 x 108 heatmap cells per class head (matches the reshape in fusion_layer.py)
    for i in range(3):
        iou_test,tensor_ind, max_num = eval.build_stage2_training(box_2d_preds_numpy[(i)*13392:(i+1)*13392, :],
                                            boxes_2d_detector[i],
                                            -1,
                                            final_scores_numpy[(i)*13392:(i+1)*13392,:].reshape(-1,1),
                                            boxes_2d_scores[i],
                                            dis_to_lidar_numpy[(i)*13392:(i+1)*13392,:],
                                            overlaps[i],
                                            tensor_indices[i])


        iou_test_tensor = torch.FloatTensor(iou_test)
        iou_test_tensor = iou_test_tensor.permute(1,0)
        iou_test_tensor = iou_test_tensor.reshape(1,4,1,900000)

        tensor_ind = torch.LongTensor(tensor_ind)
        tensor_ind = tensor_ind.reshape(-1,2)

    
        if max_num == 0:
            non_empty_iou_test_tensor = torch.zeros(1,4,1,2)
            non_empty_iou_test_tensor[:,:,:,:] = -1
            non_empty_tensor_index_tensor = torch.zeros(2,2)
            non_empty_tensor_index_tensor[:,:] = -1
        else:
            non_empty_iou_test_tensor = iou_test_tensor[:,:,:,:max_num]
            non_empty_tensor_index_tensor = tensor_ind[:max_num,:]

        non_empty_iou_test_tensor_list.append(non_empty_iou_test_tensor)
        non_empty_tensor_index_tensor_list.append(non_empty_tensor_index_tensor)

    time_iou_build_end=time.time()

    print("time to build tensor: ", time_iou_build_end - time_iou_build_star

    return non_empty_iou_test_tensor_list, non_empty_tensor_index_tensor_list

And this is my detector file, the fusion_layer.py script:

import logging
import time
from collections import defaultdict

import torch
from torch import nn

# Sequential, FUSION, Config, build_detector, FastFocalLoss, box_torch_ops and
# prepare_fusion_inputs are assumed to come from the surrounding code base.


class CLOCsFusion(nn.Module):
    def __init__(self):
        super(CLOCsFusion, self).__init__()
        self.fuse_2d_3d = Sequential(
            nn.Conv2d(4,18,1),
            nn.ReLU(),
            nn.Conv2d(18,36,1),
            nn.ReLU(),
            nn.Conv2d(36,36,1),
            nn.ReLU(),
            nn.Conv2d(36,1,1),
        )

        self.fuse_2d_3d = self.fuse_2d_3d.cuda()

    def forward(self, input):
        out = self.fuse_2d_3d(input)
        return out



@FUSION.register_module
class FusionLayer(nn.Module):
    def __init__(self, name, _3d_net_cfg_path, _3d_net_path, _2d_data_path_car, _2d_data_path_ped_cyc):
        super(FusionLayer, self).__init__()
        self.name = name
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self._3d_detector_config_path = _3d_net_cfg_path

        self._2d_stored_detections_path_car = _2d_data_path_car
        self._2d_stored_detections_path_ped_cyc = _2d_data_path_ped_cyc

        self._3d_net_cfg = Config.fromfile(self._3d_detector_config_path)
        self._3d_detector = build_detector(self._3d_net_cfg.model, train_cfg=None, test_cfg=self._3d_net_cfg.test_cfg)
        checkpoint = torch.load(_3d_net_path)

        self._3d_detector.load_state_dict(checkpoint["state_dict"])

        self._3d_detector = self._3d_detector.to(self.device).eval().freeze()

        self.bbox_head = self._3d_detector.bbox_head

        self.weight = self.bbox_head.weight
        self.code_weights = self.bbox_head.code_weights

        self.dataset = self.bbox_head.dataset

        self.num_classes = [1, 1, 1]

        self.crit = FastFocalLoss()

        logger = logging.getLogger("Fusion Layer")
        self.logger = logger 

        self.maxpool = Sequential(
            nn.MaxPool2d([100,1],1),
        )

        self.tasks = nn.ModuleList()

        for j, _ in enumerate(self.num_classes):
            self.tasks.append(CLOCsFusion())



    def init_weights(self, pretrained=None):
        if pretrained is not None:
            logger = logging.getLogger()
            logger.info("load model from: {}".format(pretrained))

    
    def fusion(self, input_1_list, tensor_index_list):

        flag = -1

        x_list = []

        for i in range(len(input_1_list)):

            tensor_index_list[i] = tensor_index_list[i].cuda()
            input_1_list[i] = input_1_list[i].cuda()

            if tensor_index_list[i][0,0] == -1:
                out_1 = torch.zeros(1,100,13392,dtype = input_1_list[i].dtype,device = input_1_list[i].device)
                out_1[:,:,:] = -9999999
                flag = 0
            else:
                x = self.tasks[i](input_1_list[i])
                out_1 = torch.zeros(1,100,13392,dtype = input_1_list[i].dtype,device = input_1_list[i].device)
                out_1[:,:,:] = -9999999
                out_1[:,tensor_index_list[i][:,0],tensor_index_list[i][:,1]] = x[0,:,0,:]
                flag = 1
            x = self.maxpool(out_1)
            #x, _ = torch.max(out_1,1)
            x = x.squeeze().reshape(1,-1,1)
            print("x shape in fusion: ", x.shape)
            x_list.append(x)
        
        return x_list,flag

    def _sigmoid(self, x):
        y = torch.clamp(x.sigmoid_(), min=1e-4, max=1-1e-4)
        return y


    def loss(self, example, fused_hm_preds, flag, **kwargs):   

        rets = []
 
        for task_id, _ in enumerate(self.num_classes):

            fused_hm_preds[task_id] = self._sigmoid(fused_hm_preds[task_id])

            hm_loss = self.crit(fused_hm_preds[task_id], example['hm'][task_id], example['ind'][task_id], example['mask'][task_id], example['cat'][task_id])

            ret = {}

            loss = hm_loss 

            print("loss in fusion layer: ", loss)


            ret.update({'loss': loss, 'hm_loss': hm_loss.detach().cpu(), })
            rets.append(ret)


        """convert batch-key to key-batch
        """
        rets_merged = defaultdict(list)
        for ret in rets:
            for k, v in ret.items():
                rets_merged[k].append(v)

        return rets_merged

        
    @torch.no_grad()
    def predict(self, example, test_cfg, box_preds, fused_hm):
        """decode, nms, then return the detection result
        """
        # get loss info
        rets = []
        metas = []

        post_center_range = test_cfg.post_center_limit_range
        if len(post_center_range) > 0:
            post_center_range = torch.tensor(
                post_center_range,
                dtype=fused_hm[0].dtype,
                device=fused_hm[0].device,
            )

        for task_id, preds_dict in enumerate(self.num_classes):
            batch_size = fused_hm[task_id].shape[0]

            if "metadata" not in example or len(example["metadata"]) == 0:
                meta_list = [None] * batch_size
            else:
                meta_list = example["metadata"]


            fused_hm[task_id] = fused_hm[task_id].reshape(1,124,108,1)
            print("fused_hm shape: ", fused_hm[task_id].shape)
            print("fused_hm: ", fused_hm[task_id])
            batch_hm = torch.sigmoid(fused_hm[task_id])
            batch, H, W, num_cls = batch_hm.size()

            batch_hm = batch_hm.reshape(batch, H*W, num_cls)

            print("box preds shape: ", box_preds.shape)

            metas.append(meta_list)

            batch_box_preds = box_preds[task_id*13392:(task_id+1)*13392,:].unsqueeze(0)
            print("batch_box_preds shape: ", batch_box_preds.shape)
            rets.append(self.post_processing(batch_box_preds, batch_hm, test_cfg, post_center_range)) 


        # Merge branches results
        ret_list = []
        num_samples = len(rets[0])

        for i in range(num_samples):
            ret = {}
            for k in rets[0][i].keys():
                if k in ["box3d_lidar", "scores"]:
                    ret[k] = torch.cat([ret[i][k] for ret in rets])
                elif k in ["label_preds"]:
                    flag = 0
                    for j, num_class in enumerate(self.num_classes):
                        rets[j][i][k] += flag
                        flag += num_class
                    ret[k] = torch.cat([ret[i][k] for ret in rets])

            ret['metadata'] = metas[0][i]
            ret_list.append(ret)

        return ret_list 


    @torch.no_grad()
    def post_processing(self, batch_box_preds, batch_hm, test_cfg, post_center_range):
        batch_size = len(batch_hm)

        prediction_dicts = []
        for i in range(batch_size):
            box_preds = batch_box_preds[i]
            hm_preds = batch_hm[i]

            print("hm_preds shape: ", hm_preds.shape)

            scores, labels = torch.max(hm_preds, dim=-1)


            print('scores shape before mask: ', scores.shape)
  

            score_mask = scores > test_cfg.score_threshold
            distance_mask = (box_preds[..., :3] >= post_center_range[:3]).all(1) \
                & (box_preds[..., :3] <= post_center_range[3:]).all(1)
        
            mask = distance_mask & score_mask 

            box_preds = box_preds[mask]
            scores = scores[mask]
            labels = labels[mask]

            print('scores shape after mask: ', scores.shape)


            boxes_for_nms = box_preds[:, [0, 1, 2, 3, 4, 5, -1]]

            selected = box_torch_ops.rotate_nms_pcdet(boxes_for_nms, scores, 
                                thresh=test_cfg.nms.nms_iou_threshold,
                                pre_maxsize=test_cfg.nms.nms_pre_max_size,
                                post_max_size=test_cfg.nms.nms_post_max_size)

            selected_boxes = box_preds[selected]
            selected_scores = scores[selected]
            selected_labels = labels[selected]

            prediction_dict = {
                'box3d_lidar': selected_boxes,
                'scores': selected_scores,
                'label_preds': selected_labels
            }

            prediction_dicts.append(prediction_dict)

        return prediction_dicts 



    def forward(self, example, return_loss=True):


        all_3d_output, fusion_input_list, tensor_index_list = prepare_fusion_inputs(self._3d_detector, example, self._2d_stored_detections_path_car, self._2d_stored_detections_path_ped_cyc)
        

        time_before_fusion = time.time()


        fused_hm_preds, flag = self.fusion(fusion_input_list,tensor_index_list)


        time_after_fusion = time.time()

        print("time of clocs inference: ", time_after_fusion - time_before_fusion)



        if return_loss:

            for i in range(3):
                fused_hm_preds[i] = fused_hm_preds[i].reshape(1,124,108,1)
                fused_hm_preds[i] = fused_hm_preds[i].permute(0, 3, 1, 2).contiguous()
 

            return self.loss(example, fused_hm_preds, flag)

        else:

            time_predict_start = time.time()

            results = self.predict(example, self._3d_detector.test_cfg, all_3d_output["box3d_lidar"], fused_hm_preds)

            time_predict_finish = time.time()

            print("time of prediction in evaluation: ", time_predict_finish - time_predict_start)

            return results
 

I didn’t modify the function build_stage2_training() that builds the input tensor, since it runs faster in numba than in PyTorch. As I said before, it takes 3-4 ms in total for the three classes, which is fine, but the processing steps before it take longer. I would appreciate any feedback from your side, thank you in advance!
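
One possible direction for the pre-processing overhead (a hedged sketch, assuming the 2D detections are static .txt files; preload_2d_detections is a new, hypothetical helper): parse all detection files once, e.g. in FusionLayer.__init__, into a dict keyed by image index, so prepare_fusion_inputs only does a dictionary lookup per frame instead of two file reads plus string parsing:

import pathlib
import numpy as np

def preload_2d_detections(detections_dir):
    # Hypothetical helper: parse every KITTI-format 2D detection file once,
    # keyed by the image index string, so the per-frame path is a dict lookup.
    cache = {}
    for txt in pathlib.Path(detections_dir).glob("*.txt"):
        with open(txt, "r") as f:
            content = [line.strip().split(" ") for line in f.readlines()]
        cls    = np.array([x[0] for x in content], dtype="object")
        boxes  = np.array([[float(v) for v in x[4:8]] for x in content]).reshape(-1, 4)
        scores = np.array([float(x[15]) for x in content]).reshape(-1, 1)
        cache[txt.stem] = (cls, boxes, scores)
    return cache

# Usage sketch (assumption, not existing code): in FusionLayer.__init__ do
#   self._cache_car     = preload_2d_detections(_2d_data_path_car)
#   self._cache_ped_cyc = preload_2d_detections(_2d_data_path_ped_cyc)
# and in prepare_fusion_inputs replace the open()/readlines() blocks with
#   cls, boxes, scores = self._cache_car[kitti.get_image_index_str(img_idx)]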
