all-zero gt_boxes are loaded and cause runtime error

Hi, team,

I noticed that the following code was added to dataset.py to avoid loading a frame that has no gt_boxes.

if len(data_dict['gt_boxes']) == 0:
    new_index = np.random.randint(self.__len__())
    return self.__getitem__(new_index)

However, the gt_boxes in batch_dict are still sometimes all zeros for certain frames before the model processes them. Some operations (such as torch.max()) fail in that situation.

To catch this, I added the following code to point_rcnn.py at the beginning of forward():

import pdb

def forward(self, batch_dict):
    # Drop into the debugger when a frame's gt_boxes are all zeros.
    for gt_box in batch_dict['gt_boxes']:
        if gt_box.max() == gt_box.min() == 0:
            pdb.set_trace()
    ...

Every time it stopped here, I printed batch_dict['gt_boxes'] and found that one of the frames had all-zero gt_boxes, as shown below.

(Pdb) gt_boxes = batch_dict['gt_boxes']
(Pdb) gt_boxes[0]
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')
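One plausible way to end up with a frame like this is zero-padding at batch time. The sketch below is an assumption about the collation step, not code taken from this thread: it assumes each frame's boxes are padded with zeros up to the largest box count in the batch, so a frame that has no boxes left becomes a block of all-zero rows. The helper names collate_gt_boxes and all_zero_frames are hypothetical, and NumPy is used so the example runs without a GPU.

```python
import numpy as np

def collate_gt_boxes(per_frame_boxes):
    """Zero-pad each frame's (N_i, 8) boxes into one (B, max_N, 8) array."""
    max_gt = max(len(boxes) for boxes in per_frame_boxes)
    batch = np.zeros((len(per_frame_boxes), max_gt, 8), dtype=np.float32)
    for i, boxes in enumerate(per_frame_boxes):
        batch[i, :len(boxes)] = boxes
    return batch

def all_zero_frames(batch_gt_boxes):
    """Return the indices of frames whose padded boxes are entirely zero."""
    flat = batch_gt_boxes.reshape(len(batch_gt_boxes), -1)
    return np.flatnonzero(~flat.any(axis=1))

# One frame lost all of its boxes; the other kept nine.
empty_frame = np.zeros((0, 8), dtype=np.float32)
full_frame = np.ones((9, 8), dtype=np.float32)

batch = collate_gt_boxes([empty_frame, full_frame])
print(batch.shape)
print(all_zero_frames(batch))
```

A vectorized check like all_zero_frames can also replace the per-frame loop when you only want to log the affected indices instead of stopping in the debugger.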

Because PointRCNN uses PointResidualCoder, when gt_boxes is all zeros no foreground gt_boxes are selected and the input to PointResidualCoder is empty. The following assert in encode_torch() then raises a RuntimeError:

assert gt_classes.max() <= self.mean_size.shape[0]

RuntimeError: invalid argument 1: cannot perform reduction function max on tensor with no elements because the operation does not have an identity at /opt/conda/conda-bld/pytorch_1587428270644/work/aten/src/THC/generic/THCTensorMathReduce.cu:85
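The failure comes from taking a maximum over zero elements, which has no defined result. The sketch below reproduces the same class of error with NumPy (which raises a ValueError rather than a RuntimeError) so that it runs without PyTorch, and shows one possible guard. The function max_class_is_valid is hypothetical and is not necessarily how the project fixed the assert; with PyTorch tensors, the analogous emptiness test would use Tensor.numel().

```python
import numpy as np

def max_class_is_valid(gt_classes, num_classes):
    """Validate class ids against num_classes, tolerating an empty input."""
    if gt_classes.size == 0:
        # Nothing to validate; calling max() here would raise.
        return True
    return bool(gt_classes.max() <= num_classes)

empty = np.array([], dtype=np.int64)

# An unguarded reduction over an empty array fails.
try:
    empty.max()
    raised = False
except ValueError:
    raised = True

print(raised)
print(max_class_is_valid(empty, num_classes=1))
```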

Here I trained the model with CLASS_NAMES: ['Cyclist'].

Is this expected, and what could be the reason behind it?

Top GitHub Comments

gardenbaby commented, Oct 26, 2020

I reviewed the code and found that the problem is from DataBaseSampler.

__call__() in database_sampler.py computes the IoUs of the sampled boxes and keeps only the boxes that do not overlap with others, so valid_mask (line 188) can be empty.

So when I train with only one class, such as Cyclist, the remaining boxes may all belong to classes other than Cyclist, even though the following check in dataset.py (line 127) passes.

if len(data_dict['gt_boxes']) == 0:
    new_index = np.random.randint(self.__len__())
    return self.__getitem__(new_index)

When execution reaches the following code at line 132 of dataset.py, selected is empty, so no gt_boxes are kept.

selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
data_dict['gt_names'] = data_dict['gt_names'][selected]

So I changed the code at line 127 of dataset.py from

if len(data_dict['gt_boxes']) == 0:
    ...

to

gt_boxes_mask = np.array([n in self.class_names for n in data_dict['gt_names']], dtype=np.bool_)
if gt_boxes_mask.sum() == 0:
    ...

This seems to work.
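To see why the two checks differ, here is a small standalone sketch with made-up data. The frame contents are hypothetical; only the mask expression mirrors the snippet above. The frame has three boxes, so the length check passes, yet none of them belongs to the trained class.

```python
import numpy as np

class_names = ['Cyclist']

# Hypothetical frame: three boxes, none of them a Cyclist.
gt_names = np.array(['Car', 'Pedestrian', 'Car'])
gt_boxes = np.ones((3, 7), dtype=np.float32)

# Original check: only asks whether any box exists at all.
old_check_triggers = len(gt_boxes) == 0

# Proposed check: asks whether any box of a trained class exists.
gt_boxes_mask = np.array([n in class_names for n in gt_names], dtype=np.bool_)
new_check_triggers = gt_boxes_mask.sum() == 0

# What the later class filter would leave behind.
remaining = gt_boxes[gt_boxes_mask]

print(old_check_triggers, new_check_triggers, remaining.shape)
```

With the original check this frame would be accepted and then filtered down to zero boxes; with the mask-based check it would be resampled instead.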

sshaoshuai commented, Nov 4, 2020

Hi all,

This bug has been fixed in https://github.com/open-mmlab/OpenPCDet/pull/340.

Actually, I moved the empty check to the end of the prepare_data function, since data_processor could also modify gt_boxes.
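The idea of checking only after every filtering stage has run can be sketched as follows. This is a toy illustration, not the code from the PR: ToyDataset, its frame format, and the range filter standing in for a later processing stage are all hypothetical.

```python
import numpy as np

class ToyDataset:
    """Hypothetical stand-in for a detection dataset; not OpenPCDet code."""

    def __init__(self, frames, class_names, seed=0):
        self.frames = frames            # list of (gt_names, gt_boxes) pairs
        self.class_names = class_names
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.frames)

    def prepare_data(self, index):
        gt_names, gt_boxes = self.frames[index]
        # Stage 1: keep only the classes being trained.
        keep = np.array([n in self.class_names for n in gt_names], dtype=np.bool_)
        gt_names, gt_boxes = gt_names[keep], gt_boxes[keep]
        # Stage 2: a later stage may drop boxes too (here: x out of range).
        in_range = np.abs(gt_boxes[:, 0]) < 50.0
        gt_names, gt_boxes = gt_names[in_range], gt_boxes[in_range]
        # Final check, after all filters. Assumes at least one frame is valid,
        # otherwise this resampling would never terminate.
        if len(gt_boxes) == 0:
            return self.prepare_data(int(self.rng.integers(len(self))))
        return {'gt_names': gt_names, 'gt_boxes': gt_boxes}

frames = [
    (np.array(['Car']), np.ones((1, 7), dtype=np.float32)),            # wrong class
    (np.array(['Cyclist']), np.full((1, 7), 80.0, dtype=np.float32)),  # out of range
    (np.array(['Cyclist']), np.ones((1, 7), dtype=np.float32)),        # valid
]
ds = ToyDataset(frames, ['Cyclist'])
samples = [ds.prepare_data(i) for i in range(len(ds))]
print([len(s['gt_boxes']) for s in samples])
```

Placing the check last means any stage that removes boxes is covered, at the cost of doing the full preparation before discovering that a frame must be resampled.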

The error in PointResidualCoder is another bug and I also fixed it in this PR.

Note that neither of these errors affects performance.

Thank you all for the bug information.