
Problems using custom data sets

See original GitHub issue

I am trying to use my own LiDAR data to test PV-RCNN instead of the KITTI data, with annotations in a format similar to the Kaggle one. However, I get an error when trying to run the code. The error message is as follows:

File "***/OpenPCDet/pcdet/datasets/innovusion/innovusion_dataset.py", line 77, in __getitem__
    data_dict = self.prepare_data(data_dict=input_dict)
  File "***/OpenPCDet/pcdet/datasets/dataset.py", line 124, in prepare_data
    'gt_boxes_mask': gt_boxes_mask
  File "***/OpenPCDet/pcdet/datasets/augmentor/data_augmentor.py", line 93, in forward
    data_dict = cur_augmentor(data_dict=data_dict)
  File "***/OpenPCDet/pcdet/datasets/augmentor/database_sampler.py", line 179, in __call__
    sampled_boxes = np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)
  File "<__array_function__ internals>", line 6, in stack
  File "***/anaconda3/envs/ml/lib/python3.7/site-packages/numpy/core/shape_base.py", line 423, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
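
As far as I can tell, this is simply the error np.stack raises when it is handed an empty list, so sampled_dict must be empty at that point:

    import numpy as np
    np.stack([], axis=0)  # ValueError: need at least one array to stack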

I traced the error and found that it is related to data augmentation, in pcdet/datasets/augmentor/database_sampler.py:

    def __call__(self, data_dict):
        """
        Args:
            data_dict:
                gt_boxes: (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]

        Returns:
            data_dict with the sampled objects merged into the scene.
        """
        gt_boxes = data_dict['gt_boxes']
        gt_names = data_dict['gt_names'].astype(str)
        existed_boxes = gt_boxes
        total_valid_sampled_dict = []
        for class_name, sample_group in self.sample_groups.items():
            if self.limit_whole_scene:
                # Only sample enough objects to reach sample_class_num for this class.
                num_gt = np.sum(class_name == gt_names)
                sample_group['sample_num'] = str(int(self.sample_class_num[class_name]) - num_gt)
            if int(sample_group['sample_num']) > 0:
                # Draw candidate objects for this class from the database.
                sampled_dict = self.sample_with_fixed_number(class_name, sample_group)  ### need help

                # np.stack raises "need at least one array to stack" if sampled_dict is empty.
                sampled_boxes = np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)

                if self.sampler_cfg.get('DATABASE_WITH_FAKELIDAR', False):
                    sampled_boxes = box_utils.boxes3d_kitti_fakelidar_to_lidar(sampled_boxes)

                # Reject sampled boxes that overlap (in BEV) an existing box
                # or another sampled box.
                iou1 = iou3d_nms_utils.boxes_bev_iou_cpu(sampled_boxes[:, 0:7], existed_boxes[:, 0:7])
                iou2 = iou3d_nms_utils.boxes_bev_iou_cpu(sampled_boxes[:, 0:7], sampled_boxes[:, 0:7])
                iou2[range(sampled_boxes.shape[0]), range(sampled_boxes.shape[0])] = 0
                iou1 = iou1 if iou1.shape[1] > 0 else iou2  # scene may contain no boxes yet
                valid_mask = ((iou1.max(axis=1) + iou2.max(axis=1)) == 0).nonzero()[0]
                valid_sampled_dict = [sampled_dict[x] for x in valid_mask]
                valid_sampled_boxes = sampled_boxes[valid_mask]

                existed_boxes = np.concatenate((existed_boxes, valid_sampled_boxes), axis=0)
                total_valid_sampled_dict.extend(valid_sampled_dict)

        # Everything beyond the original gt_boxes was sampled from the database.
        sampled_gt_boxes = existed_boxes[gt_boxes.shape[0]:, :]
        if total_valid_sampled_dict.__len__() > 0:
            data_dict = self.add_sampled_boxes_to_scene(data_dict, sampled_gt_boxes, total_valid_sampled_dict)

        data_dict.pop('gt_boxes_mask')
        return data_dict
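
For context, this augmentor implements the ground-truth sampling (“gt_sampling”) augmentation popularized by SECOND: it pastes objects from a pre-built database into the current training scene, and the IoU checks above reject any sampled box that overlaps an existing or previously sampled box in bird’s-eye view.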

The key function is sample_with_fixed_number(self, class_name, sample_group):

    def sample_with_fixed_number(self, class_name, sample_group):
        """
        Args:
            class_name: class to draw samples for
            sample_group: dict holding 'sample_num', 'pointer' and shuffled 'indices'
        Returns:
            list of db_infos entries for the sampled objects
        """
        sample_num, pointer, indices = int(sample_group['sample_num']), sample_group['pointer'], sample_group['indices']
        if pointer >= len(self.db_infos[class_name]):
            # Database exhausted for this class: reshuffle and start over.
            indices = np.random.permutation(len(self.db_infos[class_name]))
            pointer = 0

        # Take the next sample_num entries from the shuffled index list.
        # Note: if self.db_infos[class_name] is empty, this returns an empty
        # list, which triggers the np.stack error shown above.
        sampled_dict = [self.db_infos[class_name][idx] for idx in indices[pointer: pointer + sample_num]]
        pointer += sample_num
        sample_group['pointer'] = pointer
        sample_group['indices'] = indices
        return sampled_dict
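
For context, self.db_infos maps each class name to a list of dicts, one per object stored in the ground-truth database (for KITTI, these pickles are produced by create_groundtruth_database in pcdet/datasets/kitti/kitti_dataset.py). Each entry looks roughly like the sketch below; the field names are based on the KITTI format and may differ, so inspect your own .pkl to confirm:

    import numpy as np

    # Illustrative db_infos entry (KITTI-style; field names are assumptions):
    db_info_entry = {
        'name': 'Car',                                 # class name
        'path': 'gt_database/000000_Car_0.bin',        # points cropped from this object
        'box3d_lidar': np.zeros(7, dtype=np.float32),  # [x, y, z, dx, dy, dz, heading]
        'num_points_in_gt': 153,                       # lidar points inside the box
    }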

self.db_infos is used in this code; it is loaded from the file specified by sampler_cfg.DB_INFO_PATH, but my dataset does not have such a file, so I am stuck here. What do I need to do to fix this? Or is there a detailed explanation that would help me understand this code?

Note: my data annotation format is

id confidence center_x center_y center_z width length height yaw class_name
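
For reference, mapping one line of that format onto the (N, 7) gt_boxes layout from the docstring above could look like the sketch below; it assumes dx runs along the box length and dy along the width, which you should verify against your annotation convention:

    import numpy as np

    def parse_annotation_line(line):
        # Input:  id confidence center_x center_y center_z width length height yaw class_name
        # Output: box as [x, y, z, dx, dy, dz, heading] plus the class name
        fields = line.split()
        cx, cy, cz = map(float, fields[2:5])
        width, length, height = map(float, fields[5:8])
        yaw = float(fields[8])
        class_name = fields[9]
        # Assumption: dx = extent along x (length), dy = extent along y (width).
        box = np.array([cx, cy, cz, length, width, height, yaw], dtype=np.float32)
        return box, class_name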

Thank you all.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 41 (3 by maintainers)

Top GitHub Comments

20 reactions
jihanyang commented, Aug 24, 2020

@Gltina Please make sure:

  1. point cloud range along the z-axis / voxel_size is 40
  2. point cloud range along the x- and y-axes / voxel_size is a multiple of 16.
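
A quick way to sanity-check both constraints for your own config (a minimal sketch; the values below are the KITTI PV-RCNN defaults, so swap in your own):

    import numpy as np

    # KITTI PV-RCNN defaults -- replace these with your own config values.
    POINT_CLOUD_RANGE = np.array([0.0, -40.0, -3.0, 70.4, 40.0, 1.0])
    VOXEL_SIZE = np.array([0.05, 0.05, 0.1])

    # Grid size along each axis = extent of the range / voxel size.
    grid = np.round((POINT_CLOUD_RANGE[3:6] - POINT_CLOUD_RANGE[0:3]) / VOXEL_SIZE).astype(int)
    print(grid)  # [1408 1600   40]

    assert grid[2] == 40, 'z range / voxel_size must be 40'
    assert grid[0] % 16 == 0 and grid[1] % 16 == 0, 'x/y grid must be a multiple of 16'
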
15 reactions
MartinHahner commented, Aug 20, 2020

“So, I did not change the value of POINT_CLOUD_RANGE and VOXEL_SIZE, also because I don’t know the exact meaning of these parameters during training.”

POINT_CLOUD_RANGE defines the region of space that should be voxelized, in other words, the space that contains the points you assume to be relevant and have annotations for. For KITTI this space reaches about 40 m to each side, 70 m to the front, 3 m below and 1 m above the sensor, hence [0, -39.68, -3, 69.12, 39.68, 1].

VOXEL_SIZE defines the [length, width, height] of each voxel. Since PointPillars uses pillars instead of voxels, the height of a voxel is set to the full height of your point cloud range. For the KITTI frames the default length and width of a voxel are set to 16 cm, hence [0.16, 0.16, 4].
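
Plugging in the numbers quoted above: (69.12 − 0) / 0.16 = 432 and (39.68 − (−39.68)) / 0.16 = 496, so PointPillars ends up with a 432 × 496 grid of pillars over the scene.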

I hope this helps.
