Trouble training custom dataset
Training Detectron on custom dataset
I’m trying to train Mask R-CNN on my custom dataset to perform segmentation on new classes that neither COCO nor ImageNet has ever seen.
- I first converted my dataset to COCO format so it can be loaded by pycocotools.
- I added my dataset to dataset_catalog.py and pointed it at the correct image directory and annotation file.
The config file I used is based on configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml. My dataset contains only 4 classes, not counting background, so I set NUM_CLASSES to 5 (4 does not work either). When I try to train using the command below:
python2 tools/train_net.py --cfg configs/encov/copy_maskrcnn_R-101-FPN.yaml OUTPUT_DIR /tmp/detectron-output/
ERROR 1:
I get the following error (the complete log is in output.txt):
At:
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(269): _expand_bbox_targets
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(181): _sample_rois
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/encov/Softwares/Detectron/lib/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what(): [enforce fail at pybind_state.h:423] . Exception encountered running PythonOp function: ValueError: could not broadcast input array from shape (4) into shape (0)
This error comes from the bounding-box expansion step, which widens the box targets and weights to 4 entries per class (see roi_data/fast_rcnn.py). For each RoI it takes the first element, which is the class label, checks that it is not 0 (the background), and copies the 4 values into the columns starting at class index x 4. The error happens because the class index falls outside the range allowed by the NUM_CLASSES parameter, which was used to size the output array.
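To make that failure mode concrete, here is a minimal sketch in plain Python. It is not Detectron code: `expanded_slot` is a hypothetical helper, a list stands in for the NumPy array, and it assumes the layout described above (a row of 4 * NUM_CLASSES entries, with class c owning entries 4c to 4c + 4).

```python
def expanded_slot(cls, num_classes):
    """Return (start, end, available) for the 4 box-target slots of class `cls`.

    Assumes a target row of width 4 * num_classes in which class c owns
    entries [4c, 4c + 4). `available` is how many of those entries exist.
    """
    row = [0.0] * (4 * num_classes)
    start, end = 4 * cls, 4 * cls + 4
    # Slicing past the end of a sequence silently yields a shorter or empty slice.
    available = len(row[start:end])
    return start, end, available


# Class 3 fits in a row built for 5 classes: all 4 slots exist.
print(expanded_slot(3, 5))  # (12, 16, 4)
# Class label 5 is out of range when NUM_CLASSES is 5: the slice is empty,
# so there is nowhere to write the 4 box values.
print(expanded_slot(5, 5))  # (20, 24, 0)
```

An empty slice on the receiving side is consistent with the "shape (4) into shape (0)" message in the log, which suggests that at least one label reaching this step is not smaller than NUM_CLASSES.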
ERROR 2
I tried the same training with NUM_CLASSES set to 81, the number of classes used for COCO training (which works on my setup, by the way). The error described above no longer appears, but within the very first iterations some bounding box areas are zero, which causes divisions by zero (see output2.txt).
Has anyone experienced the same issue when training Fast R-CNN or Mask R-CNN on a custom dataset? I strongly suspect an error in my COCO-like JSON file, because training on the COCO dataset works correctly. Thank you for your help.
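One way to test that suspicion is to scan the annotation file before training. The sketch below uses only the standard library; `check_coco_annotations` is a hypothetical helper, and the two checks (non-positive box width or height, and a `category_id` missing from `categories`) are assumptions about what might trigger the symptoms above, not a confirmed diagnosis.

```python
import json


def check_coco_annotations(coco):
    """Collect suspicious entries from a COCO-style dict (already parsed from JSON)."""
    known_ids = {cat["id"] for cat in coco.get("categories", [])}
    problems = []
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append((ann["id"], "zero-area or negative box"))
        if ann["category_id"] not in known_ids:
            problems.append((ann["id"], "unknown category_id"))
    return problems


# Tiny hand-made example; a real file would be read with json.load instead.
sample = json.loads("""
{
  "categories": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
  "annotations": [
    {"id": 10, "category_id": 1, "bbox": [5, 5, 20, 30]},
    {"id": 11, "category_id": 2, "bbox": [5, 5, 0, 30]},
    {"id": 12, "category_id": 7, "bbox": [1, 1, 4, 4]}
  ]
}
""")
print(check_coco_annotations(sample))
# [(11, 'zero-area or negative box'), (12, 'unknown category_id')]
```

An empty result does not prove the file is correct; it only rules out these two specific problems.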
System information
- Operating system: Ubuntu 16.04
- Compiler version: GCC 5.4.0
- CUDA version: 8.0
- cuDNN version: 7.0
- NVIDIA driver version: 384
- GPU model: GeForce GTX 1080 (x1)
- python --version output: Python 2.7.12
Issue Analytics
- Created 6 years ago
- Comments:30
Top GitHub Comments
I finally made it:
How many classes do you have in your custom dataset? If you have N classes, then you should set NUM_CLASSES: N+1 in your yaml config file. For example, for six classes you should set NUM_CLASSES: 7. For COCO, which has 80 classes, you should set it to 81.
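The N+1 rule can be derived directly from the annotation file rather than counted by hand. This is a minimal sketch, assuming the file follows the COCO layout with a top-level `categories` list; `num_classes_for_config` is a hypothetical helper name, not part of Detectron.

```python
def num_classes_for_config(coco):
    """Number of distinct foreground categories, plus one for background."""
    category_ids = {cat["id"] for cat in coco["categories"]}
    return len(category_ids) + 1


# A dataset with 4 foreground classes, as in the question above.
four_classes = {
    "categories": [{"id": i, "name": "class_%d" % i} for i in range(1, 5)]
}
print(num_classes_for_config(four_classes))  # 5
```

Counting distinct ids rather than list entries avoids overcounting if the same category was accidentally written to the file twice.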