Trouble training custom dataset

Training Detectron on custom dataset

I’m trying to train Mask R-CNN on my custom dataset to perform a segmentation task on new classes that neither COCO nor ImageNet has ever seen.

  • I first converted my dataset to COCO format so it can be loaded by pycocotools.
  • I added my dataset path to dataset_catalog.py and pointed it at the correct image directory and annotation file.

The config file I used is based on configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml. My dataset contains only 4 classes, not counting background, so I set NUM_CLASSES to 5 (4 does not work either). I then try to train using the command below:

python2 tools/train_net.py --cfg configs/encov/copy_maskrcnn_R-101-FPN.yaml OUTPUT_DIR /tmp/detectron-output/

ERROR 1

I get the following error (the complete log is in output.txt):

At:
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(269): _expand_bbox_targets
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(181): _sample_rois
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/encov/Softwares/Detectron/lib/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what(): [enforce fail at pybind_state.h:423] . Exception encountered running PythonOp function: ValueError: could not broadcast input array from shape (4) into shape (0)

This error comes from the box-target expansion step, which widens the bounding box targets and weights to 4 values per class (see roi_data/fast_rcnn.py). For each row it reads the first element, which is the class index, checks that it is not 0 (the background), and copies the values into the columns starting at class_index x 4. The error happens because the class index is too large for the NUM_CLASSES parameter, which was used to size the output array.
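The expansion described above can be sketched in a few lines. This is a simplified pure-Python illustration, not Detectron's actual code (which works on NumPy arrays); the function name and the explicit range check are assumptions made for the example. In the NumPy version the out-of-range slice is simply empty, which would explain the "shape (4) into shape (0)" message.

```python
def expand_bbox_targets(bbox_target_data, num_classes):
    """Expand (class, dx, dy, dw, dh) rows into per-class target rows.

    Each output row has 4 * num_classes slots; the 4 targets of a
    foreground box are written at slots [4 * cls, 4 * cls + 4).
    """
    expanded = []
    for row in bbox_target_data:
        cls = int(row[0])
        targets = [0.0] * (4 * num_classes)
        if cls > 0:  # class 0 is background and gets no regression target
            start = 4 * cls
            if start + 4 > len(targets):
                # With NumPy, this slice would be empty (shape (0)),
                # so assigning 4 values to it fails to broadcast.
                raise ValueError(
                    "class index %d does not fit NUM_CLASSES=%d"
                    % (cls, num_classes)
                )
            targets[start:start + 4] = row[1:5]
        expanded.append(targets)
    return expanded
```

With NUM_CLASSES set to 5, valid class indices are 0 to 4, so any annotation that ends up with class index 5 has nowhere to go.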


ERROR 2

I ran the same training with NUM_CLASSES set to 81, the number of classes used for COCO training (which works on my setup, by the way). The error described above no longer appears, but very early in the iterations the bounding box areas are zero, which causes divisions by zero (see output2.txt).

Has anyone experienced the same issue when training Fast R-CNN or Mask R-CNN on a custom dataset? I strongly suspect an error in my COCO-like JSON file, because training on the COCO dataset works correctly. Thank you for your help.

System information

  • Operating system: Ubuntu 16.04
  • Compiler version: GCC 5.4.0
  • CUDA version: 8.0
  • cuDNN version: 7.0
  • NVIDIA driver version: 384
  • GPU model: GeForce GTX 1080 (x1)
  • python --version output: Python 2.7.12

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 30

Top GitHub Comments

5 reactions
francoto commented, Mar 7, 2018

I finally made it:

  • First, the bounding box coordinates in my dataset were wrong. I realized my mistake when I tried to visualize them using the pycocotools API (which, by the way, has no dedicated method for showing them by default).
  • Second, I misunderstood the part about needing a ‘background’ class (for labelling every pixel not in the other classes), so I had added one to my dataset, but json_dataset.py actually creates its own. Deleting my ‘background’ label from the dataset finally allowed me to start the training.
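Both problems can be caught before training with a quick sanity check on the annotation file. The sketch below assumes the standard COCO layout ("images", "categories", "annotations", with boxes stored as [x, y, width, height]); the function name and the messages are made up for illustration and are not part of Detectron or pycocotools.

```python
def find_annotation_problems(coco):
    """Return a list of human-readable problems in a COCO-style dict."""
    problems = []
    sizes = {img["id"]: (img["width"], img["height"])
             for img in coco["images"]}

    # An explicit background category is redundant if the loader
    # adds its own background class.
    for cat in coco["categories"]:
        if "background" in cat["name"].lower():
            problems.append(
                "category %r looks like an explicit background class"
                % cat["name"]
            )

    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append("annotation %s has an empty box" % ann["id"])
            continue
        img_w, img_h = sizes[ann["image_id"]]
        if x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            problems.append(
                "annotation %s falls outside its image" % ann["id"]
            )
    return problems
```

To use it, load the annotation file with json.load and print whatever the function returns; an empty list means none of these particular checks failed.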

4 reactions
realwecan commented, Feb 22, 2018

How many classes do you have in your custom dataset? If you have N classes, then you should set NUM_CLASSES: N+1 in your yaml config file, the extra one being the background class. For example, for six classes you should set NUM_CLASSES: 7. For the 80 COCO classes you should set it to 81.
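The N+1 rule can be shown with a small sketch. It assumes the loader prepends a single background entry at index 0 and numbers the dataset's own categories from 1; the function name, the '__background__' label, and the four category names are placeholders chosen for the example.

```python
def build_class_list(category_names):
    """Prepend an implicit background class and index classes from 0."""
    classes = ["__background__"] + list(category_names)
    class_to_index = {name: i for i, name in enumerate(classes)}
    return classes, class_to_index


# Four hypothetical dataset categories, with no background among them.
classes, class_to_index = build_class_list(
    ["class_a", "class_b", "class_c", "class_d"]
)
num_classes = len(classes)  # the value to put in NUM_CLASSES
```

Here num_classes is 5 for a 4-class dataset. If the dataset also lists its own background category, the count becomes 6 and no longer matches a NUM_CLASSES of 5.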
