What is wrong with my model? + summary & solutions to F.A.Q
Hi everyone!
I was already discussing my issues in issue-68, but decided to open a separate ticket anyway for completeness, for the benefit of other people. As of now, I am clueless about what is wrong with my model. My workflow and the issues I have solved so far are as follows:
- Using Unity3D, I created a dataset with around 240 training images and 60 test images for a custom model: a cube with 4 different colors (for testing).
I came across multiple issues regarding the following:
- I set my intrinsic camera calibration parameters as follows:
import numpy as np

# fx = fy = 320; principal point at (320, 240), i.e. the center of a 640x480 image
K = np.zeros((3, 3), dtype='float64')
K[0, 0], K[0, 2] = 320, 320
K[1, 1], K[1, 2] = 320, 240
K[2, 2] = 1.
Annotated labels are created automatically in Unity3D, so I expect there to be no camera distortion (for the intrinsic camera calibration). The original authors of the LINEMOD dataset used a Kinect camera, which does have such distortion. This intrinsic camera calibration is needed, among other things, for the PnP algorithm.
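For context, here is a minimal sketch of how such an intrinsic matrix is typically fed to a PnP solver. It uses OpenCV's cv2.solvePnP rather than the repository's own evaluation code, and the corner arrays are placeholders, not values from my dataset:

import numpy as np
import cv2

# Placeholder inputs: the 8 corners of the object's 3D bounding box (in meters,
# object coordinates) and their 2D projections in the image (in pixels).
corners_3d = np.array([[x, y, z] for x in (-0.05, 0.05)
                                 for y in (-0.05, 0.05)
                                 for z in (-0.05, 0.05)], dtype=np.float64)
corners_2d = np.random.rand(8, 2).astype(np.float64) * [640, 480]  # dummy detections

K = np.array([[320, 0, 320],
              [0, 320, 240],
              [0,   0,   1]], dtype=np.float64)
dist_coeffs = np.zeros(5)  # no lens distortion for a synthetic Unity3D camera

# Recover the object pose (rotation and translation) from the 2D-3D correspondences
success, rvec, tvec = cv2.solvePnP(corners_3d, corners_2d, K, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix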
- Initially, I had many problems with creating my .PLY files:
- Scaling them into meters. I think I managed this; my .PLY file is here (see also the sketch after this list):
- Apparently, models have to be centered with their volumetric center of mass at the origin (0, 0, 0). I had centered my objects at the bottom surface, giving them an unwanted offset.
- Bounding-box coordinates need to be in a specific order in their respective annotated label file. If this is not the case, the bounding box that gets generated from the .PLY file and the predictions from singleshotpose will be completely distorted. There are multiple issues on GitHub about this (among which issue-49).
- Correct order of bounding-box coordinates in the label file (green border):
- Incorrect order of bounding-box coordinates in the label file:
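As an illustration of the scaling, centering, and corner-ordering points above, here is a minimal sketch using the third-party trimesh library (the repository itself reads the model with its own MeshPly helper; trimesh is just my choice for this example, and the file names are placeholders). It rescales a model exported in millimeters to meters, re-centers it at the centroid of its vertices (a simple stand-in for the volumetric center of mass), and derives the 8 bounding-box corners in a fixed min/max order:

import itertools
import numpy as np
import trimesh

mesh = trimesh.load('my_cube.ply')            # placeholder file name

# 1) Scale from millimeters to meters (only needed if the export was in mm)
mesh.apply_scale(0.001)

# 2) Re-center the model so its centroid sits at the origin (0, 0, 0)
mesh.apply_translation(-mesh.vertices.mean(axis=0))
mesh.export('my_cube_centered.ply')

# 3) Derive the 8 bounding-box corners in a fixed order (all min/max combinations
#    of x, y, z), so the corners are always listed consistently in the label files
mins, maxs = mesh.vertices.min(axis=0), mesh.vertices.max(axis=0)
corners_3d = np.array(list(itertools.product(*zip(mins, maxs))))  # shape (8, 3)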
Find here an example of an image and a label file from my training set.
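For reference, my understanding of the singleshotpose label format is 21 values per object: the class id, the x/y of the projected 3D centroid, the x/y of the 8 projected corners, and finally the x and y range of the projection, all normalized by image width and height. A hypothetical helper for writing such a line (write_label and its arguments are my own names, not part of the repository):

import numpy as np

def write_label(path, class_id, points_2d, img_w=640, img_h=480):
    # points_2d: (9, 2) array, projected 3D centroid first, then the 8 corners,
    # in pixel coordinates (assumed layout)
    pts = np.array(points_2d, dtype=np.float64)
    pts[:, 0] /= img_w                      # normalize x by image width
    pts[:, 1] /= img_h                      # normalize y by image height
    x_range = pts[:, 0].max() - pts[:, 0].min()
    y_range = pts[:, 1].max() - pts[:, 1].min()
    values = [float(class_id)] + pts.reshape(-1).tolist() + [x_range, y_range]
    with open(path, 'w') as f:
        f.write(' '.join(f'{v:.6f}' for v in values) + '\n')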
HOWEVER.
I am still not obtaining correct results, and I am unsure how long to train my models for. The implementation states 700 epochs, yet if I train for 4000 epochs, my results are still not good:
How many epochs should a new object be trained for? NOTE: I am using benchvise/init.weights as the initial weights for my new model on the custom dataset. Meanwhile, my loss goes down properly, but my accuracy metrics stay at 0%:
Could there still be a problem with how I created the annotation files, the camera intrinsic parameters, or the .PLY model? Or could there be another problem that I am not considering?
@btekin Would it be an idea to add an F.A.Q. section to the README, using my findings? I think the section about training on a custom dataset could use a lot more elaboration.
Moreover, I am curious as to what people are doing with singleshotpose. Anyone experimenting with some interesting use-cases?
Many thanks to anyone who can help!
Top GitHub Comments
Hi @btekin and @MohamadJaber1
Thank you @btekin for looking at my dataset. Coincidentally, I managed to get some results this morning by converting my images (which were .png) to .jpg. I suspected this could have an influence, since .png images can contain transparent pixels. Moreover, I extended my tool to render a mask over the objects; every object now has a mask as well. One of these two changes fixed the problem for me.
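For anyone doing the same conversion, here is a minimal sketch with Pillow that flattens any alpha (transparent) pixels onto a white background before saving as JPEG; the directory names are placeholders:

from pathlib import Path
from PIL import Image

src_dir, dst_dir = Path('images_png'), Path('images_jpg')     # placeholder paths
dst_dir.mkdir(exist_ok=True)

for png_path in src_dir.glob('*.png'):
    img = Image.open(png_path).convert('RGBA')
    background = Image.new('RGB', img.size, (255, 255, 255))  # white canvas
    background.paste(img, mask=img.split()[3])                # alpha channel as paste mask
    background.save(dst_dir / (png_path.stem + '.jpg'), quality=95)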
@jgcbrouns Thank you for providing your dataset. After inspecting examples from your dataset, I would suggest you do the following two things and see if they help:
Reduce color data augmentation. This could be useful because your object does not have any texture, and the only cue that is useful for predicting the pose is color. When you apply color data augmentation (changing hue, saturation, etc.) during training, the model has difficulty distinguishing between different colors and hence estimating the pose. The current setting for the color data augmentation could be too high for your data. You can change the values for color data augmentation at the following lines: https://github.com/Microsoft/singleshotpose/blob/master/dataset.py#L67:L69
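For illustration, the values at those lines are YOLO-style color-jitter parameters; a hedged sketch of what reducing them might look like (the exact variable names and defaults are from memory and may differ from the current dataset.py):

# Roughly how the augmentation parameters look in dataset.py (names/values from memory):
jitter = 0.2       # random crop / translation jitter
hue = 0.1          # random hue shift
saturation = 1.5   # random saturation scaling
exposure = 1.5     # random exposure (brightness) scaling

# For a texture-less object whose pose cue is color, much milder values could be tried:
hue = 0.02
saturation = 1.1
exposure = 1.1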
Randomly change the background. This could be useful because you have a small number of training examples; although you use different backgrounds for different training images, keeping a static background for each example might result in overfitting.
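Since per-object masks are now available, such a background substitution could look roughly like the sketch below, which pastes the masked object onto a randomly chosen background image (the function and file names are my own placeholders, not the repository's API):

import random
from pathlib import Path
import numpy as np
from PIL import Image

def substitute_background(img_path, mask_path, bg_dir):
    # Paste the masked object onto a randomly chosen background image
    img = np.array(Image.open(img_path).convert('RGB'))
    mask = np.array(Image.open(mask_path).convert('L')) > 0    # object pixels
    bg_path = random.choice(list(Path(bg_dir).glob('*.jpg')))  # placeholder background pool
    bg = np.array(Image.open(bg_path).convert('RGB').resize(img.shape[1::-1]))
    bg[mask] = img[mask]                                       # keep the object, replace the rest
    return Image.fromarray(bg)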
I hope these pointers help with your problem. Please let me know how it goes.
@MohamadJaber1 As @jgcbrouns pointed out, I think you would need to fix the bounding box label coordinates in order for the network to start learning. If you provide a sample, I could also take a look at your data.