normalization_stats inconsistency between dim=2 and dim=3
I am very confused by the function `normalization_stats` in the `data_utils.py` file.
In particular, it seems that the set of keypoints that is kept is not consistent between the 2d and 3d cases:
```python
# Encodes which 17 (or 14) 2d-3d pairs we are predicting
dimensions_to_ignore = []
if dim == 2:
    dimensions_to_use = np.where(np.array([x != '' and x != 'Neck/Nose' for x in H36M_NAMES]))[0]
    dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*2, dimensions_to_use*2+1)))
    dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*2), dimensions_to_use )
else: # dim == 3
    dimensions_to_use = np.where(np.array([x != '' for x in H36M_NAMES]))[0]
    dimensions_to_use = np.delete( dimensions_to_use, [0,7,9] if predict_14 else 0 )
    dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*3,
                                             dimensions_to_use*3+1,
                                             dimensions_to_use*3+2)))
    dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*3), dimensions_to_use )
```
If `dim == 3`, we remove index 0 (associated with the hip) in `H36M_NAMES`, or indices 0, 7, 9 (hip, spine, neck/nose) if the `predict_14` flag is True.
However, if `dim == 2`, you are only removing index 9 (associated with neck/nose).
This means that:

- If the `predict_14` flag is False, you are predicting the 3d location of a keypoint (neck/nose) for which you don't have the 2d input (regardless of whether you use SH or GT 2d keypoints).
- If the `predict_14` flag is True, you are not predicting the 3d location of a keypoint for which you actually have the 2d input (i.e. spine). So why not remove it from the 2d input as well?
Do I understand correctly? Could you elaborate on why you chose this setup?
Thanks a lot!
Issue Analytics
- Created 5 years ago
- Comments: 5 (3 by maintainers)
Sorry, I changed subjects after publishing that paper. I do not have any follow-up experiments to report other than what we showed in the paper.
I’m closing this thread. Please open new issues if you have further questions.
Just had another question pop up (sorry)… Have you tried training on 2d GT and testing with SH detections, or vice versa? How much does performance drop compared to keeping the input source consistent?