A simple question about camera and coordinate system.

Hi, I have a simple question about ROMP. I have been struggling to place people in their correct relative positions. Is that really possible using the root-aligned SMPL meshes without predicting their transl? (And would it be possible if we have the camera parameter K?)

1. What is the coordinate system of the vertices that are used for rendering? I think we are predicting points in the camera coordinate system, but root-aligned. Is that correct?
2. Following Q1: before the verts are rendered onto the image, a translation is added to them (cam_trans in projection.py). What is it, and what is estimate_translation actually doing? Is it estimating the root's position? https://github.com/Arthur151/ROMP/blob/e30b7d17f13089fa9fa114df494192e31b0f43ed/romp/lib/visualization/visualization.py#L61
3. I tried replacing the verts + trans from Q2 with the GT mesh (verts = GT_verts), without any other changes to your code, but the results are not correct. I expected the mesh to match the person in the image exactly, but there is always a shift. I also can't use the same FOV, otherwise the mesh becomes very small on the image.
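To make Q2 and Q3 concrete, here is a minimal sketch of a standard pinhole projection. It is not ROMP's rendering code: `project_points`, the intrinsic values, and the vertex coordinates are all hypothetical. It only illustrates two points raised above: root-aligned vertices need a root translation before they have a usable depth, and the pixel result depends on the focal length, so GT vertices rendered with a different focal length (FOV) than the GT camera will appear at a different scale.

```python
import numpy as np

def project_points(points_cam, K):
    """Project (N, 3) camera-space points to (N, 2) pixels using a 3x3 intrinsic matrix K."""
    points_cam = np.asarray(points_cam, dtype=float)
    uvw = points_cam @ K.T            # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by depth

# Hypothetical intrinsics: focal length 500 px, principal point (256, 256).
K = np.array([[500.0, 0.0, 256.0],
              [0.0, 500.0, 256.0],
              [0.0, 0.0, 1.0]])

# Hypothetical root-aligned vertices (root at the origin), in metres.
verts_root_aligned = np.array([[0.0, 0.0, 0.0],
                               [0.1, -0.2, 0.05]])

# Hypothetical root position in camera space. Without it the root sits at
# depth 0 and the perspective divide is undefined.
trans = np.array([0.3, 0.0, 4.0])

pixels = project_points(verts_root_aligned + trans, K)

# Same 3D points, doubled focal length: offsets from the principal point double.
K_long = K.copy()
K_long[0, 0] = K_long[1, 1] = 1000.0
pixels_long = project_points(verts_root_aligned + trans, K_long)
```

If the GT vertices are already expressed in the GT camera's coordinate system, they would be projected with that camera's K and no extra translation.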

Sorry if I have misunderstood anything. I think rendering is the last part of your code that I don't understand. Looking forward to your answer!

Zhengdi

Issue Analytics

State:
Created a year ago
Comments: 24 (10 by maintainers)

Top GitHub Comments

ZhengdiYu commented, Apr 21, 2022

@ZhengdiYu Zhengdi, please check this function:

https://github.com/Arthur151/ROMP/blob/704a5ea7f0e8e5041782622b5fc305dbed9733c3/romp/lib/utils/projection.py#L39

The camera coordinate system is defined by the proj_mat in this function. Therefore, if you want the predicted translation in the GT camera coordinate system, you just need to provide the right proj_mat, which is commonly called the extrinsic and intrinsic camera matrix, or the camera projection matrix. Once you understand estimate_translation, you will see that it can transform the 3D translation from our pre-defined camera space to the target one, such as the GT camera coordinate system you want here.
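As a rough illustration of how a translation can be tied to a particular camera, the sketch below recovers a root translation by linear least squares from 3D points and their 2D projections. This is a generic technique, not a copy of ROMP's estimate_translation: the function name `solve_translation`, its arguments, and all numeric values are hypothetical, and it assumes a pinhole camera with a single focal length and no distortion.

```python
import numpy as np

def solve_translation(points_3d, points_2d, focal, center):
    """Least-squares translation t such that projecting (points_3d + t) matches points_2d.

    From u - cx = focal * (X + tx) / (Z + tz), each point gives two linear equations:
        focal * tx - (u - cx) * tz = (u - cx) * Z - focal * X
        focal * ty - (v - cy) * tz = (v - cy) * Z - focal * Y
    """
    points_3d = np.asarray(points_3d, dtype=float)
    d = np.asarray(points_2d, dtype=float) - np.asarray(center, dtype=float)
    n = len(points_3d)
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    A[0::2, 0] = focal
    A[0::2, 2] = -d[:, 0]
    A[1::2, 1] = focal
    A[1::2, 2] = -d[:, 1]
    b[0::2] = d[:, 0] * points_3d[:, 2] - focal * points_3d[:, 0]
    b[1::2] = d[:, 1] * points_3d[:, 2] - focal * points_3d[:, 1]
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t

# Hypothetical root-aligned joints (metres) and a hypothetical target camera.
joints = np.array([[0.0, 0.0, 0.0],
                   [0.2, -0.5, 0.1],
                   [-0.2, -0.5, -0.05],
                   [0.1, 0.4, 0.02]])
focal, center = 500.0, np.array([256.0, 256.0])
true_trans = np.array([0.3, -0.1, 4.0])

# Synthesize the 2D observations by projecting with the target camera.
cam_points = joints + true_trans
joints_2d = focal * cam_points[:, :2] / cam_points[:, 2:3] + center

recovered = solve_translation(joints, joints_2d, focal, center)
```

With noise-free inputs `recovered` matches `true_trans` up to floating-point error. Changing `focal` and `center` to another camera's intrinsics yields the translation expressed for that camera, which is the idea behind supplying the GT projection matrix.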

Thanks! I will look into this. I do have the camera intrinsics.

sylyt62 commented, Apr 28, 2022

Indeed! I got it, thx~