Inference: own data
Hi,
I just have a few questions about running inference on our own data with the PENet pretrained weights.
- How sparse can the depth map be? Currently, my inference image is from the KITTI-360 dataset, which is quite similar to the KITTI data the network was trained on, but there is no GT depth to sample from, so my sparse depth map is quite sparse. When I run inference on this image, the prediction is also sparse, i.e., I only get predictions in the regions covered by the sparse depth map. Is this expected behaviour?
- What should my input be for 'positions' (i.e., the cropped image)? I don't want to crop the images for inference, so should I just set input['positions'] = input['rgb']?
It would be great if you could answer these questions when time permits 😃
Regards, Shrisha
Top GitHub Comments
You could refer to [An intriguing failing of convolutional neural networks and the CoordConv solution] by Liu for more details about positional encoding. In our default settings, we use the geometric encoding (i.e., 3D coordinates) described in our paper. The evaluation and training processes should share consistent settings.
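For illustration only, here is a minimal sketch of what a geometric (3D-coordinate) position map could look like when built from camera intrinsics. The function name, intrinsics values, and output layout are my own assumptions, not the repository's code; check the repo's dataloader for the exact shape and normalization it expects for input['positions'].

```python
# Minimal sketch (not the repository's exact code) of a geometric encoding:
# per-pixel 3D coordinates obtained by back-projecting pixel locations with
# the camera intrinsics. All intrinsics below are hypothetical placeholders.
import numpy as np

def geometric_position_map(height, width, fx, fy, cx, cy, depth=None):
    """Return an (H, W, 3) map of per-pixel ray directions, or metric 3D
    points where a depth map is provided."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))  # pixel grid
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = np.ones_like(x, dtype=np.float32)
    coords = np.stack([x, y, z], axis=-1).astype(np.float32)
    if depth is not None:
        # Scale ray directions by depth to get 3D points where depth is valid.
        coords = coords * depth[..., None]
    return coords

# Hypothetical intrinsics for a KITTI-sized image; replace with real calibration.
positions = geometric_position_map(352, 1216, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```

Setting input['positions'] = input['rgb'] would not reproduce this encoding, since the RGB values carry no coordinate information.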
I think two points could be taken into consideration:
The sparse depth maps in KITTI-360 seem denser than the ones we use in KITTI depth. This means that there is a domain gap between the two datasets, which leads to the failing predictions. We suggest that you either (i) construct denser GT maps in KITTI-360 for further training or finetuning (this step is necessary for transfer learning), or (ii) consider depth completion methods with "Sparsity Invariance", which aim at countering the instability brought by unknown and varying density. You could refer to [Sparsity Invariant CNNs] by Uhrig or [A Normalized Convolutional Neural Network for Guided Sparse Depth Upsampling] from our group. Recently, this topic has been discussed in [Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion] as well.
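As a quick way to quantify that density gap, one could compare the fraction of valid pixels in a KITTI-360 sparse map against a KITTI depth sample. The snippet below is only an illustrative sketch: the file paths are placeholders, and it assumes depth is stored as 16-bit PNGs scaled by 256 (the KITTI depth convention), which may not match your own data.

```python
# Illustrative sketch: compare sparse-depth density between datasets.
# Assumes 16-bit PNG depth maps scaled by 256 (KITTI depth convention);
# adjust the loading for your own data. Paths are placeholders.
import numpy as np
from PIL import Image

def valid_ratio(depth_png_path):
    """Fraction of pixels carrying a depth measurement (non-zero)."""
    depth = np.asarray(Image.open(depth_png_path), dtype=np.float32) / 256.0
    return float((depth > 0).mean())

print("KITTI depth sample :", valid_ratio("kitti_depth_sparse_sample.png"))
print("KITTI-360 sample   :", valid_ratio("kitti360_sparse_sample.png"))
```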
A secondary reason is that not all pixels in the predicted depth map are reliable. You could refer to a previous issue for this.