Neural-Assisted Disparity Depth Estimation

Start with the `why`:

The why of this effort (and initial research) is that in many applications depth cameras (and sometimes even LIDAR) are not sufficient to successfully detect objects in varied conditions. Specifically, for Luxonis’ potential customers, this is directly limiting their business success:

Autonomous wheelchairs. This functionality would be HUGE for this application, as existing solutions are struggling with the output of D435 depth. It gets tricked too easily and misses objects even with aggressive host-side filtering and other detection techniques.
Autonomous lawn mowing. This use case is also struggling with object detection using the D435. The system can’t reliably identify soccer-ball-sized things even with significant host-side post-processing, and it needs to be able to identify things down to baseball size.
Volumetric estimation of low-visual-interest objects. Disparity depth struggles significantly with objects (particularly large objects) of low visual interest, as they lack features to match. Neural networks can leverage latent information from training to overcome this limitation, allowing volumetric estimation where traditional algorithm-based disparity-depth solutions cannot adequately perform.
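To make the "lacks features to match" point concrete, here is a minimal sketch of 1D block matching with a sum-of-absolute-differences (SAD) cost. This is an illustration only, not the matcher used on the D435 or on DepthAI; the pixel rows, window size, and disparity range are all made-up values. On a textured row the cost has a single clear minimum at the true disparity, while on a flat row every candidate disparity costs the same, so the matcher has nothing to pick from.

```python
def sad_costs(left_row, right_row, x, half_window, max_disparity):
    """SAD cost of matching the window around left_row[x] at each candidate disparity."""
    costs = []
    for d in range(max_disparity + 1):
        cost = sum(
            abs(left_row[x + k] - right_row[x + k - d])
            for k in range(-half_window, half_window + 1)
        )
        costs.append(cost)
    return costs


def has_clear_match(costs, margin=1):
    """True when the best cost beats the second-best cost by at least `margin`."""
    ordered = sorted(costs)
    return ordered[1] - ordered[0] >= margin


# Hypothetical textured scanline: the left view is the right view shifted by 3 px.
TRUE_DISPARITY = 3
right_textured = [10, 50, 20, 90, 30, 70, 5, 60, 40, 80,
                  15, 55, 25, 95, 35, 75, 0, 65, 45, 85]
left_textured = [0] * TRUE_DISPARITY + right_textured[:-TRUE_DISPARITY]

# Hypothetical low-visual-interest scanline: a uniform surface in both views.
flat = [128] * 20

textured_costs = sad_costs(left_textured, right_textured, x=10, half_window=2, max_disparity=5)
flat_costs = sad_costs(flat, flat, x=10, half_window=2, max_disparity=5)

best_textured = textured_costs.index(min(textured_costs))
print("textured:", textured_costs, "best disparity:", best_textured)
print("flat:    ", flat_costs, "clear match:", has_clear_match(flat_costs))
```

Real matchers use 2D windows, census transforms, and confidence thresholds, but the failure mode on featureless surfaces is the same in kind.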

DepthAI was not originally intended to solve this sort of problem, but it is well suited to solving it.

Background:

As of now, the core use of DepthAI is to run 2D object detectors (e.g. MobileNetSSDv2) and fuse them with stereo depth to get the real-time 3D position of objects that the neural network identifies. See here for it finding my son’s XYZ position for example. This solution is not applicable to the first two customers above because the type of object must be known to the neural network. Their need is to avoid any object, not just known ones, and specifically objects which are hard to pick up and are lost/missed by traditional stereo depth vision.
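For readers unfamiliar with how a 2D detection plus stereo depth becomes an XYZ position, here is a minimal sketch using the standard pinhole and rectified-stereo relations (depth = focal length × baseline / disparity). It is not the DepthAI implementation; the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), the baseline, and the bounding box are all assumed values for illustration.

```python
def disparity_to_depth_m(disparity_px, focal_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair; None if there is no match."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def bbox_center(x_min, y_min, x_max, y_max):
    """Pixel coordinates of the center of a detector bounding box."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def pixel_to_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth into camera-frame XYZ (meters)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Assumed camera parameters and detection, for illustration only.
FX = FY = 800.0          # focal length in pixels
CX, CY = 640.0, 360.0    # principal point for a 1280x720 image
BASELINE_M = 0.075       # distance between the two stereo cameras

u, v = bbox_center(760, 300, 840, 420)            # box from a 2D detector
depth = disparity_to_depth_m(40.0, FX, BASELINE_M)  # disparity sampled at the box center
xyz = pixel_to_xyz(u, v, depth, FX, FY, CX, CY)
print("object position (m):", xyz)
```

The sketch also shows why this approach depends on the detector: the depth is only sampled where a bounding box exists, so an object the network does not recognize never gets a position.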

New Modality of Use

So one idea we had recently was to leverage the neural compute engines (and SHAVEs) of the Myriad X to make better depth, so that difficult objects which traditional stereo depth misses could be detected with depth that’s improved by the neural network.

Implementing this capability, whether by running neural inference to produce the depth map directly or by improving the results of the disparity-produced depth map, would be hugely enabling for the use cases mentioned above, and likely many others.
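As one hypothetical way to "improve the results of the disparity-produced depth map", the sketch below fills pixels where stereo matching failed with values from a network-predicted depth map. This is an assumption about how the two could be combined, not something the issue specifies: the function name, the use of `0.0` as the invalid marker, and the median-ratio scale alignment (useful because a network's depth may be off by a scale factor) are all illustrative choices.

```python
from statistics import median


def fuse_depth(stereo_depth, nn_depth, invalid=0.0):
    """Keep valid stereo depth; fill invalid pixels with scale-aligned network depth."""
    # Estimate a global scale from pixels where both sources have a value.
    ratios = [
        s / n
        for s_row, n_row in zip(stereo_depth, nn_depth)
        for s, n in zip(s_row, n_row)
        if s != invalid and n > 0
    ]
    scale = median(ratios) if ratios else 1.0

    return [
        [s if s != invalid else n * scale for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(stereo_depth, nn_depth)
    ]


# Tiny made-up 2x2 depth maps (meters); 0.0 marks pixels stereo could not match.
stereo = [[2.0, 0.0],
          [4.0, 0.0]]
predicted = [[1.0, 1.5],
             [2.0, 2.5]]

fused = fuse_depth(stereo, predicted)
print(fused)
```

A learned refinement network would replace this hand-written rule in practice; the point is only that the stereo output and the network output can be complementary.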

Move to the `how`:

The majority of the work in making this happen will be surveying what research has already been done and which techniques are sufficiently lightweight to run on DepthAI directly. Below is some initial research to that end:

Google Mannequin Challenge:

Blog explaining it: https://ai.googleblog.com/2019/05/moving-camera-moving-people-deep.html
Dataset: https://google.github.io/mannequinchallenge/www/index.html
GitHub: https://github.com/google/mannequinchallenge

Notice that in a lot of cases this is actually quite good-looking depth from just a single camera. Imagine how amazing it could look with 2 or 3 cameras.

Could produce just insanely good depth maps.

KITTI DataSet:

http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo

So check this out. A whole bunch of ground truth data, with calibration pictures, etc. So this could be used to train a neural network for sure on this sort of processing.

And then there’s a leaderboard further down the page of those who have.
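To compare a candidate network against that leaderboard, you need the same error metric. As I understand the KITTI 2015 stereo benchmark, a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity, and the score is the percentage of such pixels. The sketch below follows that rule; the function name and sample values are made up, and the official devkit should be treated as the reference.

```python
def d1_outlier_rate(pred_disparity, gt_disparity, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose disparity error exceeds both thresholds."""
    outliers = 0
    valid = 0
    for pred, gt in zip(pred_disparity, gt_disparity):
        if gt <= 0:
            # No ground truth at this pixel (the LIDAR-based ground truth is sparse).
            continue
        valid += 1
        err = abs(pred - gt)
        if err > abs_thresh and err > rel_thresh * gt:
            outliers += 1
    return outliers / valid if valid else 0.0


# Made-up flattened disparity values in pixels; 0.0 in gt means "no ground truth".
gt = [50.0, 20.0, 0.0, 100.0]
pred = [51.0, 25.0, 10.0, 104.0]

rate = d1_outlier_rate(pred, gt)
print(f"outlier rate: {rate:.3f}")
```

Here only the second pixel is an outlier: the last one is off by 4 px, but that is under 5% of its 100 px ground truth.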

PapersWithCode:

PapersWithCode is generally awesome. They even have a Slack.

https://paperswithcode.com/task/stereo-depth-estimation

Others and Random Notes:

So have a dig through there. This one from there seems pretty neat: https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo

These guys seem like they’re getting decent results too: https://arxiv.org/pdf/1803.09719v3.pdf

So for a lot of these it’s a matter of figuring out which ones are lightweight enough to be worth porting.

Notice this one uses KITTI dataset as well: https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf

From Intel R&D directly, "Deep Learning Stereo Vision at the edge": https://arxiv.org/pdf/2001.04552.pdf Apparently this was never implemented.
Google’s StereoNet looks really fast/lightweight: https://arxiv.org/pdf/1807.08865.pdf
GitHub repo summarizing depth quality enhancements using CNNs: https://github.com/mdcnn/Depth-Image-Quality-Enhancement
This one looks pretty interesting: https://arxiv.org/pdf/1910.00541.pdf

SparseNN depth completion https://www.youtube.com/watch?v=rN6D3QmMNuU&feature=youtu.be

ROXANNE Consistent video depth estimation https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/

https://web.stanford.edu/class/ee368/Project_Autumn_1516/Reports/Jordan_Shridhar.pdf Seems like the Myriad X’s 2x NCE + SHAVEs are plenty fast to produce a super-great disparity depth output in real time.
https://arxiv.org/pdf/1910.13708.pdf
DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs:
- http://openaccess.thecvf.com/content_ECCV_2018/papers/Shi_Yan_DDRNet_Depth_Map_ECCV_2018_paper.pdf
- https://github.com/neycyanshi/DDRNet
AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks: https://arxiv.org/pdf/1904.09099.pdf
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches: https://github.com/jzbontar/mc-cnn/blob/master/README.md
Siamese network. Probably way too big, as it shows multi-second run-times: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472548/
The Middlebury stereo dataset seems incredibly useful: https://github.com/kelkelcheng/GC-Net-Tensorflow/blob/master/README.md
DispNetC shows 0.06 runtime, which is encouraging.
Real-time self-adaptive deep stereo
https://zpascal.net/cvpr2019/Tonioni_Real-Time_Self-Adaptive_Deep_Stereo_CVPR_2019_paper.pdf
https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo/blob/master/README.MD
PyTorch implementation of several Deep Stereo Matching Networks (DSMnet): https://github.com/wyf2017/DSMnet/blob/master/README.md

Issue Analytics

Created 3 years ago
Reactions: 19
Comments: 59 (12 by maintainers)

Top GitHub Comments

PINTO0309 commented, Sep 14, 2021 (5 reactions)

Due to a problem with OpenVINO’s conversion to Myriad Blob, I submitted an issue to Intel’s OpenVINO engineers. So far, they seem to suspect that the structure of the model is wrong, but we are able to run inference on it successfully in ONNX Runtime and the TFLite runtime.

[Bug] GatherND shape conversion from ONNX is inaccurate #7379 (HITNET to blob / OpenVINO) https://github.com/openvinotoolkit/openvino/issues/7379

nickjrz commented, Dec 13, 2021 (5 reactions)

I was able to run real-time HITNET stereo depth estimation (Middlebury) using an OAK-D, with the inference running on the host. Here are my results:

(Image: output_ful)
