[Feature-Request] Add new model (BiSeNetV2) into the model zoo (resources/nn)
Start with the why:
Reason 1: I have trained a model that only does road segmentation, using the BiSeNetV2 network architecture, and it runs at a decent frame rate (around 15 FPS). I would like to deploy it to my OAK camera.
The demo above shows the model running in OpenVINO IR format.
Reason 2: Compared to the existing notebook for training a segmentation model (e.g. DeepLabV3), which uses an outdated version of TensorFlow, the training process with PaddleSeg is much more pleasant and much quicker. Within 15 minutes I had my model trained, and thanks to the newer network architecture, BiSeNetV2, I got satisfactory accuracy as well as a pretty good FPS (more than 3 times faster than the existing road-segmentation-adas-0001 model). If this approach proves useful, we can quickly, and more importantly easily, train more custom models with less effort. That would enrich the model zoo with models based on the latest architectures, which benefits the community and in turn makes DepthAI and the OAK camera more valuable.
Move to the what:
A road segmentation model (BiSeNetV2 architecture) has been trained with PaddleSeg and verified in OpenVINO IR format (the `.bin` and `.xml` files).
The current `depthai_demo.py` doesn't provide an easy way to apply custom normalization to a video frame before feeding it to the model.
I would like the demo code to be updated with an interface/place to pass in such transformations (in my case, the normalization).
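For illustration, here is a minimal sketch of the kind of transformation I mean. The hook name and where it would be called from inside `depthai_demo.py` are just assumptions on my side; only the math reflects the normalization the model was trained with (PaddleSeg's default `Normalize`: scale to [0, 1], then mean 0.5 / std 0.5 per channel):

```python
import numpy as np

def normalize_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical pre-processing hook I would like to be able to plug in.

    Only the math is meant literally: it mirrors PaddleSeg's default
    Normalize (scale to [0, 1], then mean 0.5 / std 0.5 per channel).
    """
    img = frame_bgr.astype(np.float32) / 255.0      # [0, 255] -> [0, 1]
    img = (img - 0.5) / 0.5                          # -> [-1, 1]
    return img.transpose(2, 0, 1)[np.newaxis, ...]   # HWC -> NCHW for the model
```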
Move to the how:
I have updated the code (in a quick and dirty way) to apply the transformation before feeding the video frames to the model, but the resulting segmentation overlay on the video still doesn't look quite right.
I have attached my code and trained model below for your convenience; this is where a DepthAI expert is needed to finish the last step.
Before running the code, please install:

```
python -m pip install paddlepaddle-gpu
pip install paddleseg
```
- `infer.py`, the code to verify that the OpenVINO IR model is correct and free of defects after exporting from PaddleSeg (a rough sketch of what such a script does is included after this list)
- `model.bin` and `model.xml`, the IR model to be used with `infer.py`
- Example video
- Updated `depthai_demo.py`
- `depthai/resources/nn/road/road.json`
- `depthai/resources/nn/road/handler.py`
- `depthai/resources/nn/road/road.blob`
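For reference, the following is only a hedged sketch of what a verification script like `infer.py` does, not the attached file itself. The file names, the input size handling, and the assumption that the exported model outputs per-class logits are mine; it uses the OpenVINO 2021.x Python API:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO 2021.x API

# Load the exported IR model (paths are placeholders).
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# Read a frame and apply the same normalization used during training
# (PaddleSeg default: divide by 255, then mean 0.5 / std 0.5).
frame = cv2.imread("frame.jpg")
_, _, h, w = net.input_info[input_name].input_data.shape
img = cv2.resize(frame, (w, h)).astype(np.float32) / 255.0
img = (img - 0.5) / 0.5
img = img.transpose(2, 0, 1)[np.newaxis, ...]

# Run inference; assuming per-class logits, take the per-pixel argmax.
result = exec_net.infer({input_name: img})[output_name]
seg_map = np.argmax(result, axis=1).squeeze().astype(np.uint8)
cv2.imwrite("seg.png", seg_map * 255)  # binary road mask visualization
```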
The code base I used is the latest `main` branch, which I pulled this morning.
Please run this command:

```
python depthai_demo.py --sync -cnn road -vid ./video.mp4
```
Please let me know if you need any more information. Thanks for your help and your great work!
It does, and it is captured in the documentation here. CC: @Erol444 for future recommendations.
We can add, in the future, `mean` and `scale` values for the `preview` image of the `ColorCamera` node; there is already an option to output FP16, but it's not normalized. Adding normalization/scaling is quite simple. Regardless, the best and easiest way is to include the preprocessing in the model itself, IMO. You can rewrite the normalization step so it uses a mean value of 127.5 and a scale value of 127.5.
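To spell out the arithmetic (a sketch based on the values mentioned above, not the exact snippet from the comment): dividing by 255 and then normalizing with mean 0.5 / std 0.5 is the same as subtracting 127.5 and dividing by 127.5.

```python
import numpy as np

# Fake U8 frame, just to demonstrate the equivalence numerically.
x = np.random.randint(0, 256, size=(3, 4, 5)).astype(np.float32)

# Normalization as usually written for training (divide by 255, then mean/std 0.5):
a = (x / 255.0 - 0.5) / 0.5

# The same thing expressed with a single mean of 127.5 and scale of 127.5:
b = (x - 127.5) / 127.5

assert np.allclose(a, b)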