Proper way to do inference in fp16 mode
❓ How to use Detectron2
Calling
predictor.model.half(); predictor(image)
throws RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same.
So I have to manually change, in retinanet for example,
features = self.backbone(images.tensor)
to
features = self.backbone(images.tensor.half())
This works, but it modifies the source code.
Are there other ways to set the input image type? I tried changing
image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
to
image = torch.as_tensor(image.astype("float16").transpose(2, 0, 1))
in DefaultPredictor, but that didn't work.
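For reference, a minimal sketch of one alternative that avoids editing the Detectron2 source: leave the weights in fp32 and let torch.cuda.amp.autocast run eligible ops in fp16 at inference time. This is not the approach from the issue itself, and the retinanet config name and input path below are only placeholders.

```python
# Sketch only: fp16 inference via autocast instead of model.half().
# Assumes a stock DefaultPredictor; the config name and image path are placeholders.
import cv2
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_1x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_1x.yaml")
predictor = DefaultPredictor(cfg)

image = cv2.imread("input.jpg")  # BGR uint8 array, as DefaultPredictor expects
with torch.no_grad(), torch.cuda.amp.autocast():
    # Weights stay in fp32; autocast runs eligible ops (convs, matmuls) in fp16,
    # so images.tensor never needs a manual .half() cast.
    outputs = predictor(image)
```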
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 8
- Comments: 9
Top Results From Across the Web
- Memory and speed - Hugging Face: "We present some techniques and ideas to optimize Diffusers inference for memory or speed. As a general rule, we recommend the use of..."
- Train With Mixed Precision - NVIDIA Documentation Center: "Porting the model to use the FP16 data type where appropriate. Adding loss scaling to preserve small gradient values. The ability to train..."
- Compressing a Model to FP16 - OpenVINO™ Documentation: "Model Optimizer can convert all floating-point weights to FP16 data type. The resulting IR is called compressed FP16 model. ... Using --data_type FP32..."
- Inference time of fp32 and fp16 roughly the same on RTX3090: "half() and tensor.half(), I get 0.0172 ms average inference time. Why is the fp16 model not taking less time to infer?"
- Hardware for Deep Learning. Part 3: GPU - Intento: "AMD Radeon RX Vega has no restrictions for FP16, giving 2x performance compared to FP32, while FP64 is slower (1/16th). INT8 is useful..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
- We convert models to int8 for deployment, but there is no promise on when it will be open-sourced (maybe in a few months).
- I ended up using apex for mixed-precision inference. The accuracy is comparable to fp32 mode while using less GPU memory (and it is also a little faster).
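A rough sketch of what the apex route mentioned in that comment could look like, assuming NVIDIA apex is installed and reusing the predictor and image from the sketch above; opt_level "O1" leaves the weights in fp32 and patches eligible ops to run in fp16.

```python
# Sketch of the apex mixed-precision route (assumes NVIDIA apex is installed,
# plus the predictor and image from the earlier sketch). With no optimizer
# passed, amp.initialize returns only the patched model.
import torch
from apex import amp

predictor.model = amp.initialize(predictor.model.eval(), opt_level="O1")

with torch.no_grad():
    outputs = predictor(image)  # eligible ops run in fp16, the rest in fp32
```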