Proper way to do inference in fp16 mode
❓ How to use Detectron2
Calling
predictor.model.half(); predictor(image)
throws RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same.
So I have to manually change, in retinanet for example,
features = self.backbone(images.tensor)
to
features = self.backbone(images.tensor.half())
This works, but it modifies the source code.
Are there other ways to set the input image type? I tried changing
image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
to
image = torch.as_tensor(image.astype("float16").transpose(2, 0, 1))
in DefaultPredictor, but that didn't work.
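For reference, a minimal sketch of one alternative that avoids editing the Detectron2 source: leave the weights in fp32 and let torch.cuda.amp.autocast run eligible ops in fp16 at inference time. This is not the approach from the issue itself, and the retinanet config name and input path below are only placeholders.

```python
# Sketch only: fp16 inference via autocast instead of model.half().
# Assumes a stock DefaultPredictor; the config name and image path are placeholders.
import cv2
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_1x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_1x.yaml")
predictor = DefaultPredictor(cfg)

image = cv2.imread("input.jpg")  # BGR uint8 array, as DefaultPredictor expects
with torch.no_grad(), torch.cuda.amp.autocast():
    # Weights stay in fp32; autocast runs eligible ops (convs, matmuls) in fp16,
    # so images.tensor never needs a manual .half() cast.
    outputs = predictor(image)
```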
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 8
- Comments: 9
Top Results From Across the Web
- Memory and speed - Hugging Face: "We present some techniques and ideas to optimize Diffusers inference for memory or speed. As a general rule, we recommend the use of..."
- Train With Mixed Precision - NVIDIA Documentation Center: "Porting the model to use the FP16 data type where appropriate. Adding loss scaling to preserve small gradient values. The ability to train..."
- Compressing a Model to FP16 - OpenVINO™ Documentation: "Model Optimizer can convert all floating-point weights to FP16 data type. The resulting IR is called compressed FP16 model. ... Using --data_type FP32..."
- Inference time of fp32 and fp16 roughly the same on RTX3090: "half() and tensor.half(), I get 0.0172 ms average inference time. Why is the fp16 model not taking less time to infer?"
- Hardware for Deep Learning. Part 3: GPU - Intento: "AMD Radeon RX Vega has no restrictions for FP16, giving 2x performance compared to FP32, while FP64 is slower (1/16th). INT8 is useful..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
- We convert models to int8 for deployment, but there is no promise on when it will be open-sourced (maybe in a few months).
- I ended up using apex for mixed-precision inference. The accuracy is comparable to fp32 mode while using less GPU memory (and it is also a little faster).
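A rough sketch of what the apex route mentioned in that comment could look like, assuming NVIDIA apex is installed and reusing the predictor and image from the sketch above; opt_level "O1" leaves the weights in fp32 and patches eligible ops to run in fp16.

```python
# Sketch of the apex mixed-precision route (assumes NVIDIA apex is installed,
# plus the predictor and image from the earlier sketch). With no optimizer
# passed, amp.initialize returns only the patched model.
import torch
from apex import amp

predictor.model = amp.initialize(predictor.model.eval(), opt_level="O1")

with torch.no_grad():
    outputs = predictor(image)  # eligible ops run in fp16, the rest in fp32
```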