
Proper way to do inference in fp16 mode

See original GitHub issue

❓ How to use Detectron2

Calling predictor.model.half() and then predictor(image) throws RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same.

So I have to manually change, in RetinaNet for example, features = self.backbone(images.tensor) to features = self.backbone(images.tensor.half()). This works, but it modifies the source code.

Are there any other ways to set the input image type? I tried changing image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1)) to image = torch.as_tensor(image.astype("float16").transpose(2, 0, 1)) in DefaultPredictor, but it didn't work.
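
One way to avoid editing the model source at all is PyTorch's automatic mixed precision context (torch.cuda.amp.autocast, available from PyTorch 1.6, which postdates this issue). It casts eligible ops to fp16 on the fly while inputs and weights stay fp32, so there is no dtype mismatch. A minimal sketch around DefaultPredictor; the config choice and input path are illustrative, not from the original issue:

    import cv2
    import torch
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    # Illustrative config/weights; any model-zoo entry works the same way.
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_1x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_1x.yaml")
    predictor = DefaultPredictor(cfg)

    image = cv2.imread("input.jpg")  # hypothetical input path

    # autocast runs convolutions/matmuls in fp16 on the fly; weights and
    # inputs remain fp32, so no manual .half() calls are needed.
    with torch.cuda.amp.autocast():
        outputs = predictor(image)

Note this keeps the weights in fp32 (so memory is not halved); it trades that for not having to touch Detectron2 internals.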

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 8
  • Comments: 9

Top GitHub Comments

7 reactions
ppwwyyxx commented, Jan 14, 2020

We convert models to int8 for deployment, but no promise on when it will be open source (maybe in a few months).
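
For context on what int8 conversion involves: the Detectron2 deployment path mentioned here was not public at the time, but PyTorch's built-in dynamic quantization illustrates the general idea. A sketch with a purely hypothetical toy model:

    import torch
    import torch.nn as nn

    # Hypothetical toy model standing in for a real network.
    model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

    # Linear weights become int8; activations are quantized on the fly.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 256)
    print(quantized(x).shape)  # torch.Size([1, 10])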

1 reaction
ArtificialNotImbecile commented, Apr 28, 2020

I ended up using apex for mixed-precision inference. The accuracy is comparable with fp32 mode while using less GPU memory (it's also a little bit faster).
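
For reference, an apex-based setup looks roughly like the sketch below (assuming NVIDIA apex is installed; the model and input are placeholders, and "O1" is apex's standard mixed-precision mode):

    import torch
    import torchvision.models as models
    from apex import amp

    model = models.resnet50().cuda().eval()

    # O1 patches eligible ops to run in fp16 while keeping fp32 weights,
    # so inputs can stay fp32 and no manual .half() calls are needed.
    model = amp.initialize(model, opt_level="O1")

    x = torch.randn(1, 3, 224, 224).cuda()
    with torch.no_grad():
        out = model(x)

apex's amp has since been superseded by the built-in torch.cuda.amp, which is the usual recommendation on PyTorch 1.6 and later.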

Read more comments on GitHub >

Top Results From Across the Web

Memory and speed - Hugging Face
We present some techniques and ideas to optimize Diffusers inference for memory or speed. As a general rule, we recommend the use of...
Read more >
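
As a concrete instance of the fp16 loading that page describes, Diffusers pipelines can be loaded directly in half precision; a sketch (the model id is illustrative):

    import torch
    from diffusers import StableDiffusionPipeline

    # torch_dtype=torch.float16 loads the weights directly in half precision,
    # roughly halving GPU memory use.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative model id
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe("an astronaut riding a horse").images[0]
    image.save("out.png")
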
Train With Mixed Precision - NVIDIA Documentation Center
Porting the model to use the FP16 data type where appropriate. Adding loss scaling to preserve small gradient values. The ability to train...
Read more >
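
The loss scaling that snippet mentions is what torch.cuda.amp.GradScaler automates. A minimal training-step sketch (the model, batch, and optimizer are placeholders):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(32, 10, device="cuda")
    y = torch.randn(32, 1, device="cuda")

    with torch.cuda.amp.autocast():  # fp16 forward where safe
        loss = torch.nn.functional.mse_loss(model(x), y)

    scaler.scale(loss).backward()  # scale the loss so small gradients survive fp16
    scaler.step(optimizer)         # unscale gradients; skip the step on inf/nan
    scaler.update()                # adapt the scale factor for the next iteration
    optimizer.zero_grad()
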
Compressing a Model to FP16 - OpenVINO™ Documentation
Model Optimizer can convert all floating-point weights to FP16 data type. The resulting IR is called compressed FP16 model. ... Using --data_type FP32...
Read more >
Inference time of fp32 and fp16 roughly the same on RTX3090
half() and tensor.half(), I get 0.0172 ms average inference time. Why is the fp16 model not taking less time to infer?
Read more >
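
When comparing fp16 and fp32 latency as in that question, the measurement itself is a common pitfall: CUDA kernel launches are asynchronous, so timings need explicit synchronization. A sketch of a fairer benchmark (the model and input are placeholders):

    import torch

    def bench_ms(model, x, iters=100):
        with torch.no_grad():
            # Warm-up so lazy init and cuDNN autotuning don't skew results.
            for _ in range(10):
                model(x)
            torch.cuda.synchronize()
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            for _ in range(iters):
                model(x)
            end.record()
            torch.cuda.synchronize()  # CUDA is async; wait before reading the timer
        return start.elapsed_time(end) / iters

    model = torch.nn.Conv2d(3, 64, 3).cuda().eval()
    x = torch.randn(8, 3, 224, 224, device="cuda")
    print("fp32:", bench_ms(model, x), "ms")
    print("fp16:", bench_ms(model.half(), x.half()), "ms")

On small workloads launch overhead dominates, so fp16 only pays off when the kernels are large enough (and shaped so that tensor cores engage, e.g. channel counts divisible by 8).
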
Hardware for Deep Learning. Part 3: GPU - Intento
AMD Radeon RX Vega has no restrictions for FP16, giving 2x performance compared to FP32, while FP64 is slower (1/16th). INT8 is useful...
Read more >
