
How to use GPU during inference?

See original GitHub issue

The example inference code doesn’t seem to use the GPU. How can I enable it?

I’ve tried this, but I get a CUDNN_STATUS_NOT_INITIALIZED error:

run_opts = {"device": "cuda","data_parallel_count": -1,"data_parallel_backend": False,"distributed_launch": False,"distributed_backend": "nccl","jit_module_keys": None}

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-rnnlm-librispeech", savedir="pretrained_model", run_opts=run_opts)
```
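For context, a CUDNN_STATUS_NOT_INITIALIZED error usually points at the CUDA/cuDNN stack rather than at SpeechBrain itself. A minimal diagnostic sketch (not from the original issue) that forces cuDNN to initialize before any model is loaded:

import torch

# Report the versions this PyTorch build was compiled against.
print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())

# A tiny convolution forces cuDNN initialization; on a broken install this
# line fails immediately with CUDNN_STATUS_NOT_INITIALIZED.
x = torch.randn(1, 3, 32, 32, device="cuda")
conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("cuda")
print(conv(x).shape)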

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

4 reactions
pplantinga commented, Mar 20, 2021

Try:

from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-transformer-transformerlm-librispeech",
    run_opts={"device": "cuda"},
)

edit: as pointed out by @mschonwe, changed the last } to )
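Once the model loads with run_opts={"device": "cuda"}, you can confirm it end to end by transcribing a clip. transcribe_file is the standard method on the pretrained interface; example.wav below is only a hypothetical placeholder file:

# example.wav is a hypothetical local audio file; transcribe_file handles
# loading (and any resampling the pipeline defines) internally.
text = asr_model.transcribe_file("example.wav")
print(text)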

1 reaction
alpoktem commented, Mar 23, 2021

I was wrong to trust torch.cuda.is_available() when it said everything was fine. I had upgraded to PyTorch 1.8, as required by speechbrain, and it broke my other GPU training setups too. I was getting this error:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

Reverting to torch 1.7.1 fixed it for me. I am now able to run both my code and @mschonwe’s on the GPU. Thanks for the guidance, @pplantinga.
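A quick way to tell a version-mismatch failure like this apart from a missing GPU is to trigger cuBLAS directly. This sketch is an illustration based on the error above, not code from the thread:

import torch

# The torch build and the CUDA version it was compiled against; the thread
# reports that the 1.8 build was broken for this setup while 1.7.1 worked.
print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)

# A small matmul on the GPU initializes cuBLAS; on a broken install this is
# where CUBLAS_STATUS_INTERNAL_ERROR from cublasCreate(handle) surfaces.
a = torch.randn(64, 64, device="cuda")
print((a @ a).sum().item())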

Read more comments on GitHub >

Top Results From Across the Web

Should I use GPU or CPU for inference?
Running inference on a GPU instead of CPU will give you close to the same speedup as it does on training, less a...

GPU Inference - AWS Deep Learning Containers
PyTorch GPU inference · Verify that the nvidia-device-plugin-daemonset is running correctly. · Create the namespace. · (Optional step when using public models.) ...

Why Am I Using GPUs for Deep Learning Inference? - Medium
Many data scientists who are working in these scenarios may start with CPUs, since inference is typically not as resource-heavy as training. As ...

GPU-Accelerated Machine Learning Inference as a Service for ...
This saturates the GPUs, keeping the pipeline of inference requests as full as possible.

Efficient Inference on a Single GPU - Hugging Face
Make sure that you have enough GPU memory to store the quarter (or half if your model weights are in half precision) of...
