Inconsistent model inference
Describe the bug
The model outputs different predictions for the same input depending on the batch it is placed in. I suspect the problem is related to pad tokens not being masked properly.
Steps/Code to reproduce bug
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_large")

# audio_paths is a list of paths to .wav files; the first two are the LibriTTS clips named below
model.transcribe(audio_paths[:2], batch_size=1)
# output:
# ['take care of over excitement and endeavor to keep a quiet mind even for your health it is the best advice that can be given to you your moral and spiritual improvement will then keep pace with the culture of your intellectual powers',
#  'ab crowd']

model.transcribe(audio_paths[:2], batch_size=2)
# output:
# ['take care of over excitement and endeavor to keep a quiet mind even for your health it is the best advice that can be given to you your moral and spiritual improvement will then keep pace with the culture of your intellectual powers',
#  'crow']
# The example above uses audio files from the libritts100 dataset (3575_170457_000032_000001.wav and 6829_68771_000042_000001.wav), but I am quite sure it can be reproduced with other audio files as well.
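For illustration, here is a toy sketch of the suspected mechanism (plain PyTorch, not NeMo's actual Conformer code): without a key padding mask, zero pad frames leak into the softmax of self-attention, so the features of the valid frames change with how much padding the batch adds.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 8
x = torch.randn(1, 10, d)          # 10 valid frames of one utterance
x_padded = F.pad(x, (0, 0, 0, 6))  # same frames followed by 6 zero pad frames

def toy_self_attention(h, pad_mask=None):
    # single-head dot-product self-attention with no projections
    scores = h @ h.transpose(-1, -2) / d ** 0.5
    if pad_mask is not None:
        scores = scores.masked_fill(pad_mask, float("-inf"))  # hide pad keys
    return F.softmax(scores, dim=-1) @ h

y = toy_self_attention(x)
y_pad = toy_self_attention(x_padded)[:, :10]  # no mask: pad frames enter the softmax
print(torch.allclose(y, y_pad))               # False -> valid frames depend on padding

pad_mask = torch.zeros(1, 16, 16, dtype=torch.bool)
pad_mask[:, :, 10:] = True                    # mark key positions 10..15 as padding
y_masked = toy_self_attention(x_padded, pad_mask)[:, :10]
print(torch.allclose(y, y_masked))            # True -> masking restores invariance

In this toy example, masking the pad keys makes the valid frames match again regardless of padding, which is exactly the batch invariance the expected behavior below asks for.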
Expected behavior
Outputs should be identical regardless of the batch size or of the other items in the batch.
Environment overview
OS: Linux Ubuntu 20.04
Python version: 3.8
PyTorch version: 1.10.0+cu113
NeMo version: 1.4.0
CUDA/cuDNN version: 11
GPU model and memory: RTX 3070, Tesla V100-SXM2-16GB
Comments
It would take two passes over the audio: one just to measure the durations, then sort and infer. This doesn't scale for large datasets or multi-node inference (the parallel transcribe script).
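For reference, this is roughly what that two-pass workaround could look like (transcribe_sorted is a hypothetical helper, and reading durations assumes the soundfile package): measure durations, transcribe in sorted order so each batch holds similar-length clips, then restore the original order.

import soundfile as sf
import nemo.collections.asr as nemo_asr

def transcribe_sorted(model, audio_paths, batch_size=32):
    # pass 1: read each file header to get its duration (cheap, no decoding)
    durations = [sf.info(p).duration for p in audio_paths]
    # sort indices by duration so batches are nearly uniform in length
    order = sorted(range(len(audio_paths)), key=lambda i: durations[i])
    # pass 2: run inference on the sorted list
    sorted_hyps = model.transcribe([audio_paths[i] for i in order], batch_size=batch_size)
    # scatter the hypotheses back into the original order
    hyps = [None] * len(audio_paths)
    for rank, i in enumerate(order):
        hyps[i] = sorted_hyps[rank]
    return hyps

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_large")
# hyps = transcribe_sorted(model, audio_paths)

Note that sorting only minimizes the padding inside each batch rather than eliminating it, so small output differences can remain; it shrinks the effect rather than fixing the masking.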
We use batch size 1 only for academic settings, i.e. when publishing numbers for a paper, because it's a one-off anyway. For all other settings we use batch size 32 or more (for RNNTs, as high as 256) and accept a 0.1% loss in WER.
I measured durations for the 43k audio files of the libritts100 dataset; it took 3 minutes with 8 processes, which might be preferable to the error coming from padding. Anyway, it's just a solution that came to mind.
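For what it's worth, the duration pass with 8 processes might look like this (the glob pattern and directory layout are assumptions; sf.info only reads the file header, so the work is I/O-bound and parallelizes well):

import glob
from multiprocessing import Pool

import soundfile as sf

def wav_duration(path):
    # header-only read; no audio decoding
    return sf.info(path).duration

if __name__ == "__main__":
    # hypothetical layout: adjust the pattern to wherever libritts100 lives
    audio_paths = sorted(glob.glob("libritts100/**/*.wav", recursive=True))
    with Pool(processes=8) as pool:
        durations = pool.map(wav_duration, audio_paths)
    print(f"measured {len(durations)} files")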