Inconsistent model inference
Describe the bug
The model outputs different predictions for the same input depending on the batch it is placed in. I suspect the problem is related to pad tokens not being masked properly.
Steps/Code to reproduce bug
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_large")

# audio_paths is a list of paths to .wav files; the first two are the LibriTTS clips named below
model.transcribe(audio_paths[:2], batch_size=1)
# output:
# ['take care of over excitement and endeavor to keep a quiet mind even for your health it is the best advice that can be given to you your moral and spiritual improvement will then keep pace with the culture of your intellectual powers',
#  'ab crowd']

model.transcribe(audio_paths[:2], batch_size=2)
# output:
# ['take care of over excitement and endeavor to keep a quiet mind even for your health it is the best advice that can be given to you your moral and spiritual improvement will then keep pace with the culture of your intellectual powers',
#  'crow']
# The example above uses audio files from the libritts100 dataset (3575_170457_000032_000001.wav and 6829_68771_000042_000001.wav), but I am quite sure it can be reproduced with other audio files as well.
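For illustration, here is a toy sketch of the suspected mechanism (plain PyTorch, not NeMo's actual Conformer code): without a key padding mask, zero pad frames leak into the softmax of self-attention, so the features of the valid frames change with how much padding the batch adds.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 8
x = torch.randn(1, 10, d)          # 10 valid frames of one utterance
x_padded = F.pad(x, (0, 0, 0, 6))  # same frames followed by 6 zero pad frames

def toy_self_attention(h, pad_mask=None):
    # single-head dot-product self-attention with no projections
    scores = h @ h.transpose(-1, -2) / d ** 0.5
    if pad_mask is not None:
        scores = scores.masked_fill(pad_mask, float("-inf"))  # hide pad keys
    return F.softmax(scores, dim=-1) @ h

y = toy_self_attention(x)
y_pad = toy_self_attention(x_padded)[:, :10]  # no mask: pad frames enter the softmax
print(torch.allclose(y, y_pad))               # False -> valid frames depend on padding

pad_mask = torch.zeros(1, 16, 16, dtype=torch.bool)
pad_mask[:, :, 10:] = True                    # mark key positions 10..15 as padding
y_masked = toy_self_attention(x_padded, pad_mask)[:, :10]
print(torch.allclose(y, y_masked))            # True -> masking restores invariance

In this toy example, masking the pad keys makes the valid frames match again regardless of padding, which is exactly the batch invariance the expected behavior below asks for.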
Expected behavior
Outputs should be identical regardless of the batch size or of the other items in the batch.
Environment overview
OS: Linux Ubuntu 20.04
Python version: 3.8
PyTorch version: 1.10.0+cu113
NeMo version: 1.4.0
CUDA/cuDNN version: 11
GPU model and memory: RTX 3070, Tesla V100-SXM2-16GB
Comments
It would take two passes over the audio: one just to measure the durations, then sort and infer. This doesn't scale for large datasets or multi-node inference (the parallel transcribe script).
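For reference, this is roughly what that two-pass workaround could look like (transcribe_sorted is a hypothetical helper, and reading durations assumes the soundfile package): measure durations, transcribe in sorted order so each batch holds similar-length clips, then restore the original order.

import soundfile as sf
import nemo.collections.asr as nemo_asr

def transcribe_sorted(model, audio_paths, batch_size=32):
    # pass 1: read each file header to get its duration (cheap, no decoding)
    durations = [sf.info(p).duration for p in audio_paths]
    # sort indices by duration so batches are nearly uniform in length
    order = sorted(range(len(audio_paths)), key=lambda i: durations[i])
    # pass 2: run inference on the sorted list
    sorted_hyps = model.transcribe([audio_paths[i] for i in order], batch_size=batch_size)
    # scatter the hypotheses back into the original order
    hyps = [None] * len(audio_paths)
    for rank, i in enumerate(order):
        hyps[i] = sorted_hyps[rank]
    return hyps

model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_large")
# hyps = transcribe_sorted(model, audio_paths)

Note that sorting only minimizes the padding inside each batch rather than eliminating it, so small output differences can remain; it shrinks the effect rather than fixing the masking.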
We use batch size 1 only for academic settings, i.e. when publishing numbers for a paper, because it's a one-off anyway. For all other settings we use batch size 32 or more (for RNNTs, as high as 256) and accept a 0.1% loss in WER.
I measured durations for the 43k audio files of the libritts100 dataset; it took 3 minutes with 8 processes, which might be preferable to the error coming from padding. Anyway, it's just a solution that came to mind.
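For what it's worth, the duration pass with 8 processes might look like this (the glob pattern and directory layout are assumptions; sf.info only reads the file header, so the work is I/O-bound and parallelizes well):

import glob
from multiprocessing import Pool

import soundfile as sf

def wav_duration(path):
    # header-only read; no audio decoding
    return sf.info(path).duration

if __name__ == "__main__":
    # hypothetical layout: adjust the pattern to wherever libritts100 lives
    audio_paths = sorted(glob.glob("libritts100/**/*.wav", recursive=True))
    with Pool(processes=8) as pool:
        durations = pool.map(wav_duration, audio_paths)
    print(f"measured {len(durations)} files")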