Fbank energies are much greater in Kaldi than torchaudio
Relevant part of the discussion in the first feature extraction PR:
As to the large 0th MFCC coefficient, the plot thickens. When I compared the Kaldi and lhotse (torchaudio) output for the same file using log-mel energies instead, the output still has the same shape, but the Kaldi energies are much larger:
[Image: Kaldi vs. lhotse (torchaudio) log-mel energies for the same file]
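Why would a log-mel discrepancy surface as a large 0th MFCC coefficient? If one pipeline's log-mel energies are roughly the other's plus a constant (same shape, shifted), the DCT that maps log-mel energies to cepstra sends that constant entirely into C0. The following minimal sketch assumes an orthonormal DCT-II and a hypothetical constant offset. It ignores liftering and any option that replaces C0 with the frame log-energy. `dct_ii` is a hypothetical helper, not a Kaldi or lhotse function. The input is the first frame of the Kaldi fbank output quoted in this thread.

```python
import math


def dct_ii(values, num_ceps):
    """Orthonormal DCT-II of one log-mel frame, keeping num_ceps coefficients."""
    n = len(values)
    coeffs = []
    for k in range(num_ceps):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * sum(v * math.cos(math.pi * k * (j + 0.5) / n)
                                  for j, v in enumerate(values)))
    return coeffs


# First frame of the Kaldi fbank output quoted in this thread (23 mel bins).
log_mel = [7.189558, 9.037251, 10.99918, 11.80182, 13.34635, 14.90742,
           14.93212, 16.49876, 16.68227, 16.07505, 17.88986, 17.97858,
           18.57075, 17.19065, 16.88927, 17.6966, 17.28112, 17.44429,
           16.39793, 15.73682, 15.18271, 14.46821, 13.25515]

offset = 20.0  # hypothetical constant gap between the two pipelines
shifted = [v - offset for v in log_mel]

# Per-coefficient difference between the original and the shifted frame.
delta = [a - b for a, b in zip(dct_ii(log_mel, 13), dct_ii(shifted, 13))]
print(round(delta[0], 2))              # 95.92, i.e. offset * sqrt(23)
print(max(abs(d) for d in delta[1:]))  # tiny: only floating-point noise
```

For k ≥ 1 the cosines sum to zero over the bins, so only C0 moves. A shape-preserving offset in the log-mel domain is therefore consistent with a suspiciously large 0th coefficient.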
The Kaldi result can be recreated by: steps/make_fbank.sh --fbank-config librifbank.conf --nj 1 libridata fbank_libri fbank_libri
where librifbank.conf has:
--dither=0
--sample-frequency=16000
and the wav.scp in libridata is:
rec1 /path/to/lhotse/test/fixtures/libri-1088-134315-0000.wav
Then, to analyse it in Python, I read it like this:
import kaldi_io
# read_mat_scp yields (utterance key, feature matrix) pairs; take the first (and only) one
key, fbank_kaldi = next(kaldi_io.read_mat_scp('/path/to/fbank_libri/raw_fbank_libridata.1.scp'))
A paranoid check with copy-feats ark:fbank_libri/raw_fbank_libridata.1.ark ark,t:- yields the same output:
rec1 [
7.189558 9.037251 10.99918 11.80182 13.34635 14.90742 14.93212 16.49876 16.68227 16.07505 17.88986 17.97858 18.57075 17.19065 16.88927 17.6966 17.28112 17.44429 16.39793 15.73682 15.18271 14.46821 13.25515
8.695339 10.86374 11.10041 11.28711 12.6426 13.6858 15.10104 14.55601 15.04671 15.78543 17.72975 17.83109 17.31431 16.33819 16.76282 17.25342 17.55506 17.83777 16.99071 15.81893 15.01264 13.81689 13.02168
8.378333 9.902431 10.18937 10.38635 12.07959 13.96713 15.26997 15.65272 15.58296 15.4958 18.21008 17.68359 17.81999 16.76442 16.32026 17.28751 17.46375 17.26385 16.55256 15.46311 14.84257 13.89831 13.44193
(...)
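The text dump can also be compared numerically with the matrix loaded through kaldi_io by parsing the `ark,t` output without any Kaldi dependency. This is a minimal stdlib sketch. `parse_text_ark` is a hypothetical helper, not part of kaldi_io. It assumes the layout printed above: the key and `[` on the first line, one frame per line, and `]` closing the last frame. It does not handle binary archives or vectors.

```python
def parse_text_ark(text):
    """Parse Kaldi text-format matrices ("key [ rows ... ]") into {key: rows}."""
    mats = {}
    key, rows = None, []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if key is None:
            # Header line: "<key> [" (a first row may follow on the same line).
            if len(tokens) < 2 or tokens[1] != "[":
                raise ValueError(f"unexpected header line: {line!r}")
            key, tokens = tokens[0], tokens[2:]
        closing = bool(tokens) and tokens[-1] == "]"
        if closing:
            tokens = tokens[:-1]
        if tokens:
            rows.append([float(t) for t in tokens])
        if closing:
            # The "]" ends the current matrix; start looking for the next key.
            mats[key] = rows
            key, rows = None, []
    return mats


# First three columns of the first two frames above, for illustration.
sample = """rec1 [
  7.189558 9.037251 10.99918
  8.695339 10.86374 11.10041 ]
"""
feats = parse_text_ark(sample)
print(len(feats["rec1"]), feats["rec1"][1][0])  # 2 8.695339
```

With the full dump, the parsed rows can be compared element-wise against `fbank_kaldi`. Any difference should be limited to the rounding of the printed text.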
Any ideas? Anyway, I think we can merge this one - let me know once you review.
_Originally posted by @pzelasko in https://github.com/pzelasko/lhotse/pull/10#issuecomment-631132273_
Top GitHub Comments
Related code is given below:
https://github.com/kaldi-asr/kaldi/blob/797905b0140f7169caf3d97c75a1a56a92f67d38/src/feat/wave-reader.h#L60-L62
https://github.com/pytorch/audio/blob/313f4f5ca65b82ff81e62165ee2328a4c10de013/torchaudio/_sox_backend.py#L51
https://github.com/pytorch/audio/blob/313f4f5ca65b82ff81e62165ee2328a4c10de013/torchaudio/__init__.py#L414-L423
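As far as I can tell from the linked code, Kaldi's wave reader keeps samples at int16 scale (`kWaveSampleMax` is 32768), while torchaudio's loader normalizes them to [-1, 1] by default. Filterbank energies are sums of squared spectral magnitudes. Scaling the waveform by s therefore multiplies every energy by s², which adds 2·ln(s) = 2·ln(32768) ≈ 20.79 to every log-mel value. The following stdlib sketch checks that arithmetic on a toy frame. `band_log_energy` is a hypothetical stand-in for one mel filter: it uses a rectangular band and a naive DFT, with no windowing, pre-emphasis, or energy floor. It does not reproduce either library's implementation.

```python
import cmath
import math

# int16 full scale: the range Kaldi keeps samples in (per the linked header).
INT16_SCALE = 32768.0


def band_log_energy(frame, lo_bin, hi_bin):
    """Log of the summed power-spectrum bins [lo_bin, hi_bin) of one frame."""
    n = len(frame)
    total = 0.0
    for k in range(lo_bin, hi_bin):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        total += x_k.real ** 2 + x_k.imag ** 2  # power of DFT bin k
    return math.log(total)


# Toy 64-sample frame in the normalized [-1, 1] range, as a normalizing loader returns.
frame = [0.3 * math.sin(2 * math.pi * 5 * t / 64) + 0.1 * math.cos(2 * math.pi * 11 * t / 64)
         for t in range(64)]
# The same frame at int16 scale, as Kaldi's wave reader would see it.
frame_int16 = [s * INT16_SCALE for s in frame]

offset = band_log_energy(frame_int16, 3, 14) - band_log_energy(frame, 3, 14)
print(round(offset, 3))                     # 20.794
print(round(2 * math.log(INT16_SCALE), 3))  # 20.794
```

If this is the cause, the two outputs can be aligned by putting them on the same scale. For example, the torchaudio waveform can be multiplied by 32768 before calling torchaudio.compliance.kaldi.fbank, or 2·ln(32768) can be subtracted from the Kaldi features. The offset is exact only where neither pipeline clamps a near-zero energy to its epsilon floor before taking the log.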
torchaudio/compliance/kaldi.py is not really compliant with Kaldi. You have to be careful when using it:
(...)
Thanks for your insights, guys. I will close this issue since everything seems to be OK with the current settings.