Fbank energies are much greater in Kaldi than torchaudio

Relevant part of discussion in the first feature extraction PR:

As to the large 0th MFCC coefficient, the plot thickens. When I compared the Kaldi and lhotse (torchaudio) outputs for the same file using log-mel energies instead, the outputs still have the same shape, but the Kaldi energies are much larger; see the image:

image

The Kaldi result can be recreated by: steps/make_fbank.sh --fbank-config librifbank.conf --nj 1 libridata fbank_libri fbank_libri

where librifbank.conf has:

--dither=0
--sample-frequency=16000

and the wav.scp in libridata is:

rec1 /path/to/lhotse/test/fixtures/libri-1088-134315-0000.wav

Then, to analyse the features, I read them in Python like this:

import kaldi_io
fbank_kaldi = list(kaldi_io.read_mat_scp('/path/to/fbank_libri/raw_fbank_libridata.1.scp'))[0][1]

Paranoid check with copy-feats ark:fbank_libri/raw_fbank_libridata.1.ark ark,t:- yields the same output:

rec1  [
  7.189558 9.037251 10.99918 11.80182 13.34635 14.90742 14.93212 16.49876 16.68227 16.07505 17.88986 17.97858 18.57075 17.19065 16.88927 17.6966 17.28112 17.44429 16.39793 15.73682 15.18271 14.46821 13.25515
  8.695339 10.86374 11.10041 11.28711 12.6426 13.6858 15.10104 14.55601 15.04671 15.78543 17.72975 17.83109 17.31431 16.33819 16.76282 17.25342 17.55506 17.83777 16.99071 15.81893 15.01264 13.81689 13.02168
  8.378333 9.902431 10.18937 10.38635 12.07959 13.96713 15.26997 15.65272 15.58296 15.4958 18.21008 17.68359 17.81999 16.76442 16.32026 17.28751 17.46375 17.26385 16.55256 15.46311 14.84257 13.89831 13.44193
(...)

Any ideas? Anyway, I think we can merge this one - let me know once you review.

_Originally posted by @pzelasko in https://github.com/pzelasko/lhotse/pull/10#issuecomment-631132273_

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

csukuangfj commented, May 26, 2020

Related code is given below

https://github.com/kaldi-asr/kaldi/blob/797905b0140f7169caf3d97c75a1a56a92f67d38/src/feat/wave-reader.h#L60-L62

https://github.com/pytorch/audio/blob/313f4f5ca65b82ff81e62165ee2328a4c10de013/torchaudio/_sox_backend.py#L51

https://github.com/pytorch/audio/blob/313f4f5ca65b82ff81e62165ee2328a4c10de013/torchaudio/__init__.py#L414-L423


torchaudio/compliance/kaldi.py is not fully compatible with Kaldi, so you have to be careful when using it:

  • The default dither value is 1 in Kaldi, but it is 0 in torchaudio. If you set it to 1 in torchaudio, you will be in trouble, since the noise amplitude is then comparable to the signal amplitude.
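
A minimal NumPy sketch of why the sample scale matters, under an assumption that is not stated outright in the thread: the linked code suggests that Kaldi keeps samples on the 16-bit integer scale (roughly -32768 to 32767), while torchaudio loads them normalized to [-1, 1]. The random frame, the 400-sample length, and the helper name `log_power_spectrum` are illustrative only, and dither of 1 is modeled here simply as unit-variance Gaussian noise. Neither Kaldi nor torchaudio is called.

```python
import numpy as np

INT16_SCALE = 32768.0  # assumed ratio between the two sample scales


def log_power_spectrum(frame):
    """Natural log of the power spectrum of one frame (no windowing, no mel bins)."""
    return np.log(np.abs(np.fft.rfft(frame)) ** 2)


rng = np.random.default_rng(0)

# Stand-in for one 25 ms frame at 16 kHz with samples in [-1, 1].
frame_unit = rng.uniform(-0.5, 0.5, size=400)
# The same samples expressed on the 16-bit integer scale.
frame_int16 = frame_unit * INT16_SCALE

# Scaling the waveform by c multiplies the power by c**2, so every log-power
# bin shifts by the same constant, 2 * ln(c).
offset = log_power_spectrum(frame_int16) - log_power_spectrum(frame_unit)
expected_offset = 2.0 * np.log(INT16_SCALE)

# Dither of 1, modeled as unit-variance Gaussian noise, against both scales.
dither = rng.standard_normal(frame_unit.shape)
snr_unit_db = 10 * np.log10(np.mean(frame_unit ** 2) / np.mean(dither ** 2))
snr_int16_db = 10 * np.log10(np.mean(frame_int16 ** 2) / np.mean(dither ** 2))

print("log-power offset per bin:", offset.mean(), "expected:", expected_offset)
print("SNR with dither=1, [-1, 1] samples (dB):", snr_unit_db)
print("SNR with dither=1, int16-scale samples (dB):", snr_int16_db)
```

Because a mel filterbank is a weighted sum of power bins, the same constant shift of about 20.8 would carry over to log-mel energies computed from power. The sketch also shows why a dither of 1 is harmless on the integer scale but swamps a signal normalized to [-1, 1].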
pzelasko commented, Jun 9, 2020

Thanks for your insights guys. I will close this issue since everything seems to be OK with the current settings.
