Problems with Kaldi MFCCs

Hi, thank you very much for this very useful project.

I started doing some speech recognition experiments with the MFCC features implemented in torchaudio. In particular, I tried the librosa-style ones implemented in torchaudio/transforms.py and the Kaldi-style ones implemented in torchaudio/compliance/kaldi.py.

The librosa-style features are computed very efficiently, and I can achieve results similar to those of the original Kaldi features after changing some hyperparameters (n_mfcc=13, hop_length=160, n_mels=23, f_min=80, f_max=7900).
When I switch to the Kaldi-style features, however, my neural network doesn’t even converge. I suspect there is a bug somewhere. I compared the original Kaldi MFCCs with the ones implemented in torchaudio, and they look very different (dithering alone cannot explain such a big difference):

mfcc_original
array([35.84189 , 39.748493, 35.40782 , 33.237488, 34.53969 , 35.40782 ,
       34.973755, 35.40782 , 35.40782 , 35.84189 ], dtype=float32)

mfcc_torch
tensor([29.3794, 29.1657, 28.7020, 27.4892, 29.1944, 27.8915, 29.3321, 28.8958, 28.4197,
29.0967])

The other issue is that the current version doesn’t support CUDA and can only process up to two channels at a time. It is also significantly slower than the librosa-style implementation (there could be a bottleneck somewhere).

Any ideas? I hope my feedback is helpful.

Thank you

Mirco

Issue Analytics

Created 4 years ago
Reactions: 2
Comments: 7 (2 by maintainers)

Top GitHub Comments

pablomainar commented, Feb 13, 2020 (3 reactions)

Update: I have gone back to the spectrogram level to try to find the bug. If I set the subtract_mean flag to true in both Kaldi and PyTorch, the resulting spectrograms are (almost) the same. But if I set it to false (which is the default), the results differ: they have the same pattern, but the mean is different.

Kaldi code to generate spectrograms with mean subtraction: ~/kaldi/src/featbin/compute-spectrogram-feats --subtract-mean=true --dither=0.0 --energy-floor=1.0 scp,p:wav.scp ark:generated_feats/spec.ark

PyTorch code to generate spectrograms with mean subtraction: torch_feats = torchaudio_local.spectrogram(waveform=audio_tensor, dither=0, energy_floor=1.0, subtract_mean=True)

Result (images in the original issue): kaldi_feats, torch_feats

Kaldi code to generate spectrograms without mean subtraction: ~/kaldi/src/featbin/compute-spectrogram-feats --subtract-mean=false --dither=0.0 --energy-floor=1.0 scp,p:wav.scp ark:generated_feats/spec.ark

PyTorch code to generate spectrograms without mean subtraction: torch_feats = torchaudio_local.spectrogram(waveform=audio_tensor, dither=0, energy_floor=1.0, subtract_mean=False)

Result (images in the original issue): kaldi_feats_nonsub, torch_feats_nonsub

I suspect that something in the FFT computation is normalizing in one implementation but not in the other. Any thoughts?
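
The pattern described in this comment (same shape, different mean, agreement once the mean is subtracted) is what a constant scale factor on the input samples would produce. In the log power domain a gain of c becomes an additive offset of 2*ln(c) in every bin, and per-bin mean subtraction over time cancels it. The sketch below only illustrates that arithmetic with a toy DFT from the standard library. The helper names, the toy signal, and the gain of 32768 (float samples in [-1, 1] versus 16-bit integer values) are assumptions for illustration, not something confirmed in this thread and not the Kaldi or torchaudio code.

```python
import cmath
import math


def log_power_spectrum(frame):
    """Natural-log power spectrum of one frame via a plain DFT (non-negative bins)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
        # Tiny floor only guards against log(0); the toy signal never reaches it.
        spectrum.append(math.log(max(abs(bin_value) ** 2, 1e-30)))
    return spectrum


def subtract_mean(frames):
    """Subtract the per-bin mean taken over time (the frame axis)."""
    means = [sum(column) / len(frames) for column in zip(*frames)]
    return [[value - mean for value, mean in zip(row, means)] for row in frames]


# Toy signal: 4 frames of 16 samples each, values roughly in [-1, 1].
frames = [[(0.5 + 0.1 * f) * math.sin(0.3 * (16 * f + t)) + 0.05 * math.cos(1.1 * t)
           for t in range(16)]
          for f in range(4)]

scale = 32768.0  # hypothetical gain: int16-range samples vs. floats in [-1, 1]

spec_unit = [log_power_spectrum(frame) for frame in frames]
spec_scaled = [log_power_spectrum([scale * x for x in frame]) for frame in frames]

# Without mean subtraction, every bin differs by the same constant, 2 * ln(scale).
expected_offset = 2 * math.log(scale)
max_offset_error = max(abs((b - a) - expected_offset)
                       for row_a, row_b in zip(spec_unit, spec_scaled)
                       for a, b in zip(row_a, row_b))

# With mean subtraction, the constant cancels and the two spectrograms agree.
max_diff_after_mean_subtraction = max(
    abs(a - b)
    for row_a, row_b in zip(subtract_mean(spec_unit), subtract_mean(spec_scaled))
    for a, b in zip(row_a, row_b))

print(f"offset without mean subtraction: {expected_offset:.4f}")
print(f"max deviation from that offset:  {max_offset_error:.2e}")
print(f"max diff after mean subtraction: {max_diff_after_mean_subtraction:.2e}")
```

If this is what is happening, comparing the range of sample values each pipeline actually receives would be a quick check before digging into the FFT code.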

HsunGong commented, Dec 9, 2019 (3 reactions)

code I could take a look at?

Here is my example of 3 and 4:

import random

import numpy
import torch
import torchaudio

# Seed every RNG once at the start of the script.
random.seed(0)
numpy.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Reference features were produced with Kaldi:
# compute-mfcc-feats --verbose=2 --sample-frequency=8000  scp:data/wav.scp ark:- | copy-feats ark:- ark,scp:data/feats.ark,data/feats.scp
feats_by_utt = {utt: mat for utt, mat in torchaudio.kaldi_io.read_mat_scp('data/feats.scp')}
kaldi_feats = feats_by_utt['iaaa']
print(kaldi_feats, kaldi_feats.shape)

wav, rate = torchaudio.load('data/wav/iaaa.wav')
print(wav.shape, rate)

# Compute the torchaudio Kaldi-style MFCCs twice on the same waveform
# to check whether the output is repeatable.
torch_feats = torchaudio.compliance.kaldi.mfcc(wav, sample_frequency=rate)
print(torch_feats, torch_feats.shape)
torch_feats = torchaudio.compliance.kaldi.mfcc(wav, sample_frequency=rate)
print(torch_feats, torch_feats.shape)

tensor([[ 18.3451, -14.9193, -18.3694,  ...,  -6.3691,   1.8752,  -8.8333],
        [ 20.3241, -11.0107, -16.5517,  ..., -10.2303,  -2.2465, -13.0228],
        [ 22.7282,   7.1452, -32.8558,  ..., -14.6897, -22.6369,  -8.8484],
        ...,
        [ 15.3191, -18.4647,   3.6274,  ..., -20.1052,   7.0780,  -6.4834],
        [ 15.3900, -19.9616,  -5.4611,  ..., -12.0642,   4.8870, -15.5243],
        [ 14.6114, -23.1458,  -7.1615,  ..., -31.9867,  -8.1553,  -8.3250]]) torch.Size([7249, 13])
torch.Size([1, 580080]) 8000
tensor([[ 25.4531, -28.9004,  -9.2195,  ...,   9.3991,   6.6678,  -0.3100],
        [ 23.8064, -26.9535,  -8.3300,  ...,  -8.8944,  -4.4637,   8.1744],
        [ 25.2671, -25.2465,  -9.5173,  ...,   1.7179,   5.4729,  -7.5934],
        ...,
        [ 23.7336, -30.1332,  -8.1190,  ..., -13.0223,  -1.7747,   5.4382],
        [ 24.7677, -29.2519,  -9.6620,  ...,  -0.6424,  -4.6334,  -8.3185],
        [ 23.9241, -31.0664,  -8.8748,  ...,  -3.3450,   2.4832,   3.8635]]) torch.Size([7249, 13])
tensor([[ 24.8688, -29.7277, -10.0829,  ...,   0.2335,  -5.1891, -10.5182],
        [ 23.7165, -29.3053, -11.0154,  ...,   8.8459,   4.9695,   1.7033],
        [ 24.3918, -30.2426, -17.2043,  ...,  -1.0753,  -3.7638,   1.8900],
        ...,
        [ 23.8795, -28.2105, -13.3643,  ...,  -0.4222,  -6.8063,   3.2779],
        [ 25.2789, -27.4087,  -4.5631,  ...,  -3.4745,   8.7959,   4.0152],
        [ 25.1426, -30.7162, -10.8394,  ..., -19.6604,  -1.2420,   2.3714]]) torch.Size([7249, 13])

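Tensor 1 is the Kaldi output, tensor 2 is the torchaudio.compliance.kaldi output, and tensor 3 is the torchaudio.compliance.kaldi output from a second call on the same waveform.

All three are different.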
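
Tensors 2 and 3 come from the same call on the same waveform, so some randomness must be involved. Dithering, which the original post mentions, is one candidate: it adds a small amount of random noise to the samples before feature extraction, and seeding once at the top of a script does not make two consecutive calls draw the same noise. The sketch below is a toy stand-in using only the standard library. The `dither` helper and the sample values are hypothetical, and it does not reproduce torchaudio's or Kaldi's implementation.

```python
import random


def dither(samples, dither_scale, rng):
    """Add zero-mean Gaussian noise scaled by dither_scale to each sample."""
    if dither_scale == 0.0:
        return list(samples)
    return [x + dither_scale * rng.gauss(0.0, 1.0) for x in samples]


samples = [float(i % 7) for i in range(20)]  # hypothetical waveform

# Seed once, as the script above does, then call twice.
rng = random.Random(0)
first = dither(samples, 1.0, rng)
second = dither(samples, 1.0, rng)  # same generator, fresh noise
runs_differ = first != second

# With dithering disabled the two calls are identical.
runs_match_without_dither = dither(samples, 0.0, rng) == dither(samples, 0.0, rng)

# Re-seeding before each call also makes the noise repeatable.
runs_match_when_reseeded = (dither(samples, 1.0, random.Random(0))
                            == dither(samples, 1.0, random.Random(0)))

print(runs_differ, runs_match_without_dither, runs_match_when_reseeded)
```

If dithering is the cause here, passing dither=0.0 to torchaudio.compliance.kaldi.mfcc and --dither=0.0 to compute-mfcc-feats should make both outputs repeatable, which leaves only the systematic difference between tensor 1 and the other two to explain.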
