
Which library is torchaudio consistent with?


Hi, I'm moving the transforms in my torch codebase from librosa to torchaudio so I can use torch's (much) faster stft implementation on the GPU. However, I keep hitting cases where the outputs differ: Spectrogram vs. librosa.core.spectrum._spectrogram, and MelSpectrogram vs. librosa.feature.melspectrogram. Does this repo aim for consistency with another Python audio library for these transformations? I think it would be good to match a widely used library. I'm currently working out which parameters make the outputs consistent, and I can open a PR if that sounds useful.

For example:

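import librosa
import numpy as np
import torch
import torchaudio

sound, sample_rate = torchaudio.load('wav_file.wav')  # (channel, samples)
sound_cuda = sound.cuda()  # GPU copy for torch.stft
sound_librosa = sound.cpu().numpy().squeeze()  # 1-D mono signal for librosa

sample_rate = 16000  # assumes the file is sampled at 16 kHz
n_mels = 40
window_stride = 0.01  # 10 ms hop
window_size = 0.025  # 25 ms window
hop_length = int(sample_rate * window_stride)  # 160 samples
n_fft = int(sample_rate * window_size)  # 400 samples

stft_librosa = librosa.stft(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft)
spectro_librosa, n_fft = librosa.core.spectrum._spectrogram(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft, power=2)
mel_basis = librosa.filters.mel(sr=sample_rate,
                                n_mels=n_mels,
                                n_fft=n_fft,
                                norm=None, # non-standard
                                htk=True) # non-standard
check = np.dot(mel_basis, spectro_librosa)  # (n_mels, frames)

# same window as librosa's default 'hann'
window = torch.hann_window(n_fft, device=sound_cuda.device)
stft_torch = torch.stft(sound_cuda,
                        hop_length=hop_length,
                        n_fft=n_fft,
                        window=window,
                        return_complex=True)  # (channel, freq, frames)
spectro_torch = stft_torch.abs().pow(2)  # power spectrogram
melscale = torchaudio.transforms.MelScale(n_mels=n_mels,
                                          sample_rate=sample_rate,
                                          n_stft=n_fft // 2 + 1).to(sound_cuda.device)
check2 = melscale(spectro_torch)  # apply to the torch spectrogram, not to check

# check and check2[0].cpu().numpy() should agree to float tolerance (np.allclose);
# edge frames can differ if the two STFTs pad the signal differently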

The torchaudio MelScale corresponds to librosa.filters.mel called with the non-default options norm=None and htk=True (https://librosa.github.io/librosa/_modules/librosa/filters.html#mel). I also removed the default spectrogram normalization at https://github.com/pytorch/audio/blob/master/torchaudio/transforms.py#L198, which librosa does not offer.
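For reference, the filterbank that librosa.filters.mel builds with htk=True and norm=None can be sketched in plain NumPy. Place n_mels + 2 points evenly on the HTK mel scale (mel = 2595 * log10(1 + f / 700)) between fmin and fmax and map them back to Hz. Then build one unnormalized triangle per band over the n_fft // 2 + 1 FFT bin frequencies. This is a minimal sketch under those assumptions, not the actual librosa or torchaudio source. The helper names are hypothetical.

```python
import numpy as np


def hz_to_mel_htk(f):
    """HTK mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank_htk(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Hypothetical helper: triangular HTK-scale filters with no area normalization.

    Returns an (n_mels, n_fft // 2 + 1) matrix of weights in [0, 1].
    """
    if fmax is None:
        fmax = sr / 2.0
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # bin centre frequencies
    mel_pts = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2)
    hz_pts = mel_to_hz_htk(mel_pts)  # band edges in Hz
    fdiff = np.diff(hz_pts)
    ramps = hz_pts[:, None] - fft_freqs[None, :]  # (n_mels + 2, n_freqs)
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        rising = -ramps[i] / fdiff[i]  # 0 at hz_pts[i], 1 at hz_pts[i + 1]
        falling = ramps[i + 2] / fdiff[i + 1]  # 1 at hz_pts[i + 1], 0 at hz_pts[i + 2]
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights
```

With the settings from the example, mel_filterbank_htk(16000, 400, 40) has shape (40, 201). Multiplying it by a (201, frames) power spectrogram gives the (40, frames) layout that check has.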

There are also functional inconsistencies between the librosa and torchaudio calls: librosa.feature.melspectrogram returns a (power) mel spectrogram, whereas torchaudio converts the spectrogram to the dB scale.
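Because one library stops at the power mel spectrogram and the other reports decibels, the outputs can only be compared after the same conversion is applied to both. The following is a minimal NumPy sketch. It assumes a reference power of 1.0, a floor of amin = 1e-10, and optional top_db clipping relative to the maximum. As far as I understand, those match librosa.power_to_db(S, ref=1.0) and torchaudio.transforms.AmplitudeToDB(stype="power"). However, their top_db defaults differ, so it is safer to pass it explicitly on both sides and check the docs for your versions. The function name here is hypothetical.

```python
import numpy as np


def power_to_db(S, amin=1e-10, top_db=None):
    """Convert a power spectrogram to dB: 10 * log10(max(S, amin)), reference power 1.0.

    If top_db is given, values more than top_db below the maximum are clipped.
    """
    S_db = 10.0 * np.log10(np.maximum(np.asarray(S, dtype=np.float64), amin))
    if top_db is not None:
        S_db = np.maximum(S_db, S_db.max() - top_db)
    return S_db
```

For example, power_to_db(np.array([1.0, 100.0, 0.0])) gives roughly [0, 20, -100], and with top_db=80.0 the last value is clipped to -60.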

Issue Analytics

Created 5 years ago
Reactions: 2
Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction

PCerles commented, Feb 10, 2019

Thanks! The torch.stft implementation is consistent with librosa's defaults if you use the same window type; I tested that first.

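window = torch.hann_window(n_fft)
stft_torch = torch.stft(sound, n_fft=n_fft, hop_length=hop_length, window=window, return_complex=True)
# keyword arguments: librosa.stft's positional order is (y, n_fft, hop_length)
stft_librosa = librosa.stft(sound.numpy(), n_fft=n_fft, hop_length=hop_length, pad_mode="reflect")
# stft_torch.numpy() matches stft_librosa to float tolerance;
# pad_mode="reflect" mirrors torch.stft's default padding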
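The two calls agree because, as I understand their documented defaults, both compute the same centered STFT. Each pads n_fft // 2 samples on both sides, slides a periodic Hann window with the given hop, and takes the one-sided FFT of every frame. The sketch below shows that computation in plain NumPy. It assumes reflect padding and a periodic Hann window, and the function names are hypothetical rather than either library's internals.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window (torch.hann_window's default, periodic=True)."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * k / n)


def centered_stft(y, n_fft, hop_length):
    """One-sided STFT with center-style reflect padding.

    Returns a complex array of shape (n_fft // 2 + 1, n_frames); frame t is
    centred on input sample t * hop_length.
    """
    y = np.asarray(y, dtype=np.float64)
    y_pad = np.pad(y, n_fft // 2, mode="reflect")
    window = periodic_hann(n_fft)
    n_frames = 1 + (len(y_pad) - n_fft) // hop_length
    frames = np.stack(
        [y_pad[t * hop_length : t * hop_length + n_fft] * window for t in range(n_frames)],
        axis=1,
    )  # (n_fft, n_frames)
    return np.fft.rfft(frames, axis=0)
```

With n_fft=400 and hop_length=160 on one second of 16 kHz audio, this yields 201 frequency bins spaced 40 Hz apart and 101 frames. A 1000 Hz tone therefore peaks in bin 25.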

I'm interested in making the transforms.py function signatures consistent and in making every transform step visible and configurable (e.g. the spectrogram normalization), but I didn't know whether these functions were modeled on some library. Personally, I think it makes sense to mirror librosa's behavior for functions like MelSpectrogram, since librosa seems to be the most popular Python audio library. What do you think? I'm sure the current torchaudio defaults also make sense for some applications. I'm happy to contribute if we want torchaudio to match librosa's output for these common speech-to-text transforms.

0 reactions

vincentqb commented, Aug 26, 2019

Closing this issue, since the PR got merged. Please feel free to re-open 😃
