Which library is torchaudio consistent with?

Hi, I’m currently updating my torch codebase from librosa to torchaudio for transforms, to take advantage of the (much) faster torch STFT implementation on the GPU. However, I’m running into several cases where Spectrogram vs. librosa.core.spectrum._spectrogram and MelSpectrogram vs. librosa.feature.melspectrogram produce different results. Does this repo ensure consistency with another Python audio library for those transformations? I think it would be good to be consistent with another widely used library. I’m currently figuring out the correct params to ensure consistency, and I can PR something if that sounds useful.

For example:

import librosa
import numpy as np
import torch
import torchaudio

sound, sample_rate = torchaudio.load('wav_file.wav')
sound_librosa = sound.cpu().numpy().squeeze().T

sample_rate = 16000  # assumes wav_file.wav is sampled at 16 kHz
n_mels = 40
window_stride = 0.01  # seconds
window_size = 0.025  # seconds
hop_length = int(sample_rate * window_stride)
n_fft = int(sample_rate * window_size)

# librosa side: power spectrogram, then an un-normalized HTK mel filterbank
stft_librosa = librosa.stft(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft)
spectro_librosa, n_fft = librosa.core.spectrum._spectrogram(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft, power=2)
mel_basis = librosa.filters.mel(sr=sample_rate,
                                n_mels=n_mels,
                                n_fft=n_fft,
                                norm=None, # non-standard
                                htk=True) # non-standard
check = np.dot(mel_basis, spectro_librosa)

# torch side (written against the 2019 torch.stft API, which returned
# real and imaginary parts in a trailing dimension of size 2)
window = torch.hann_window(n_fft)
stft_torch = torch.stft(sound,
                        hop_length=hop_length,
                        n_fft=n_fft,
                        window=window).transpose(1, 2)
spectro_torch = stft_torch.pow(2).sum(-1)
melscale = torchaudio.transforms.MelScale(n_mels=n_mels)
check2 = melscale(spectro_torch)

# Expectation: check == check2
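
Since the two results come from different libraries, a literal == will usually fail on floating-point noise even when the parameters line up. A minimal sketch of a tolerance-based comparison in plain Python; the helper names and the default tolerance are illustrative assumptions, not part of librosa or torchaudio, and both inputs are assumed to be already converted to nested lists with the same layout:

```python
def flatten(values):
    """Yield every number from an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested sequences."""
    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("inputs hold a different number of elements")
    return max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)


def roughly_equal(a, b, atol=1e-4):
    """True when no pair of elements differs by more than atol (hypothetical default)."""
    return max_abs_diff(a, b) <= atol
```

With arrays in hand, np.allclose plays the same role. Note that the torch result above is transposed relative to the librosa one, so the layouts need to be aligned before comparing.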

torchaudio’s MelScale corresponds to the non-default librosa options norm=None, htk=True in librosa.filters.mel (https://librosa.github.io/librosa/_modules/librosa/filters.html#mel). To get matching output, I also removed the default spectrogram normalization at https://github.com/pytorch/audio/blob/master/torchaudio/transforms.py#L198, which is not a librosa option.
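
For context on what htk=True changes: it selects the HTK mel formula, mel = 2595 * log10(1 + f / 700), and norm=None leaves the triangular filters un-normalized. A rough sketch of how the filter edge frequencies fall out of that formula; the function names are made up for illustration and this is not the code of either library:

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges_hz(n_mels, f_min, f_max):
    """Return n_mels + 2 frequencies (Hz) spaced evenly on the HTK mel scale.

    Filter i would rise from edge i to edge i + 1 and fall to edge i + 2.
    """
    mel_min, mel_max = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz_htk(mel_min + i * step) for i in range(n_mels + 2)]
```

For the settings in this issue, mel_band_edges_hz(40, 0.0, 8000.0) gives the 42 edge frequencies, assuming the filterbank spans 0 Hz up to the Nyquist frequency of a 16 kHz signal.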

There are also functional inconsistencies between the librosa and torchaudio function calls: librosa.feature.melspectrogram returns a plain mel spectrogram, whereas torchaudio converts the spectrogram to the dB scale.
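
To compare the two outputs they first need to be on the same scale. A minimal sketch of the power-to-decibel conversion for a single value, assuming a power (not amplitude) spectrogram; the function name and the ref and amin defaults are illustrative assumptions rather than either library's exact behavior:

```python
import math


def power_to_db(power, ref=1.0, amin=1e-10):
    """Convert a power value to decibels: 10 * log10(power / ref).

    amin is a floor that guards against taking the log of zero.
    """
    return 10.0 * math.log10(max(power, amin)) - 10.0 * math.log10(max(ref, amin))
```

Applying this element-wise to the librosa mel spectrogram (or skipping the dB step on the torchaudio side) puts both results in the same units before checking them against each other.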

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions:2
  • Comments:6 (6 by maintainers)

Top GitHub Comments

1 reaction
PCerles commented, Feb 10, 2019

Thanks! The torch.stft implementation is consistent with the librosa defaults if you use the same window type; I tested that first.

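window = torch.hann_window(n_fft)
torch.stft(sound, hop_length=hop_length, n_fft=n_fft, window=window)
# gives the same result as
# (keyword arguments, since librosa.stft takes n_fft before hop_length positionally)
librosa.stft(sound, hop_length=hop_length, n_fft=n_fft)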

I’m interested in the transforms.py function signatures being consistent and in every transform step being visible and configurable (e.g. the spectrogram normalization), but I didn’t know whether these functions were modeled on some library. Personally, I think it makes sense to mirror librosa functionality for functions like MelSpectrogram, since librosa seems to be the most popular Python audio library. What do you think? I’m sure the current torchaudio default implementations also make sense for some applications. Happy to contribute if we want to make torchaudio consistent with librosa output for those common speech-to-text transforms.

0 reactions
vincentqb commented, Aug 26, 2019

Closing this issue, since PR got merged. Please feel free to re-open 😃