
torchaudio.compliance.kaldi.spectrogram is very different from torchaudio.transforms.spectrogram


Does torchaudio.compliance.kaldi.spectrogram currently support only vectors?

When feeding a tensor of shape torch.Size([2, 276858]) the result is not what’s expected, yet there is no error. I would expect a “train pattern” to be visible, as in the second figure below.

This is what kaldi gives: [figure: download (1)]

This is what torchaudio.transforms.spectrogram gives: [figure: download (2)]

The “train pattern” is also visible on academo.org.

Issue Analytics

Created 4 years ago
Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction

vincentqb commented, Dec 17, 2019

@vincentqb Can you upload the wav file again? I cannot find it. The link is broken.

The file can still be accessed here.

1 reaction

jamarshon commented, Jul 22, 2019

@vincentqb I could investigate the flags of kaldi.spectrogram further to get a closer result, but is this more similar to what you would expect?

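import torch
import torchaudio
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

# Path on the commenter's machine; point it at your own copy of the file.
filename = "/Users/jamarshon/Documents/GitHub/audio/test/assets/steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)  # s has shape (channels, samples)
# Floor value so that log2 never receives zero.
EPSILON = torch.tensor(torch.finfo(torch.float).eps, dtype=torch.get_default_dtype())

# Spec1: default torchaudio transform, log-scaled, first channel only.
# Note: if your torchaudio version returns (channel, freq, time), drop the
# transpose so the axes match the Kaldi plot below.
spec = torchaudio.transforms.Spectrogram()(s)
x = torch.max(EPSILON, spec).log2().transpose(1, 2)[0, :, :]
plt.imshow(x.numpy(), cmap='gray')
plt.show()

# Spec2: Kaldi-compliant spectrogram with matching window settings.
# Kaldi takes frame sizes in milliseconds, so convert from samples.
n_fft = 400.0
fl = n_fft / sr * 1000.0  # frame length in ms (400 samples)
fs = fl / 2.0             # frame shift in ms (200 samples, i.e. 50% overlap)
spec2 = torchaudio.compliance.kaldi.spectrogram(
    s, dither=0.0, window_type='hanning',
    frame_length=fl, frame_shift=fs, remove_dc_offset=False,
    round_to_power_of_two=False, sample_frequency=sr)
y = spec2.t()  # (frames, freq) -> (freq, frames) for plotting
plt.imshow(y.numpy(), cmap='gray')
plt.show()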

Spec1: [figure: spec1] Spec2: [figure: spec2]
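
The snippet above converts a 400-sample window into milliseconds because the Kaldi-style function takes frame sizes in time units. The following is a minimal, dependency-free sketch of that arithmetic, not code from the thread. The helper names are hypothetical, the 44100 Hz sample rate is an assumption (the thread does not state it), and the two frame-count formulas assume Kaldi-style framing that keeps only windows fitting entirely inside the signal versus an STFT that pads half a window on each side. Only the sample count 276858 comes from the question.

```python
def ms_from_samples(num_samples: int, sample_rate: int) -> float:
    """Duration in milliseconds of num_samples samples at sample_rate Hz."""
    return num_samples / sample_rate * 1000.0


def samples_from_ms(duration_ms: float, sample_rate: int) -> int:
    """Nearest whole number of samples in duration_ms at sample_rate Hz."""
    # round() rather than int(): the float round trip can land just
    # below a whole number, and truncation would then lose a sample.
    return round(duration_ms * sample_rate / 1000.0)


def frames_without_padding(num_samples: int, window: int, shift: int) -> int:
    """Frame count when only windows fully inside the signal are kept."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def frames_with_center_padding(num_samples: int, shift: int) -> int:
    """Frame count when the signal is padded by half a window per side."""
    return 1 + num_samples // shift


sample_rate = 44100    # assumed; not stated in the thread
num_samples = 276858   # length of the tensor in the question

frame_length_ms = ms_from_samples(400, sample_rate)
window = samples_from_ms(frame_length_ms, sample_rate)
shift = samples_from_ms(frame_length_ms / 2.0, sample_rate)

print(window, shift)
print(frames_without_padding(num_samples, window, shift))  # 1383
print(frames_with_center_padding(num_samples, shift))      # 1385
```

Under these assumptions the two framings differ by two frames for the same window and shift, so a small shape mismatch between the two spectrograms would not by itself indicate a bug.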
