torchaudio.compliance.kaldi.spectrogram is very different from torchaudio.transforms.spectrogram

Does torchaudio.compliance.kaldi.spectrogram currently only support vectors?

When feeding a tensor of shape torch.Size([2, 276858]) the result is not what’s expected, yet there is no error. I would expect a “train pattern” to be visible, as in the second figure below.

This is what kaldi gives: [image: download (1)]

This is what torchaudio.transforms.spectrogram gives: [image: download (2)]

The “train pattern” is also visible on academo.org.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction
vincentqb commented, Dec 17, 2019

@vincentqb Can you upload the wav file again? I cannot find it. The link is broken.

The file can still be accessed here.

1 reaction
jamarshon commented, Jul 22, 2019

@vincentqb I could investigate the kaldi.spectrogram flags further to get a closer result, but is this more similar to what you would expect?

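import torch
import torchaudio
# Jupyter magic; remove this line when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt

filename = "/Users/jamarshon/Documents/GitHub/audio/test/assets/steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)
# Floor value so that log2 never receives zero
EPSILON = torch.tensor(torch.finfo(torch.float).eps, dtype=torch.get_default_dtype())

# Spec1: torchaudio.transforms.Spectrogram on a log scale, first channel only.
# transpose(1, 2) assumes an output of shape (channel, time, freq); skip it if
# your torchaudio version already returns (channel, freq, time).
spec = torchaudio.transforms.Spectrogram()(s)
x = torch.max(EPSILON, spec).log2().transpose(1, 2)[0, :, :]
plt.imshow(x.numpy(), cmap='gray')
plt.show()

# Spec2: kaldi.spectrogram takes frame length and shift in milliseconds,
# so convert the 400-sample window (and a hop of half a window) from samples.
n_fft = 400.0
fl = n_fft / sr * 1000.0  # frame length in ms
fs = fl / 2.0             # frame shift in ms
spec2 = torchaudio.compliance.kaldi.spectrogram(
    s, dither=0.0, window_type='hanning',
    frame_length=fl, frame_shift=fs, remove_dc_offset=False,
    round_to_power_of_two=False, sample_frequency=sr)
# Transpose so that frequency runs along the vertical axis
y = spec2.t()
plt.imshow(y.numpy(), cmap='gray')
plt.show()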

Spec1: [image: spec1] Spec2: [image: spec2]
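
The snippet above converts the window size from samples to milliseconds before calling kaldi.spectrogram, since frame_length and frame_shift are passed in milliseconds there. As a rough sketch of that arithmetic in plain Python: the helper names samples_to_ms and ms_to_samples are made up for illustration and are not part of torchaudio, and the 44100 Hz sample rate is only an assumption (use whatever torchaudio.load returns for your file).

```python
def samples_to_ms(n_samples: float, sample_rate: float) -> float:
    """Convert a length in samples to milliseconds."""
    return n_samples / sample_rate * 1000.0


def ms_to_samples(duration_ms: float, sample_rate: float) -> int:
    """Convert milliseconds back to a whole number of samples.

    round() is used rather than int() so that a small floating-point
    error in the round trip cannot drop a sample.
    """
    return round(duration_ms * sample_rate / 1000.0)


sample_rate = 44100  # assumed for illustration only
n_fft = 400          # window size in samples, as in the snippet above

frame_length_ms = samples_to_ms(n_fft, sample_rate)
frame_shift_ms = frame_length_ms / 2.0  # hop of half a window

print(f"frame_length = {frame_length_ms:.3f} ms")
print(f"frame_shift  = {frame_shift_ms:.3f} ms")
print("window in samples:", ms_to_samples(frame_length_ms, sample_rate))
print("hop in samples:   ", ms_to_samples(frame_shift_ms, sample_rate))
```

Going through the conversion in both directions is a quick way to check that the two spectrograms are computed with the same window and hop before comparing the plots.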
