Merging plan from torchaudio-contrib

Hi all, I think it’s good timing to discuss a potential merging plan from torchaudio-contrib to here, especially because there are going to be new features and changes by @jamarshon @cpuhrsch.
Main idea
A lot of things are well summarized in https://github.com/keunwoochoi/torchaudio-contrib. In short, we wanted to re-design torch-based audio processing so that
- things can be `Layers`, which are based on corresponding `Functionals`
- names for layers and arguments are carefully chosen
- all work for multi-channel
- complex numbers are supported when it makes sense (e.g., STFTs)
Review - layers

torchaudio-contrib already covers many of the functions that `transforms.py` currently provides, but not all of them. That’s why I feel it’s time to discuss this here. Let me list the classes in `transforms.py` one by one with some notes.
1. Already in torchaudio-contrib. Hoping we’d replace.

- `class Spectrogram`: we have it in torchaudio-contrib. On top of this, we also have an `STFT` layer which outputs complex representations (same as `torch.stft`, since we’re wrapping it).
- `class MelScale`: we have it, and would like to suggest changing the name to something more general. We named it `class MelFilterbank`, assuming there can be other types of filterbanks, too. It also supports `htk` and non-`htk` mel filterbanks.
- `class SpectrogramToDB`: we would like to propose a more general approach, `class AmplitudeToDb(ref=1.0, amin=1e-7)` and `class DbToAmplitude(ref=1.0)`, because decibel scaling changes the input’s unit, not its core content.
- `class MelSpectrogram`: we have it; it returns an `nn.Sequential` model consisting of a Spectrogram and a mel-scale filterbank.
- `class MuLawEncoding`, `class MuLawExpanding`: we have them; they are actually a 99% copy of the implementation here.
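To make the proposed decibel pair concrete, here is a minimal sketch of what `AmplitudeToDb(ref=1.0, amin=1e-7)` and `DbToAmplitude(ref=1.0)` could look like as layers. The `20 * log10` amplitude convention and the clamping detail are my assumptions, not a settled API:

```python
import torch
import torch.nn as nn

class AmplitudeToDb(nn.Module):
    """Sketch: convert an amplitude spectrogram to decibels.
    Assumes the amplitude (not power) convention, i.e. 20 * log10."""
    def __init__(self, ref=1.0, amin=1e-7):
        super().__init__()
        self.ref = ref
        self.amin = amin  # clamp floor, avoids log(0)

    def forward(self, x):
        x = torch.clamp(x, min=self.amin)
        return 20.0 * torch.log10(x / self.ref)

class DbToAmplitude(nn.Module):
    """Sketch: the inverse mapping, decibels back to amplitude."""
    def __init__(self, ref=1.0):
        super().__init__()
        self.ref = ref

    def forward(self, x):
        return self.ref * torch.pow(10.0, x / 20.0)
```

The point of splitting the pair this way is that each direction is a unit change that composes with any spectrogram, not something baked into `Spectrogram` itself.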
2. Wouldn’t need these

- `class Compose`: we wouldn’t need it, because once things are based on `Layers`, people can simply build an `nn.Sequential()`.
- `class Scale`: it does `int16` --> `float`. I think we need to deprecate this, because if we really need it, it should have a more intuitive and precise name, and it should probably support other conversions as well.
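For illustration, here is how `nn.Sequential` subsumes `Compose` once every transform is an `nn.Module`. The two toy transforms below are placeholders standing in for e.g. a spectrogram and a decibel layer, not actual torchaudio-contrib classes:

```python
import torch
import torch.nn as nn

# Toy transforms standing in for real layers (illustrative only):
class Square(nn.Module):
    def forward(self, x):
        return x ** 2

class Log(nn.Module):
    def forward(self, x):
        return torch.log(x)

# No custom Compose needed: chain layers with the built-in container.
pipeline = nn.Sequential(Square(), Log())
out = pipeline(torch.full((1, 100), 2.0))  # log(2 ** 2) for every sample
```

Because the result is itself an `nn.Module`, it moves to GPU with `.to(device)` and nests inside larger models for free, which `Compose` never gave us.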
3. To-be-added

- `class DownmixMono`: I would like to have one. But we are also considering a time-frequency representation-based downmix (an energy-preserving operation) (@faroit). I’m open to discussion. Personally I’d prefer to have separate classes, `DownmixWaveform()` and `DownmixSpecgram()`. Maybe until we have a better one, we should keep it as it is.
- `class MFCC`: we currently don’t have it. The current torch/audio implementation uses `s2db (SpectrogramToDB)`, but this class seems a little arbitrary to me, so we might want to re-implement it.
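The waveform half of the proposed split could be as simple as a channel mean. A minimal sketch, assuming a `(batch, channel, time)` layout (the layout and the class name here are assumptions, not a decided API):

```python
import torch
import torch.nn as nn

class DownmixWaveform(nn.Module):
    """Sketch: downmix a multi-channel waveform to mono by
    averaging over the channel dimension.

    Assumes input of shape (batch, channel, time)."""
    def forward(self, waveforms):
        return waveforms.mean(dim=1, keepdim=True)

stereo = torch.randn(8, 2, 16000)   # batch of 8 stereo clips, 1 s @ 16 kHz
mono = DownmixWaveform()(stereo)    # shape: (8, 1, 16000)
```

An energy-preserving `DownmixSpecgram()` would instead combine channel magnitudes in the time-frequency domain, which is why keeping the two as separate classes seems cleaner.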
4. Not sure about these

- `class PadTrim`: I don’t actually know why we need it exactly; would love to hear about this!
- `class LC2CL`: so far, torchaudio-contrib code hasn’t considered `channel-first` tensors. If it’s a thing, we’d i) update our code to make them compatible and ii) have the same or a similar class to this. But… do we really need this?
- `class BLC2CBL`: same as `LC2CL`; I’d like to know its use cases.
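For reference, my understanding of what these permutation classes do (if that understanding is right, they reduce to one-liners on top of `transpose`/`permute`, which is part of why I question having dedicated classes):

```python
import torch

# LC2CL: (length, channel) -> (channel, length); a plain transpose.
lc = torch.randn(16000, 2)      # 1 s of stereo audio, channel-last
cl = lc.transpose(0, 1)         # channel-first: (2, 16000)

# BLC2CBL: my guess is (bands/batch, length, channel) -> (channel, bands, length),
# which is a single permute.
blc = torch.randn(8, 16000, 2)
cbl = blc.permute(2, 0, 1)      # (2, 8, 16000)
```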
Review - argument and variable names

As summarised in https://github.com/keunwoochoi/torchaudio-contrib/issues/46, we’d like to use:

- `waveforms` for a batch of waveforms
- `real_specgrams` for magnitude spectrograms
- `complex_specgrams` for complex spectrograms (this is relatively less discussed)
Audio loading
@faroit has been working on replacing Sox with others. But here in this issue, I’d like to focus on the topics above.
So,

- Any opinion on this?
- Any answers to the questions I have?
- If it looks good, what else would you like to have in the one-shot PR that would replace the current `transforms.py`?
I’m not sure what the current plan is. They are definitely useful, but I don’t think they are best hosted as part of torchaudio. The installation (= dependency) issue will always be there, adding maintenance cost and potential risks. I can also hardly imagine their operations benefiting from GPUs. Due to these issues, I would avoid using sox as part of a system that requires some reliability and efficiency. For a quick and hacky use case, one can easily plug it into their preprocessing stage.
Side note - maybe some of the filters (e.g., EQ) could be re-implemented in torch 😃
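To make that side note concrete, here is a deliberately naive biquad sketch in pure torch. The sample-by-sample Python loop is far too slow for production, so this is only a proof that an EQ stage is expressible in torch, not a proposed implementation:

```python
import torch

def biquad(x, b0, b1, b2, a1, a2):
    """Naive direct-form-I biquad over the last dimension of x.

    Implements y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
                      - a1*y[n-1] - a2*y[n-2].
    """
    y = torch.zeros_like(x)
    # Delay-line state, one value per leading (batch/channel) position:
    x1 = x2 = y1 = y2 = torch.zeros_like(x[..., 0])
    for n in range(x.shape[-1]):
        xn = x[..., n]
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y[..., n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

With coefficients from the usual peaking/shelving formulas, chains of these would cover most of sox’s EQ-style effects, and a vectorized or scripted version could run on GPU alongside the rest of the pipeline.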
@vincentqb We had a long discussion about this in https://github.com/keunwoochoi/torchaudio-contrib/issues/31. I think many people agree that fast audio loading is really important, and for many users of torchaudio the load functionality is probably the only place where they touch the sox lib. Therefore it would make a lot of sense if only `load` and `save` were replaced by native torch code (probably interfacing libsndfile, or just reading the wav bits from scripts like it’s being done in tensorflow.io).
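As a sketch of what a sox-free `load` could look like in the simplest case, here is a 16-bit PCM WAV reader built only on the Python stdlib `wave` module and numpy. The function name and return convention (channel-first float32 in [-1, 1], plus the sample rate) are mine, not a proposed API:

```python
import wave
import numpy as np
import torch

def load_wav(path):
    """Minimal WAV loader sketch, 16-bit PCM only.

    Returns (tensor of shape (channels, num_frames), sample_rate),
    with samples scaled to [-1, 1].
    """
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "sketch handles 16-bit PCM only"
        frames = f.readframes(f.getnframes())
        data = np.frombuffer(frames, dtype=np.int16)
        # Interleaved samples -> (channels, frames):
        data = data.reshape(-1, f.getnchannels()).T
        return torch.from_numpy(data.copy()).float() / 32768.0, f.getframerate()
```

A real replacement would of course need the other bit depths, compressed formats, and streaming, which is where interfacing libsndfile starts to look more attractive than hand-rolling the parsing.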