Merging plan from torchaudio-contrib

Hi all, I think it’s a good time to discuss a potential plan for merging torchaudio-contrib into this repository, especially because there are going to be new features and changes by @jamarshon @cpuhrsch.

Main idea

A lot of things are well summarized in https://github.com/keunwoochoi/torchaudio-contrib. In short, we wanted to re-design torch-based audio processing so that

  • things can be Layers, which are based on corresponding Functionals
  • names for layers and arguments are carefully chosen
  • everything works for multi-channel input
  • complex numbers are supported when it makes sense (e.g., STFTs)
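The Layer-on-Functional idea above could be sketched roughly as follows. This is a minimal illustration, not the torchaudio-contrib code: the names `stft` and `STFT`, the default sizes, and the `(batch, channel, time)` layout are assumptions, and it relies on a recent PyTorch where `torch.stft` accepts `return_complex=True`.

```python
import torch
import torch.nn as nn


def stft(waveforms: torch.Tensor, n_fft: int = 512, hop_length: int = 128) -> torch.Tensor:
    """Functional: complex STFT of waveforms shaped (batch, channel, time)."""
    batch, channel, time = waveforms.shape
    window = torch.hann_window(n_fft, device=waveforms.device, dtype=waveforms.dtype)
    # torch.stft accepts 1D or 2D input, so fold the channels into the batch axis.
    flat = waveforms.reshape(batch * channel, time)
    spec = torch.stft(
        flat, n_fft=n_fft, hop_length=hop_length, window=window, return_complex=True
    )
    # Unfold back to (batch, channel, freq, frames).
    return spec.reshape(batch, channel, spec.shape[-2], spec.shape[-1])


class STFT(nn.Module):
    """Layer: stores the parameters and delegates to the functional."""

    def __init__(self, n_fft: int = 512, hop_length: int = 128):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        return stft(waveforms, self.n_fft, self.hop_length)


# Two stereo clips of 4096 samples each.
complex_specgrams = STFT()(torch.randn(2, 2, 4096))
```

Keeping the computation in the functional means the layer is only a thin parameter holder, so both entry points stay in sync.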

Review - layers

torchaudio-contrib already covers many of the functions that transforms.py covers now, but not all of them. That’s why I feel it’s time to discuss this here. Let me list the classes in transforms.py one by one with some notes.

1. Already in torchaudio-contrib. Hoping we’d replace.

  • class Spectrogram: we have it in torchaudio-contrib. On top of this, we also have an STFT layer which outputs complex representations (same as torch.stft since we’re wrapping it).
  • class MelScale: we have it and would like to suggest changing the name to something more general. We named it class MelFilterbank, assuming there can be other types of filterbanks, too. It also supports htk and non-htk mel filterbanks.
  • class SpectrogramToDB: we would like to propose a more general approach – class AmplitudeToDb(ref=1.0, amin=1e-7) and class DbToAmplitude(ref=1.0), because decibel-scaling is about changing its unit, not the core content of the input.
  • class MelSpectrogram: we have it, which returns an nn.Sequential model consisting of Spectrogram and a mel-scale filter bank.
  • class MuLawEncoding, class MuLawExpanding: we have them; ours is actually a 99% copy of the implementation here.
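To illustrate the proposed decibel pair, here is a rough sketch using the constructor arguments quoted above. The formula is an assumption: it uses the amplitude convention (20 · log10), whereas an actual implementation may operate on power (10 · log10) instead, and the function names `amplitude_to_db` and `db_to_amplitude` are illustrative.

```python
import torch
import torch.nn as nn


def amplitude_to_db(x: torch.Tensor, ref: float = 1.0, amin: float = 1e-7) -> torch.Tensor:
    # Clamp to amin so that log10 never sees zero (assumed amplitude convention).
    x = torch.clamp(x, min=amin)
    return 20.0 * torch.log10(x / ref)


def db_to_amplitude(x_db: torch.Tensor, ref: float = 1.0) -> torch.Tensor:
    # Inverse of amplitude_to_db for inputs that were not clamped.
    return ref * torch.pow(10.0, x_db / 20.0)


class AmplitudeToDb(nn.Module):
    def __init__(self, ref: float = 1.0, amin: float = 1e-7):
        super().__init__()
        self.ref = ref
        self.amin = amin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return amplitude_to_db(x, self.ref, self.amin)


class DbToAmplitude(nn.Module):
    def __init__(self, ref: float = 1.0):
        super().__init__()
        self.ref = ref

    def forward(self, x_db: torch.Tensor) -> torch.Tensor:
        return db_to_amplitude(x_db, self.ref)


# Works on any real-valued tensor, not only spectrograms.
real_specgrams = torch.rand(2, 1, 257, 33) + 0.1
restored = DbToAmplitude()(AmplitudeToDb()(real_specgrams))
```

Because neither layer assumes anything about the input's shape, the same pair applies to spectrograms, mel-spectrograms, or plain envelopes.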

2. Wouldn’t need these

  • class Compose: we wouldn’t need it because once things are based on Layers people can simply build a nn.Sequential().
  • class Scale: It converts 16-bit int to float. I think we need to deprecate this because if we really need it, it should have a more intuitive and precise name, and probably should support other conversions as well.
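As a sketch of both points, assuming everything is an nn.Module: a conversion layer with an explicit name can be chained with nn.Sequential, with no Compose needed. `Int16ToFloat` and `Gain` are hypothetical names used only for illustration; neither exists in torchaudio or torchaudio-contrib.

```python
import torch
import torch.nn as nn


class Int16ToFloat(nn.Module):
    """Hypothetical layer: map int16 samples to float32 in [-1.0, 1.0)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.to(torch.float32) / 32768.0


class Gain(nn.Module):
    """Hypothetical layer: multiply the signal by a constant factor."""

    def __init__(self, gain: float = 1.0):
        super().__init__()
        self.gain = gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gain


# nn.Sequential plays the role that Compose plays for plain callables.
pipeline = nn.Sequential(Int16ToFloat(), Gain(0.5))

samples = torch.tensor([[-32768, 0, 16384]], dtype=torch.int16)
waveforms = pipeline(samples)
```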

3. To-be-added

  • class DownmixMono: I would like to have one. But we are also considering a downmix based on a time-frequency representation (an energy-preserving operation) (@faroit). I’m open to discussion. Personally I’d prefer to have separate classes, DownmixWaveform() and DownmixSpecgram(). Maybe until we have a better one, we should keep it as it is.
  • class MFCC: we currently don’t have it. The current torch/audio implementation uses s2db (SpectrogramToDB), but this class seems a little arbitrary to me, so we might want to re-implement it.

4. Not sure about these

  • class PadTrim: I don’t actually know why we need it exactly, would love to hear about this!
  • class LC2CL: So far, torchaudio-contrib code hasn’t considered channel-first tensors. If it’s a thing, we’d i) update our code to make them compatible and ii) have the same or a similar class to this. But, …do we really need this?
  • class BLC2CBL: same as LC2CL – I’d like to know its use cases.
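For reference, a rough sketch of what these transforms appear to do, going by their names and docstrings rather than the actual torchaudio code. The helper names `lc_to_cl` and `pad_trim` are hypothetical, and `pad_trim` assumes time is the last axis.

```python
import torch
import torch.nn.functional as F


def lc_to_cl(x: torch.Tensor) -> torch.Tensor:
    """Permute (length, channel) to (channel, length)."""
    return x.transpose(0, 1).contiguous()


def pad_trim(x: torch.Tensor, max_len: int, fill_value: float = 0.0) -> torch.Tensor:
    """Pad or trim the last (time) axis so it has exactly max_len samples."""
    length = x.shape[-1]
    if length >= max_len:
        return x[..., :max_len]
    # F.pad takes (left, right) padding amounts for the last axis.
    return F.pad(x, (0, max_len - length), value=fill_value)


clip = torch.randn(1000, 2)            # (length, channel)
channel_first = lc_to_cl(clip)         # (channel, length)
fixed = pad_trim(channel_first, 1600)  # zero-padded at the end
```

One plausible use for fixed-length padding or trimming is stacking clips of different lengths into a single batch tensor.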

Review - argument and variable names

As summarised in https://github.com/keunwoochoi/torchaudio-contrib/issues/46, we’d like to use

  • waveforms for a batch of waveforms
  • real_specgrams for magnitude spectrograms
  • complex_specgrams for complex spectrograms. (This is relatively less-discussed.)

Audio loading

@faroit has been working on replacing Sox with others. But here in this issue, I’d like to focus on the topics above.

So,

  • Any opinion on this?
  • Any answers to the questions I have?
  • If it looks good, what else would you like to have in the one-shot PR that would replace the current transforms.py?

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 22 (16 by maintainers)

Top GitHub Comments

3 reactions
keunwoochoi commented, Aug 26, 2019

I’m not sure what the current plan is. They are definitely useful, but I don’t think they are best hosted as part of torchaudio. The installation (= dependency) issue will always be there, adding maintenance cost and potential risks. I can also hardly imagine their operations benefiting from GPUs. Due to these issues, I would avoid using sox as part of a system if it requires some reliability and efficiency. For a quick and hacky use case, one can easily plug it into their (maybe) preprocessing stage.

Side note - maybe some of the filters (e.g., EQ) could be re-implemented in torch 😃

2 reactions
faroit commented, Aug 27, 2019

@vincentqb We had a long discussion about this in https://github.com/keunwoochoi/torchaudio-contrib/issues/31. I think many people agree that fast audio loading is really important, and for many users of torchaudio the load functionality is probably the only place where they touch the sox lib. Therefore it would make a lot of sense if only load and save were replaced by native torch code (probably interfacing libsndfile or just reading the wav bits from scripts, as is being done in tensorflow.io).
