transforms.AmplitudeToDB does not handle cut-off correctly for multi-channel or batched data
Bug
From my understanding (based on e.g. #337), all transforms should be able to operate on tensors with dimensions `(batch, channels, ...)` or `(channels, ...)`, with `...` depending on the type of data being processed, e.g. `time` for waveform data and `freq, time` for spectrograms. In this way, we can pass multiple chunks of data (waveforms, spectrograms, ...) at once and expect the same results as if we passed them one by one.
However, this is not the case for `transforms.AmplitudeToDB`: as is easily traceable in the source code of the corresponding functional, this transform blindly operates on the passed tensor without taking its dimensionality and the related semantics into account in any way.
This becomes a problem in the calculation of the cut-off. The purpose of this step is to clamp low dB values to a minimum some fixed number of decibels below the maximum value in the respective spectrogram. However, `amplitude_to_DB` uses the single global maximum of the passed tensor to calculate the cut-off for all contained spectrograms. Thus, when passing batched data, the result for one spectrogram depends on all the other spectrograms in the same batch, which to my understanding is not the correct behavior. My conclusion is that `AmplitudeToDB` silently outputs wrong data (in the sense of the general interface contract of `transforms`) for batched or multi-channel input, which I would consider really dangerous for applications.
Ideally, this should be fixed directly in `functional.amplitude_to_DB`, so that we can also pass batched data there.
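For concreteness, a minimal reproduction along the lines described above might look as follows (a sketch; the spectrogram sizes and scale factors are made up for illustration, and it assumes the `AmplitudeToDB(stype, top_db)` constructor of TorchAudio 0.7.0):

```python
import torch
import torchaudio

transform = torchaudio.transforms.AmplitudeToDB(stype="power", top_db=80.0)

# Two power spectrograms with very different maxima, stacked into one batch.
quiet = torch.rand(1, 128, 100) * 1e-6  # (channel, freq, time)
loud = torch.rand(1, 128, 100) * 1e2

batched = transform(torch.stack([quiet, loud]))  # (batch, channel, freq, time)
single = transform(quiet)

# With the global-maximum cut-off, the quiet spectrogram is clamped relative
# to the loud one's maximum, so the two results differ (expected: False).
print(torch.allclose(batched[0], single))
```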
Environment
TorchAudio 0.7.0
Top GitHub Comments
Hi @jcaw
You can go ahead and open a PR. That way it is easier to keep the discussion going. I would like to update our test infrastructure to catch this kind of bug, so it will help to think of a way to fix the tests.
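For illustration, a batch-consistency check of the kind described above might look like this (a hypothetical sketch, not the actual test infrastructure; the per-item scaling is there to expose the global-maximum cut-off):

```python
import torch
import torchaudio

def test_amplitude_to_db_batch_consistency():
    # Each item of a batch should transform to the same result as when it
    # is transformed on its own.
    torch.manual_seed(0)
    transform = torchaudio.transforms.AmplitudeToDB(stype="power", top_db=80.0)
    # Give each batch item a very different magnitude so a cut-off computed
    # from the global maximum would clamp the quiet items differently.
    scales = torch.tensor([1e-6, 1e-2, 1.0, 1e2]).view(4, 1, 1, 1)
    batch = torch.rand(4, 2, 128, 100) * scales  # (batch, channel, freq, time)
    batched_out = transform(batch)
    for i, item in enumerate(batch):
        assert torch.allclose(batched_out[i], transform(item)), f"mismatch at item {i}"
```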
That would be great, thanks!
Yes, we should retain the clamping behavior.
That's an interesting point: per channel or per spectrogram? Reading the code, I'd say the intended behavior was likely per spectrogram, since the batching came later. So let's keep this behavior.
Batches and channels were packed together to add batching support, so this does need to be fixed. In particular, this means the transform should likely not fold batches into channels to apply the transform.
The format should be `(..., freq, time)`. The function does not apply to complex tensors (which would have had a shape of `(..., freq, time, 2)`). Yes, indeed, though the comment above would help take care of replacing `clamp` by `min`/`max`.
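To make that last point concrete, here is a sketch of a per-spectrogram cut-off for the `(..., freq, time)` layout (an illustrative reimplementation with assumed default parameters, not the actual fix): because the cut-off threshold becomes a per-item tensor, an element-wise `torch.max` takes the place of the scalar-valued `clamp`.

```python
import torch

def amplitude_to_db_per_item(x, multiplier=10.0, amin=1e-10, db_multiplier=0.0, top_db=80.0):
    """Hypothetical sketch: dB conversion with a cut-off computed per spectrogram."""
    x_db = multiplier * torch.log10(torch.clamp(x, min=amin))
    x_db -= multiplier * db_multiplier
    if top_db is not None:
        shape = x_db.shape
        # Flatten each (freq, time) spectrogram so the maximum is computed
        # per item rather than over the whole batch.
        flat = x_db.reshape(-1, shape[-2] * shape[-1])
        max_per_item = flat.max(dim=-1, keepdim=True).values
        # Element-wise max against a per-item threshold replaces the scalar clamp.
        flat = torch.max(flat, max_per_item - top_db)
        x_db = flat.reshape(shape)
    return x_db
```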