Saving and loading the downsampled audio results in a tensor with zeros.

🐛 Bug

I was using an audio file that I downloaded from https://www.lynxstudio.com/ to experiment with 6-channel audio. When I downsample it from 44.1 kHz to 8 kHz, everything seems fine and I am able to play the audio. However, if I save the downsampled file and load it again, the tensor I get back contains only zeros.

import torchaudio

waveform, sample_rate = torchaudio.load('ChannelPlacement.wav')

downsample_rate=8000

downsample_resample = torchaudio.transforms.Resample(
    sample_rate, downsample_rate, resampling_method='sinc_interpolation')

down_sampled = downsample_resample(waveform)

print(down_sampled)

torchaudio.save('temp.wav', down_sampled, downsample_rate)

waveform2, sample_rate2 = torchaudio.load('temp.wav')

print(waveform2)
tensor([[-3.7585e-09, -3.3725e-09,  9.2130e-09,  ..., -4.0691e-08,
          7.2912e-09, -5.7485e-08],
        [-3.2915e-09,  7.5441e-09,  9.4772e-10,  ..., -7.3543e-09,
          3.1981e-08, -2.3025e-08],
        [ 1.5473e-08,  1.8003e-08,  2.5778e-09,  ..., -1.0129e-08,
         -2.0479e-08,  2.6770e-08],
        [-2.1108e-08, -3.9693e-08, -2.2911e-08,  ..., -2.4338e-08,
          3.7029e-08, -3.1360e-09],
        [-2.1277e-08, -1.9114e-09, -4.4245e-09,  ..., -2.3023e-08,
          6.9994e-09, -7.5472e-09],
        [ 5.2013e-09,  2.5186e-08,  2.1362e-08,  ..., -2.6036e-07,
         -1.5355e-07,  3.7919e-08]])
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

To Reproduce

Steps to reproduce the behavior:

  1. Load ChannelPlacement.wav which is included in the attached zip file
  2. Downsample to 8000 (8000 seems to be a magic number: if I change it to another value, such as 7999, it works fine)
  3. Save the downsampled version
  4. Load the downsampled file we just saved
  5. The tensor that torchaudio.load returns contains only zeros

Here is the gist https://gist.github.com/hiromis/3a0ce0e3b8a512465609c653364c02fe

The following zip file includes the jupyter notebook as well as the audio file: files.zip

Expected behavior

torchaudio.save('temp.wav', down_sampled, downsample_rate)
waveform, sample_rate = torchaudio.load('temp.wav')

I expect the loaded waveform tensor to match the tensor I saved (down_sampled).

Environment

  • What commands did you use to install torchaudio (conda/pip/build from source)? pip
  • If you are building from source, which commit is it? N/A
  • What does torchaudio.__version__ print? (If applicable) ‘0.3.0’
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 430.26
cuDNN version: /usr/local/cuda-10.0/lib64/libcudnn.so.7.5.0

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] torch==1.1.0
[conda] torch                     1.2.0                    pypi_0    pypi
[conda] torchaudio                0.3.0                    pypi_0    pypi
[conda] torchvision               0.4.0                    pypi_0    pypi

Additional context

I tried several other files, and it seems that this particular file combined with a sample rate of 8000 hits an edge case of some sort.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

3 reactions
jamarshon commented, Aug 22, 2019

I don’t think this is related to resampling. I think it’s related to save/load, which has a wide variety of configurations (precision, normalization, and scaling). The signal starts off at 24 bits per sample; after loading it is a float32 tensor (32 bits per sample), which is resampled and then saved to file.

down_sampled.abs().max() == 1.0006, which is greater than 1.0, so the tensor is not scaled before it is converted to an int64/long https://github.com/pytorch/audio/blob/a424509dda5b57c932fa8b5b780de93e60ed7ee2/torchaudio/__init__.py#L191 This float32 tensor is then stored in a sox_sample_t/sox_int32_t buffer before being written to a file https://github.com/pytorch/audio/blob/a424509dda5b57c932fa8b5b780de93e60ed7ee2/torchaudio/torch_sox.cpp#L37. The copy from float32 to sox_int32_t truncates the values, which converts them all to zero.
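
A minimal sketch of the truncation described above, in plain Python rather than torchaudio. The helper to_integer_samples and its scale_bits parameter are hypothetical stand-ins for the scaling branch, written on the assumption that samples are scaled by 2**31 only when the peak magnitude is at most 1.0. It is not the library's code; it only shows why a peak slightly above 1.0 collapses nearly every sample to zero.

```python
def to_integer_samples(samples, scale_bits=31):
    """Hypothetical stand-in for the save-time conversion (not torchaudio code)."""
    peak = max(abs(s) for s in samples)
    if peak <= 1.0:
        # Assumed behavior: in-range floats are scaled up to the integer range.
        samples = [s * (1 << scale_bits) for s in samples]
    # int() truncates toward zero, like a float -> integer cast.
    return [int(s) for s in samples]


in_range = [0.25, -0.5, 0.999]    # peak <= 1.0, so the values are scaled
clipped = [0.25, -0.5, 1.0006]    # peak > 1.0, so the scaling is skipped

print(to_integer_samples(in_range))  # large integers, signal preserved
print(to_integer_samples(clipped))   # [0, 0, 1]
```

Only the sample that exceeds 1.0 survives as a nonzero integer; everything below 1.0 in magnitude truncates to 0.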

Tangentially related concern in the past: https://github.com/pytorch/audio/pull/119#discussion_r293929024

Some resources for reference: https://en.wikipedia.org/wiki/Single-precision_floating-point_format http://soundfile.sapp.org/doc/WaveFormat/ https://github.com/kaldi-asr/kaldi/blob/master/src/feat/wave-reader.cc

sox uses various sox_int32_t buffers for reading/writing where it does some conversion to various bits per sample. https://github.com/rbouqueau/SoX/blob/e29e9ceb7c25a2d83c09bc8a601de117fc65563c/src/wavpack.c#L129

I think the solution would be to rewrite or fix torchaudio load and save so that all the bits of the waveform are saved in the file (e.g. somehow convert the float32 tensor to int32 bitwise for the sox buffer, or not use sox at all). The current torchaudio implementation of load/save seems to lose some bits and could be improved and made clearer (e.g. the input/output dtypes and the scale of the input/output).

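# Scale by 2**32 with a multiplication; the shift operator may not be defined for float tensors in newer PyTorch releases.
down_sampled2 = (down_sampled * (1 << 32)).long()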
torchaudio.save('temp.wav', down_sampled2, downsample_rate, precision=32)

waveform2, sample_rate2 = torchaudio.load('temp.wav', normalization=None)

print('down_sampled2\n', down_sampled2)
print('waveform2\n', waveform2)

# note not the same as information is lost in saving
# down_sampled2
# tensor([[  -16,   -14,    39,  ...,  -174,    31,  -246],
#     [  -14,    32,     4,  ...,   -31,   137,   -98],
#     [   66,    77,    11,  ...,   -43,   -87,   114],
#     [  -90,  -170,   -98,  ...,  -104,   159,   -13],
#     [  -91,    -8,   -19,  ...,   -98,    30,   -32],
#     [   22,   108,    91,  ..., -1118,  -659,   162]])
# waveform2
# tensor([[  -16.,   -14.,    39.,  ...,  -174.,    31.,  -246.],
#     [  -14.,    32.,     4.,  ...,   -31.,   137.,   -98.],
#     [   66.,    77.,    11.,  ...,   -43.,   -87.,   114.],
#     [  -90.,  -170.,   -98.,  ...,  -104.,   159.,   -13.],
#     [  -91.,    -8.,   -19.,  ...,   -98.,    30.,   -32.],
#     [   22.,   108.,    91.,  ..., -1118.,  -659.,   162.]])
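
As an illustration of the "convert float32 to int32 bitwise" idea mentioned above, here is a small sketch using only Python's standard struct module. The function names float32_to_int32_bits and int32_bits_to_float32 are hypothetical and not part of torchaudio or sox. It only shows that reinterpreting the four bytes of a float32 as a signed 32-bit integer, and back, loses no information.

```python
import struct


def float32_to_int32_bits(value):
    """Reinterpret the 4 bytes of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


def int32_bits_to_float32(bits):
    """Inverse of float32_to_int32_bits: reinterpret an int32 as a float32."""
    return struct.unpack("<f", struct.pack("<i", bits))[0]


samples = [-3.7585e-09, 0.5, 1.0006]
encoded = [float32_to_int32_bits(s) for s in samples]
decoded = [int32_bits_to_float32(b) for b in encoded]

print(encoded)
print(decoded)  # equal to the inputs after rounding to float32 precision
```

The integers produced this way are bit patterns, not PCM amplitudes, so a file written from them would not play correctly in other applications. The sketch shows lossless storage only, under that caveat.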

2 reactions
vincentqb commented, Aug 28, 2019

FYI: We are currently discussing standardizing data loading in pytorch/pytorch#24915, and making any post-processing more transparent.
