Isolated source audio files occasionally are longer in duration than soundscape duration

Occasionally, when generating mixtures using only foreground events (no background) and saving the isolated events to disk, one of the audio files for the isolated sources is slightly longer than the target soundscape duration. In this specific case, the target soundscape was 4 seconds at 16 kHz (i.e. 64,000 samples), but the anomalous isolated source file had 64,768 samples. In the original context where I found this issue, I generated 20,000 soundscapes and 89 of them exhibited this behavior. In each case, exactly one isolated source was affected, and the affected audio files always had exactly 64,768 samples.
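
To put those figures in perspective, here is a small arithmetic sketch. It only uses the numbers quoted above; the variable names are illustrative and do not come from Scaper.

```python
# Figures taken from the report above.
sample_rate = 16_000        # Hz
duration_s = 4              # target soundscape duration in seconds
observed_samples = 64_768   # length of each anomalous isolated source

# Expected length and the size of the overshoot.
expected_samples = sample_rate * duration_s
extra_samples = observed_samples - expected_samples
extra_ms = 1000 * extra_samples / sample_rate

# Share of generated soundscapes that showed the problem.
affected, total = 89, 20_000
affected_pct = 100 * affected / total

print(expected_samples, extra_samples, extra_ms, affected_pct)
```

So each affected file overshoots by 768 samples, or 48 ms, and fewer than half a percent of the soundscapes are affected.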

I’ve included a reproducible example in an attached zip.

scaper-debug.zip

Within the (unzipped) directory, the following can be run to replicate the issue:

import scaper
scaper.generate_from_jams('./soundscape.jams', './soundscape.wav', save_isolated_events=True)

./soundscape_events/foreground12_jackhammer.wav should be the anomalous file here.

I’m using Python 3.6.7 (on Ubuntu 18.04.4 LTS), and using the scaper version at commit d0431ec7b091d49709dfce25d149f6dfd0e982c8.
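
For anyone who wants to check a whole folder of isolated events rather than a single file, something like the following might help. It is only a sketch: it assumes the third-party soundfile package is installed, and find_length_mismatches and count_frames are hypothetical helper names, not part of Scaper.

```python
def count_frames(path):
    """Return the number of frames in an audio file, as reported by soundfile."""
    import soundfile as sf  # third-party; imported lazily so the helper below stays testable
    return sf.info(str(path)).frames


def find_length_mismatches(paths, expected_frames, frame_counter=count_frames):
    """Return {path: frames} for every file whose length differs from expected_frames.

    frame_counter is injectable so the logic can be exercised without audio files.
    """
    mismatches = {}
    for path in paths:
        frames = frame_counter(path)
        if frames != expected_frames:
            mismatches[str(path)] = frames
    return mismatches
```

Calling find_length_mismatches(sorted(pathlib.Path('./soundscape_events').glob('*.wav')), 4 * 16000) after running generate_from_jams should then list only the anomalous file, if the behavior described above reproduces.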

Top GitHub Comments

pseeth commented, Mar 5, 2020

If you look at the current test for match_sample_length, you can see the formats and subtypes that we actually test on (it’s not all of them):

https://github.com/justinsalamon/scaper/blob/d0431ec7b091d49709dfce25d149f6dfd0e982c8/tests/test_audio.py#L55-L71

So while WAV is in there, MS_ADPCM is not one of the subtypes we test the function on. So for now, I guess, make sure the source audio going into Scaper uses a format and subtype from those tested lists.
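
A rough way to do that check up front is sketched below. It assumes the soundfile package is available (its info() result exposes format and subtype attributes), and untested_sources is a hypothetical helper, not a Scaper function. The set of tested (format, subtype) pairs is passed in by the caller and should be copied from the linked test; the pair used in the usage note is only a placeholder.

```python
def untested_sources(paths, tested, describe=None):
    """Return [(path, format, subtype)] for files whose pair is not in `tested`.

    tested:   set of (format, subtype) pairs, e.g. taken from the linked test.
    describe: optional callable path -> (format, subtype); defaults to soundfile.
    """
    if describe is None:
        import soundfile as sf  # third-party; imported lazily

        def describe(path):
            info = sf.info(str(path))
            return info.format, info.subtype

    flagged = []
    for path in paths:
        fmt, subtype = describe(path)
        if (fmt, subtype) not in tested:
            flagged.append((str(path), fmt, subtype))
    return flagged
```

With tested={('WAV', 'PCM_16')} as a placeholder, a source file that soundfile reports as ('WAV', 'MS_ADPCM') would be flagged before it ever reaches Scaper.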

pseeth commented, Mar 5, 2020

The bug seems to literally be here:

https://github.com/justinsalamon/scaper/blob/d0431ec7b091d49709dfce25d149f6dfd0e982c8/scaper/audio.py#L150

The shape of that audio data array is 64000 when writing to the audio file. But then when we write it to disk using soundfile it suddenly becomes 64768. Bizarre!

So I tried running the following on that specific source audio file (I first moved it two directories up, then used ffmpeg to write a converted copy back to its original location):

ffmpeg -i ../../88466-7-0-0.wav 88466-7-0-0.wav

I tried just fixing the source audio file prior to feeding it into Scaper using the statement above… The output for my print statements now looks like this:

before ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground12_jackhammer.wav (64000,)
after ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground12_jackhammer.wav (64000,)
./soundscape_events/foreground12_jackhammer.wav: adjusted from 64000 to 64000

and the script I pasted above passes for the exact same JAMS file (but with a fixed 88466-7-0-0.wav file).

Also, the file size of the WAV file changed from 37 KB to 147 KB on my machine. I think how Scaper, SoX, and soundfile interact with all the different types of sound files needs some more investigation.

So my recommendation right now: normalize all of your data via ffmpeg first so that every single audio file you’re trying to mix has the same sample rate, is a .wav file, and has the subtype Signed 16 bit PCM. We should perhaps find subtypes that don’t work and throw a warning.
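
One way to script that normalization over a folder is sketched below. It assumes ffmpeg is on the PATH. The -ar and -c:a pcm_s16le flags are my additions to make the target sample rate and 16-bit PCM subtype explicit (the command earlier in this comment relied on ffmpeg's defaults), and the function names and the 16000 Hz default are illustrative only.

```python
import subprocess
from pathlib import Path


def ffmpeg_normalize_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that re-encodes src as a 16-bit PCM WAV file."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", str(src),
        "-ar", str(sample_rate),  # resample to a common sample rate
        "-c:a", "pcm_s16le",      # signed 16-bit little-endian PCM
        str(dst),
    ]


def normalize_folder(src_dir, dst_dir, sample_rate=16000):
    """Re-encode every .wav file in src_dir into dst_dir, leaving the originals untouched."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.wav")):
        dst = dst_dir / src.name
        subprocess.run(ffmpeg_normalize_cmd(src, dst, sample_rate), check=True)
```

Writing into a separate output folder avoids the move-then-convert step described earlier, since ffmpeg cannot safely read from and write to the same path.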
