Isolated source audio files occasionally are longer in duration than soundscape duration
Occasionally, when generating mixtures using only foreground events (no background) and saving the isolated events to disk, one of the audio files for the isolated sources is slightly longer than the target soundscape duration. In this specific case, the target soundscape was 4 seconds at 16 kHz (i.e. 64,000 samples), but the anomalous isolated source file had 64,768 samples. In the context where I originally found this issue, I generated 20,000 soundscapes and 89 of them exhibited this behavior. In each case, exactly one isolated source was affected, and the affected audio file always had exactly 64,768 samples.
I’ve included a reproducible example in an attached zip.
Within the (unzipped) directory, the following can be run to replicate the issue:
import scaper
scaper.generate_from_jams('./soundscape.jams', './soundscape.wav', save_isolated_events=True)
./soundscape_events/foreground12_jackhammer.wav should be the anomalous file here.
I’m using Python 3.6.7 (on Ubuntu 18.04.4 LTS), and using the scaper version at commit d0431ec7b091d49709dfce25d149f6dfd0e982c8.
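To find which isolated event files are affected across many soundscapes, a length check along the following lines may help. This is only a sketch using the standard-library wave module, which can decode PCM WAV files but not other encodings; files it cannot decode are listed separately instead of being measured. The function name check_event_lengths is hypothetical and not part of Scaper.

```python
import wave
from pathlib import Path


def check_event_lengths(events_dir, expected_frames):
    """Scan a directory of isolated event WAV files.

    Returns (overlong, undecodable): a dict mapping file name to frame
    count for files longer than expected_frames, and a list of files the
    stdlib wave module could not decode (e.g. non-PCM encodings).
    """
    overlong = {}
    undecodable = []
    for path in sorted(Path(events_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                n_frames = wav.getnframes()
        except wave.Error:
            # Assumption: a decode failure means a non-PCM or malformed file.
            undecodable.append(path.name)
            continue
        if n_frames > expected_frames:
            overlong[path.name] = n_frames
    return overlong, undecodable


# Example usage for the case in this issue (4 seconds at 16 kHz):
# overlong, undecodable = check_event_lengths("./soundscape_events", 4 * 16000)
```

Any file that shows up in either result is worth inspecting by hand.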
Issue Analytics
- Created 4 years ago
- Comments:13 (4 by maintainers)
Top GitHub Comments
If you look at the current test for match_sample_length, you can see the formats and subtypes that we actually test on (it’s not all of them): https://github.com/justinsalamon/scaper/blob/d0431ec7b091d49709dfce25d149f6dfd0e982c8/tests/test_audio.py#L55-L71
So while WAV is in there, MS_ADPCM is not one of the subtypes we test the function on. So I guess make sure your source audio going into Scaper is in those tested lists for now.
The bug seems to literally be here:
https://github.com/justinsalamon/scaper/blob/d0431ec7b091d49709dfce25d149f6dfd0e982c8/scaper/audio.py#L150
The audio data array has 64000 samples right before it is written. But once we write it to disk using soundfile, the file suddenly has 64768. Bizarre!
So I literally just tried running this on that specific audio file (I first moved it two directories up and then used ffmpeg to write it back down):
I tried just fixing the source audio file prior to feeding it into Scaper using the statement above… The output for my print statements now looks like this:
and the script I pasted above passes for the exact same JAMS file (but with a fixed 88466-7-0-0.wav file).
Also the file size for the WAV file changed from 37kb to 147kb on my machine. I think how Scaper, Sox, and Soundfile interact with all the different types of sound files needs some more investigation.
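As a starting point for that investigation, the encoding of a source WAV file can be read directly from its header. The sketch below uses only the standard library and relies on the format tags defined by the WAVE format itself (1 for PCM, 2 for Microsoft ADPCM, 3 for IEEE float); the function names are hypothetical and this is not how Scaper or soundfile inspect files.

```python
import struct
import warnings

# Format tags defined by the WAVE file format (not specific to Scaper).
WAVE_FORMAT_NAMES = {
    0x0001: "PCM",
    0x0002: "MS_ADPCM",
    0x0003: "IEEE_FLOAT",
    0xFFFE: "EXTENSIBLE",
}


def read_wav_format_tag(path):
    """Return the integer format tag stored in a WAV file's 'fmt ' chunk."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError(f"no 'fmt ' chunk found in {path}")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    raise ValueError(f"truncated 'fmt ' chunk in {path}")
                return struct.unpack("<H", tag_bytes)[0]
            # Skip this chunk; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)


def warn_if_not_pcm(path):
    """Emit a warning when the file is not plain PCM; return the tag."""
    tag = read_wav_format_tag(path)
    if tag != 0x0001:
        name = WAVE_FORMAT_NAMES.get(tag, hex(tag))
        warnings.warn(f"{path}: WAV format tag is {name}, not PCM")
    return tag
```

Running warn_if_not_pcm over every source file before mixing would flag encodings such as MS_ADPCM up front.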
So my recommendation right now: normalize all of your data via ffmpeg first so that every single audio file you’re trying to mix has the same sample rate, is a .wav file, and has the subtype Signed 16 bit PCM. We should perhaps find subtypes that don’t work and throw a warning.
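The normalization step could be scripted roughly as follows. This is a sketch, not the exact command used in this thread: it assumes ffmpeg is on the PATH, assumes a target rate of 16 kHz (the rate from the original report), and the function names are hypothetical. The ffmpeg options used are -ar for the sample rate and -c:a pcm_s16le for signed 16-bit PCM.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that rewrites src as 16-bit PCM WAV at dst."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input file in whatever encoding it has
        "-ar", str(sample_rate), # resample to a common rate
        "-c:a", "pcm_s16le",     # signed 16-bit little-endian PCM
        str(dst),                # dst should end in .wav
    ]


def normalize_audio(src, dst, sample_rate=16000):
    """Run ffmpeg; dst must differ from src since ffmpeg cannot edit in place."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)


# Example usage with the file mentioned above (output path is illustrative):
# normalize_audio("88466-7-0-0.wav", "normalized/88466-7-0-0.wav")
```

Writing to a separate output directory keeps the original files available for comparison.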