Data Augmentation
Hi, I am following this paper to perform data augmentation on MUSDB.
I am applying librosa's time_stretch and pitch_shift to each track of the MUSDB dataset.
I then use stempeg to build a new stem file.
Unfortunately, the Wave-U-Net preprocessing reports statistics that do not look good for re-training the network properly:
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_bass.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_drums.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_other.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_vocals.wav
Maximum absolute deviation from source additivity constraint: 1.015533447265625
Mean absolute deviation from source additivity constraint: 0.09679516069867423
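For context, the check behind these numbers can be sketched like this (a minimal illustration, assuming the mixture and the four sources have been decoded to float arrays of shape (samples, channels); the function name is mine, not the Wave-U-Net code):

import numpy as np

def additivity_deviation(mix, sources):
    # mix: (samples, channels); sources: list of arrays with the same shape.
    # The additivity constraint says the sources should sum to the mixture.
    diff = np.abs(mix - np.sum(sources, axis=0))
    return diff.max(), diff.mean()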
On the musdb website it is also stated that:
Since the mixture is separately encoded as AAC, there is a small difference between the sum of all sources and the mixture. This difference has no impact on the bsseval evaluation performance.
Some of my code:
import os
import numpy as np
import librosa
import stempeg

SR = 44100  # MUSDB sample rate
R = 0.1     # time-stretch rate passed to librosa

def timeStretch(y, rate=1.0):
    # y: (samples, channels); stretch each channel with the same rate
    y_left = y[:, 0]
    y_right = y[:, 1]
    y_stretched_L = librosa.effects.time_stretch(y_left, rate=rate)
    y_stretched_R = librosa.effects.time_stretch(y_right, rate=rate)
    return np.array([y_stretched_L, y_stretched_R])  # (channels, samples)

# open stem and retrieve all sources
stem_path = os.path.join(ORIGINAL_STEMS_DIR, f)
info = stempeg.Info(stem_path)
S, _ = stempeg.read_stems(stem_path, info=info)  # (stems, samples, channels)

stretched_list = []
for audio_to_process in S:  # mixture, drums, bass, other, vocals
    stretched_list.append(timeStretch(audio_to_process, rate=R))

# create and save stem
S = np.array(stretched_list)
S = np.swapaxes(S, 1, 2)  # stems x samples x channels
stempeg.write_stems(S, output_mp4, rate=SR)
Do you have any idea what the problem could be here? Thanks a lot!

There might be some problem with the encoding, seeing that the mean absolute deviation is not so high but the maximum one is. So it might be alright overall, but locally some encoding inconsistencies could produce a high error…
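If you want to verify that the error really is local, a quick way is to locate where the peak deviation occurs (a sketch reusing the same decoded arrays as above; locate_peak_deviation is an illustrative name):

import numpy as np

def locate_peak_deviation(mix, sources, sr=44100):
    # Per-sample deviation, taking the worst channel at each sample.
    diff = np.abs(mix - np.sum(sources, axis=0)).max(axis=-1)
    idx = int(np.argmax(diff))
    return idx / sr, float(diff[idx])  # peak position in seconds, and its size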
Solution 1: Export your audio to wave, and modify the MUSDB data loading code to load the wave files directly, then you know there should be absolutely no deviation between sum of sources and mix as you don’t have any encoding inaccuracy.
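As a sketch, such a decoding step could look like this (using stempeg to read and soundfile to write; the stem order and output naming here are assumptions, not the actual Wave-U-Net preparation code):

import os
import stempeg
import soundfile as sf

STEM_NAMES = ["mix", "drums", "bass", "other", "vocals"]  # assumed MUSDB stem order

def stems_to_wav(stem_path, out_dir):
    S, rate = stempeg.read_stems(stem_path)  # S: (stems, samples, channels)
    base = os.path.splitext(os.path.basename(stem_path))[0]
    for i, name in enumerate(STEM_NAMES):
        sf.write(os.path.join(out_dir, f"{base}_{name}.wav"), S[i], int(rate))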
Solution 2: If you are absolutely sure you are inputting “proper” data into the system, go ahead and ignore the warning and/or use output_type: direct in the Wave-U-Net to allow it to output all sources unconstrained, so it is capable of outputting sources that do NOT add up to the original mix as well. I would definitely listen to the dataset you produced in this case though, to make sure everything is alright.

I am clipping the accompaniment audio just to be sure that I don't generate values outside the [-1, 1] range, since it's a sum of the individual audio signals, so the amplitudes are summed up. It should not be necessary if the dataset is proper, but it doesn't hurt either.
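The clipping itself is a one-liner; a minimal sketch, assuming the accompaniment is built by summing the non-vocal stems as float arrays in [-1, 1]:

import numpy as np

def make_accompaniment(drums, bass, other):
    # Amplitudes add up when summing sources, so the sum can leave the
    # valid range; clip it back to [-1, 1] before writing to disk.
    return np.clip(drums + bass + other, -1.0, 1.0)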
I am not too well versed in the ffmpeg part of the story, but I personally wouldn't trust it to encode things to the degree of accuracy we require. I also had synchronisation issues when loading encoded audio, where the audio was suddenly misaligned in time, which is obviously very bad in our setting.
But yeah, it looks like the ffmpeg encoding (settings) is to blame here. I decode all the stems to wave as part of data preparation anyway, as it's much faster to load the audio during training that way, so you should probably use Solution 1 I proposed and cut out the whole stempeg part completely.
Another solution, if time-stretching is not too CPU-intensive, is to do it on the fly as part of the data augmentation pipeline during training. This saves disk space, but might slow down training since batches take longer to prepare.
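A minimal sketch of what that on-the-fly variant could look like (assuming each training example arrives as a list of (samples, channels) float arrays, one per source; the names and rate range are illustrative):

import random
import numpy as np
import librosa

def augment_example(sources, rates=(0.9, 1.1)):
    # Draw one stretch rate per example so all stems stay time-aligned.
    rate = random.uniform(*rates)
    stretched = []
    for y in sources:
        chans = [librosa.effects.time_stretch(y[:, c], rate=rate)
                 for c in range(y.shape[1])]
        stretched.append(np.stack(chans, axis=1))
    # Re-sum the stretched sources so the additivity constraint holds exactly.
    mix = np.clip(np.sum(stretched, axis=0), -1.0, 1.0)
    return mix, stretched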