Bug in wave normalization.

Hi, thank you for sharing this awesome work.

I came across the code below, and I don’t think it is correct.

crepe/core.py

    frames = as_strided(audio, shape=(1024, n_frames),
                        strides=(audio.itemsize, hop_length * audio.itemsize))
    frames = frames.transpose()

    # normalize each frame -- this is expected by the model
    frames -= np.mean(frames, axis=1)[:, np.newaxis]
    frames /= np.std(frames, axis=1)[:, np.newaxis]

This code is meant to normalize the waveform frame by frame, but numpy’s as_strided function does not allocate new memory; it only creates a new view of the array. (For example, the first frame and the second frame overlap in memory rather than being stored separately.)

See numpy documentation for more details.
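To make the shared-memory point concrete, here is a minimal sketch on a toy array. The sizes (two frames of length 4 with a hop of 2) are made up for illustration and are not the values used in crepe.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

audio = np.arange(8, dtype=np.float64)

# Two overlapping frames of length 4 with a hop of 2 (toy sizes, for illustration only).
# frames[0] covers audio[0:4] and frames[1] covers audio[2:6].
frames = as_strided(audio, shape=(2, 4),
                    strides=(2 * audio.itemsize, audio.itemsize))

# The view shares its buffer with `audio`; no data was copied.
shares = np.shares_memory(frames, audio)

# Writing to frame 0 also changes frame 1, because audio[2] belongs to both frames.
frames[0, 2] = 100.0
overlap_changed = bool(frames[1, 0] == 100.0)
source_changed = bool(audio[2] == 100.0)

print(shares, overlap_changed, source_changed)  # prints: True True True
```

Any in-place update on such a view therefore touches samples that other frames also read.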

So when an “in-place” calculation is done on that array, unintended results may occur. Here is an example.

import numpy as np
from numpy.lib.stride_tricks import as_strided

np.random.seed(42)
model_srate, step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# the code below is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())

146.7856165123937

Since each frame is normalized to zero mean, the sum over all frames should be approximately zero, but it is not.

To avoid this behavior, the array should be copied before the in-place operations, so that each frame owns its own memory and the operations behave as intended. Here is an example; only one line is changed (the added .copy()).

import numpy as np
from numpy.lib.stride_tricks import as_strided

np.random.seed(42)
model_srate, step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# the code below is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose().copy()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())

-2.384759056894836e-13

This code was tested with numpy version 1.17.4.
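As an alternative to calling .copy(), the normalization can be written with out-of-place arithmetic, which allocates a fresh array and never writes to the overlapping view. The sketch below is only an illustration: the function name `normalized_frames` and its default arguments are hypothetical and are not part of crepe.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def normalized_frames(audio, frame_length=1024, hop_length=160):
    """Hypothetical helper: return (n_frames, frame_length) frames, each zero-mean and unit-std."""
    audio = np.ascontiguousarray(audio)
    n_frames = 1 + (len(audio) - frame_length) // hop_length
    # Read-only view, so an accidental in-place write raises an error instead of corrupting data.
    view = as_strided(audio, shape=(n_frames, frame_length),
                      strides=(hop_length * audio.itemsize, audio.itemsize),
                      writeable=False)
    # `view - mean` and `frames / std` create new arrays; the input is left untouched.
    frames = view - view.mean(axis=1, keepdims=True)
    frames = frames / frames.std(axis=1, keepdims=True)
    return frames


np.random.seed(42)
audio = np.random.randn(16000)
frames = normalized_frames(audio)
print(frames.shape)
print(abs(frames.sum()) < 1e-6)
```

Note that this sketch, like the snippets above, divides by the per-frame standard deviation without any guard, so a constant (for example, silent) frame would produce non-finite values.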

Issue Analytics

State:
Created 4 years ago
Comments:7 (4 by maintainers)

Top GitHub Comments

jongwook commented, Mar 26, 2020

@adarob @justinsalamon Thanks for the heads up! Just pushed v0.0.11 😃

Btw you can use:

git+https://github.com/marl/crepe.git

in requirements.txt to pull the latest commit from GitHub, or:

git+https://github.com/marl/crepe.git@7f0bf0593847f38df47d2760934d585ad2f618c0

for a specific commit.

HudsonHuang commented, Jul 28, 2020

Just a quick question, was the model trained on audio normalized the old / incorrect way?

I am using a PyTorch port of this repo with your weights, and am wondering whether I should change or update the normalization code? Or if the model requires re-training with the fixed normalization?

@angusturner Hi, I’m looking for a PyTorch version of CREPE, but found nothing. Is the “PyTorch port” you mentioned publicly available? If not, would you mind releasing it? Thank you very much in advance.