[MusicBERT] Restriction to 1002 octuples when using `preprocess.encoding_to_str`
Hi once again!
While preprocessing a MIDI file, I noticed that the `MIDI_to_encoding` method performs as intended and converts the sample song to 106 bars, as seen in the snip below of the resultant octuples (please correct me if I'm wrong).
However, the `encoding_to_str` method restricts the result to just 18 bars (as can be concluded from the highlighted <0-18> near the end of the encoded string in the snip below):

More generally, what I have noticed across multiple MIDI files is that only up to the first 1000 note octuples (i.e., start token octuple + 1000 note octuples + end token octuple = 1002 * 8 = 8016 tokens) are considered:
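The arithmetic above can be checked directly (a sketch of the token accounting, assuming the octuple encoding where each octuple expands to 8 tokens):

```python
# Token accounting for the observed truncation: each octuple is 8 tokens,
# and the encoded string includes a start octuple and an end octuple
# around the note octuples.
OCTUPLE_SIZE = 8
note_octuples = 1000

total_octuples = 1 + note_octuples + 1  # start + notes + end
total_tokens = total_octuples * OCTUPLE_SIZE

print(total_octuples, total_tokens)  # 1002 8016
```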

Is there any way to change `encoding_to_str` to get the whole song instead? I mean up to 256 bars only, since the model vocabulary is also restricted to 256 bars.
I am not familiar enough with miditoolkit or mido to understand the code properly as of now, else I would have tried to fix this.
Thanks in advance!
Edit: I am aware that the musicbert_base model can only support up to 8192 octuples (i.e., the final input to the MusicBERT encoder), but that does not seem to be the issue here, I think.
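One possible workaround would be to window the full octuple list before encoding, so no single call exceeds the 1000-note limit. This is only a sketch: `chunk_octuples` is a hypothetical helper, and it assumes `MIDI_to_encoding` returns a flat list of 8-tuples that `encoding_to_str` could consume window by window.

```python
# Hypothetical workaround: split the full octuple list into windows of at
# most 1000 note octuples, then encode each window separately (the start/end
# octuples are assumed to be added by the encoding step itself).
MAX_NOTES = 1000

def chunk_octuples(octuples, max_notes=MAX_NOTES):
    """Yield consecutive windows of at most `max_notes` octuples."""
    for start in range(0, len(octuples), max_notes):
        yield octuples[start:start + max_notes]

# Usage sketch with dummy octuples (real ones come from MIDI_to_encoding):
song = [(0, 0, 0, i % 128, 8, 64, 24, 0) for i in range(2500)]
windows = list(chunk_octuples(song))
print([len(w) for w in windows])  # [1000, 1000, 500]
```

Each window would then be passed to `preprocess.encoding_to_str` in turn, though whether the per-window strings can be meaningfully concatenated depends on how bar numbers are handled, which I have not verified.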
Issue Analytics
- State:
- Created a year ago
- Comments: 6

Hi @tripathiarpan20
Your logic is pretty correct, but MusicBERT models are trained with the setting TOKENS_PER_SAMPLE=8192 (as seen in `train_mask.sh`), which means the length of input sequences will not exceed 8192 tokens (= 1024 octuples), and the attention layers in the encoder will only receive tensors of length no more than 1024. Processing 8192 octuples with MusicBERT is theoretically possible, but that would require 64 times more GPU memory (memory usage is quadratically proportional to sequence length), which is impractical currently. (The original RoBERTa models are trained with sequence length = 512.)
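The 64x figure follows directly from the quadratic cost of self-attention (a quick check of the arithmetic, not a measurement):

```python
# Self-attention memory scales with the square of sequence length,
# so going from 1024 to 8192 octuples costs (8192 / 1024)^2 = 64x
# the attention memory.
current_len = 1024
target_len = 8192

ratio = (target_len / current_len) ** 2
print(ratio)  # 64.0
```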
Oh I see, hoping it is resolved eventually.
Will the fine-tuned models for genre prediction and accompaniment suggestion be released too? I'm thinking of implementing a genre-based task, and the fine-tuned model would be a great starting point.