
[MusicBERT] Restriction to 1002 octuples when using `preprocess.encoding_to_str`

See original GitHub issue

Hi once again!

While preprocessing a MIDI file, I noticed that the MIDI_to_encoding method performs as intended and converts the sample song to 106 bars, as seen in the screenshot below of the resulting octuples (please correct me if I’m wrong).

However, the encoding_to_str method restricts the result to just 18 bars (as can be concluded from the highlighted <0-18> near the end of the encoded string in the screenshot below):

[screenshot of the encoded string]

More generally, what I have noticed across multiple MIDI files is that only up to the first 1000 note octuples (i.e., start-token octuple + 1000 note octuples + end-token octuple = 1002 * 8 = 8016 tokens) are considered:

[screenshot of the truncated octuples]
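To make sure I am counting correctly, this is the arithmetic I have in mind (plain Python, just to spell out my assumption that each octuple corresponds to 8 tokens):

```python
# Each OctupleMIDI element is one octuple = 8 tokens
# (bar, position, instrument, pitch, duration, velocity, tempo, time signature).
TOKENS_PER_OCTUPLE = 8

note_octuples = 1000      # note octuples that seem to survive encoding_to_str
special_octuples = 2      # start-of-sequence + end-of-sequence octuples
total_octuples = note_octuples + special_octuples

total_tokens = total_octuples * TOKENS_PER_OCTUPLE
print(total_octuples, total_tokens)   # 1002 octuples -> 8016 tokens
```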

Is there any way to change encoding_to_str to get the whole song instead (up to 256 bars only, I mean, since the model vocabulary is also restricted to 256 bars)? I am not familiar enough with miditoolkit or mido to understand the code properly as of now, otherwise I would have tried to fix this myself.
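For concreteness, here is a rough sketch of the kind of workaround I was imagining. It is purely illustrative and I have not verified it against preprocess.py; I am assuming MIDI_to_encoding takes the miditoolkit object and encoding_to_str accepts an arbitrary-length slice of the octuple list, and split_size/chunks are just names I made up:

```python
import miditoolkit
from preprocess import MIDI_to_encoding, encoding_to_str

# Load the MIDI file and get the full octuple encoding (106 bars in my example).
midi_obj = miditoolkit.midi.parser.MidiFile('sample_song.mid')
encoding = MIDI_to_encoding(midi_obj)

# Hypothetical workaround: split the octuple list into chunks of at most
# 1000 note octuples and encode each chunk, instead of dropping everything
# after the first 1000 octuples.
split_size = 1000
chunks = [encoding[i:i + split_size] for i in range(0, len(encoding), split_size)]
encoded_strs = [encoding_to_str(chunk) for chunk in chunks]
```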

Thanks in advance!

Edit: I am aware that the musicbert_base model can support up to 8192 octuples (i.e., the final input to the MusicBERT encoder), but that does not seem to be the issue here, I think.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6

Top GitHub Comments

1 reaction
mlzeng commented, Jul 5, 2022

Hi @tripathiarpan20

Your logic is mostly correct, but the MusicBERT models are trained with TOKENS_PER_SAMPLE=8192 (as seen in train_mask.sh), which means the length of input sequences will not exceed 8192 tokens (= 1024 octuples), and the attention layers in the encoder will only receive tensors with length no more than 1024.

Processing 8192 octuple tokens with MusicBERT is theoretically possible, but it would require 64 times more GPU memory (memory usage grows quadratically with sequence length), which is impractical at the moment. (The original RoBERTa models are trained with sequence length = 512.)
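As a back-of-the-envelope illustration of that quadratic scaling (the numbers here simply restate the 1024 vs. 8192 octuple lengths above):

```python
# Self-attention materializes roughly an (L x L) interaction per head,
# so activation memory grows with the square of the sequence length L.
current_len = 1024    # 8192 tokens / 8 tokens per octuple
desired_len = 8192    # hypothetical 8192-octuple input

relative_memory = (desired_len / current_len) ** 2
print(relative_memory)   # 64.0 -> ~64x more attention memory than today
```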

0 reactions
tripathiarpan20 commented, Jul 6, 2022

Oh I see, hoping it is resolved eventually.

Will the fine-tuned models for genre prediction and accompaniment suggestion be released too? I’m thinking of implementing a genre-based task, and a fine-tuned model would be a great starting point.

Read more comments on GitHub >

Top Results From Across the Web

Symbolic Music Understanding with Large-Scale Pre-Training
Experiments demonstrate the advantages of MusicBERT on four music understanding tasks, including melody completion, accompaniment suggestion, ...
