[MusicBERT] Restriction to 1002 octuples when using `preprocess.encoding_to_str`
Hi once again!
While preprocessing a MIDI file, I noticed that the `MIDI_to_encoding` method performs as intended and converts the sample song to 106 bars, as seen in the snip below of the resultant octuples (please correct me if I'm wrong).
However, the `encoding_to_str` method restricts the result to just 18 bars (as can be concluded from the highlighted <0-18> near the end of the encoded string in the snip below):

More generally, what I have noticed across multiple MIDI files is that only up to the first 1000 note octuples (i.e., start token octuple + 1000 note octuples + end token octuple = 1002 * 8 = 8016 tokens) are considered:
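The arithmetic above can be checked directly (a sketch of the token accounting, assuming the octuple encoding where each octuple expands to 8 tokens):

```python
# Token accounting for the observed truncation: each octuple is 8 tokens,
# and the encoded string includes a start octuple and an end octuple
# around the note octuples.
OCTUPLE_SIZE = 8
note_octuples = 1000

total_octuples = 1 + note_octuples + 1  # start + notes + end
total_tokens = total_octuples * OCTUPLE_SIZE

print(total_octuples, total_tokens)  # 1002 8016
```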

Is there any way to change `encoding_to_str` to get the whole song instead? I mean up to 256 bars only, since the model vocabulary is also restricted to 256 bars.
I am not familiar enough with miditoolkit or mido to understand the code properly as of now, else I would have tried to fix this.
Thanks in advance!
Edit: I am aware that the musicbert_base model can only support up to 8192 octuples (i.e., the final input to the MusicBERT encoder), but that does not seem to be the issue here, I think.
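One possible workaround would be to window the full octuple list before encoding, so no single call exceeds the 1000-note limit. This is only a sketch: `chunk_octuples` is a hypothetical helper, and it assumes `MIDI_to_encoding` returns a flat list of 8-tuples that `encoding_to_str` could consume window by window.

```python
# Hypothetical workaround: split the full octuple list into windows of at
# most 1000 note octuples, then encode each window separately (the start/end
# octuples are assumed to be added by the encoding step itself).
MAX_NOTES = 1000

def chunk_octuples(octuples, max_notes=MAX_NOTES):
    """Yield consecutive windows of at most `max_notes` octuples."""
    for start in range(0, len(octuples), max_notes):
        yield octuples[start:start + max_notes]

# Usage sketch with dummy octuples (real ones come from MIDI_to_encoding):
song = [(0, 0, 0, i % 128, 8, 64, 24, 0) for i in range(2500)]
windows = list(chunk_octuples(song))
print([len(w) for w in windows])  # [1000, 1000, 500]
```

Each window would then be passed to `preprocess.encoding_to_str` in turn, though whether the per-window strings can be meaningfully concatenated depends on how bar numbers are handled, which I have not verified.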
Issue Analytics
- State:
- Created a year ago
- Comments: 6

Hi @tripathiarpan20
Your logic is pretty correct, but MusicBERT models are trained with the setting TOKENS_PER_SAMPLE=8192 (as seen in `train_mask.sh`), which means the length of input sequences will not exceed 8192 tokens (= 1024 octuples), and the attention layers in the encoder will only receive tensors of length no more than 1024. Processing 8192 octuples with MusicBERT is theoretically possible, but that would require 64 times more GPU memory (memory usage is quadratically proportional to sequence length), which is impractical currently. (The original RoBERTa models are trained with sequence length = 512.)
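The 64x figure follows directly from the quadratic cost of self-attention (a quick check of the arithmetic, not a measurement):

```python
# Self-attention memory scales with the square of sequence length,
# so going from 1024 to 8192 octuples costs (8192 / 1024)^2 = 64x
# the attention memory.
current_len = 1024
target_len = 8192

ratio = (target_len / current_len) ** 2
print(ratio)  # 64.0
```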
Oh I see, hoping it is resolved eventually.
Will the fine-tuned models for genre prediction and accompaniment suggestion be released too? I'm thinking of implementing a genre-based task, and the fine-tuned model would be a great starting point.