[Discussion] Spleeter for real-time applications
Hi! This issue is a follow-up to a Gitter discussion. I’m developing a C++ port of spleeter here as a side project. My goal is to give people the opportunity to use the spleeter technology within plug-ins. Lately, I realized that the architecture is built to run in batches of size ‘T’, which is 512 FFT frames in the pre-trained models (~12 seconds). This kind of latency isn’t suitable for real-time processing. To check whether the architecture suits my needs, I need to evaluate how much quality is lost when that value is lowered. My plan is to train models for 2 and 4 stems with multiple values of T and compare their quality.
I created a repository to report on that, as I assume it may be of interest to others. And also to get some help in the process.
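As a rough illustration of how T maps to latency, here is a minimal sketch. It assumes a 44100 Hz sample rate and a hop of 1024 samples between FFT frames; both are assumptions chosen because they reproduce the ~12 seconds quoted above for T = 512. The candidate values of T are hypothetical, not the ones actually trained.

```python
# Assumed values (not stated in this post): 44.1 kHz audio and a hop of
# 1024 samples between consecutive FFT frames.
SAMPLE_RATE = 44100
HOP_LENGTH = 1024


def segment_seconds(n_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Approximate duration of audio covered by n_frames FFT frames."""
    return n_frames * hop_length / sample_rate


# Hypothetical candidate values of T, for illustration only.
for t in (512, 256, 128, 64, 32):
    print(f"T={t:4d} -> {segment_seconds(t):6.2f} s")
```

Under these assumptions, each halving of T halves the minimum buffering delay, so the question becomes how far T can be reduced before separation quality degrades.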
I already have a couple of questions:
First:
I am using MUSDB18. I noticed that the training configuration requires a description of the evaluation set. Since I will be using that set to evaluate the trained models, I want to make sure that training does not use it, as that would mean evaluating on data the network has already seen. Is the evaluation_csv parameter used during training?
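Independently of how the parameter is used, one way to guard against leakage is to check that the track lists are disjoint before training. The sketch below is only an illustration: the column name `mix_path`, the file contents, and the test-set paths are all hypothetical, and real CSV files would be read from disk rather than from strings.

```python
import csv
import io


def track_paths(csv_text, column="mix_path"):
    """Return the set of track paths listed in a dataset CSV (given as text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}


# Hypothetical CSV contents standing in for the training and evaluation lists.
train_csv = "mix_path,duration\ntrain/a/mixture.wav,200.0\ntrain/b/mixture.wav,180.0\n"
evaluation_csv = "mix_path,duration\ntrain/c/mixture.wav,210.0\n"
# Hypothetical list of tracks reserved for the final evaluation.
test_tracks = {"test/x/mixture.wav", "test/y/mixture.wav"}

seen_in_training = track_paths(train_csv) | track_paths(evaluation_csv)
leaked = seen_in_training & test_tracks
print("leaked tracks:", sorted(leaked))
```

If the final evaluation tracks never appear in either CSV, the evaluation stays clean whether or not the second CSV is read during training.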
Second:
I am struggling with computation power. For my first test, it took almost 4 hours to run 100,000 steps on a p2.xlarge AWS instance (Tesla K80 GPU). Is this expected? Given my problem, do you think I could lower train_max_steps?
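To put the figures above in perspective, here is a small back-of-the-envelope sketch that converts the observed run (100,000 steps in about 4 hours) into a throughput and extrapolates to other step counts. The target values are hypothetical, and the estimate assumes throughput stays constant across the run.

```python
def steps_per_second(steps, hours):
    """Average training throughput for a finished run."""
    return steps / (hours * 3600)


def estimated_hours(target_steps, rate):
    """Hours needed to reach target_steps at a constant rate (steps/s)."""
    return target_steps / rate / 3600


# Observed run from the post: 100,000 steps in roughly 4 hours.
rate = steps_per_second(100_000, 4.0)
print(f"throughput: {rate:.2f} steps/s")

# Hypothetical values of train_max_steps, for illustration only.
for target in (100_000, 250_000, 500_000, 1_000_000):
    print(f"{target:>9,d} steps -> {estimated_hours(target, rate):5.1f} h")
```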
Third:
This one is more of a note than a discussion, but I ran the evaluation using python -m spleeter evaluate -p spleeter:2stems --mus_dir [exported db dir] -o [local path] and got NaNs. Is the evaluation system broken at the moment? I changed it in a fork here to fix it for my case. Should I open a pull request for that?
Thank you for your help and for publishing your work !
Issue Analytics
- Created 4 years ago
- Comments:43 (1 by maintainers)
Hi there, just wanted to follow up on that matter. I followed @romi1502's suggestion and successfully implemented a very simple volume-control VST3 plugin running in real time with spleeter! You can find the plug-in code right here. I also provide a pre-built binary for OSX (tested on 10.14 and 10.15) here.
As expected, latency is the main drawback. Spleeter needs 64 frames, plus a few extra frames to leave enough time for the processing to run properly (set to 10 in the pre-built binary, if I’m not mistaken). That leads to a latency close to 2 seconds… But still, playing with those sliders is so much fun! 😃
As a side note, I haven’t yet released a new spleeterpp version that includes the online processing (the code is available in the develop branch, though). I still need to update the documentation with details about the algorithm; there are quite a few parameters, after all. I also need to verify that its output matches the classic process.
Anyway, thank you once again for releasing your work. If you ever have further suggestions to improve this integration, I’d be very happy to read them !
S1 should be OK. According to https://github.com/deezer/spleeter/blob/master/configs/5stems/base_config.json#L8 , the FFT size is 4096, so the latency should be 4096 samples, or 4096/44100 ≈ 92.9 ms. Can you explain how you got it to 8 ms?
I’m not experienced with PDC (plugin delay compensation) in JUCE/C++, but in JSFX, PDC is very manual. You set the PDC and the DAW gives you x samples in advance; you then delay your output and emit 1 sample at a time.
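The plug-in side of that scheme can be sketched as a fixed delay line. This is a language-agnostic illustration written in Python, not JSFX or JUCE code; the class name and the 4-sample latency are hypothetical. The plug-in would report the same latency to the host, which then feeds it audio that much earlier.

```python
from collections import deque


class DelayLine:
    """Fixed delay of `latency` samples: one sample in, one sample out."""

    def __init__(self, latency):
        self.latency = latency
        # Pre-fill with silence so the first `latency` outputs are zeros.
        self.buffer = deque([0.0] * latency)

    def process(self, sample):
        # Push the newest sample, pop the oldest one.
        self.buffer.append(sample)
        return self.buffer.popleft()


# Hypothetical latency of 4 samples, kept tiny so the output is readable.
delay = DelayLine(latency=4)
out = [delay.process(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(out)  # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
```

With a 4096-sample FFT window, the reported latency would be at least 4096 samples rather than 4.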
I don’t think it’s a revolution, more of a workflow enhancement. BTW, iZotope had an AAX “RX 7 Music Rebalance” in 2018 and a VST “Ozone 9 Master Rebalance” in 2019.