[Discussion] Custom Model does not separate at all.
Hi all,
I’ve posted here a few times; this Spleeter thing isn’t a walk in the park! But I have almost got there in the end. I just need help training a model, as none of my models seem to work.
A little backstory about my use case: I’m trying to adapt Spleeter to separate dialogue from films. In the Fan Edit community, this is basically the hardest part of editing release material. Some films and shows have a ‘clean’ centre channel with only dialogue, which makes life super easy, but most do not. If I can figure this out, it will basically revolutionise fan edits and make possible edits that were previously impossible due to music bleed.
I have compiled a collection of the shows and movies I could get my hands on that have 100% clean centre channels for dialogue, with no music bleed. This was extremely time-consuming! The idea was to train a model that outputs “vocals” and “other”.
I ended up with 160-odd episodes and 3 films, roughly 90 hours of raw data.
Steps so far:
- Using ffmpeg, I took the 5.1 audio from the films/episodes. This gave me 6 channels:
[FL][FR][FC][LFE][BL][BR]
where the centre channel was entirely dialogue with no music. I verified this for every data point.
- Using ffmpeg, I created a mono track that downmixed the entire 5.1 audio to a single mono channel, giving me dialogue, music and effects.
- I organised the config files as instructed: a .csv file with “mix_path” containing the full mono downmix of the film audio, “vocal_path” containing the centre channel audio and “other_path” containing a channel with the music mix. I also edited config.json with the new paths and my data directories.
- I ran
spleeter train -p ~/Configs/filmModel.json -d ~/filmDataSet
This gave me:
INFO:spleeter:Start model training
INFO:spleeter:Model training done
This wrote the model to the directory “filmModel” (as specified in the config file). It took a suspiciously short time for 90 hours of audio…
- I then used
spleeter separate -i "~\Testfilm.mp4" -o Test -p "~\Configs\filmModel.json"
- It did its thing and output:
INFO:spleeter:File ~/vocals.wav written succesfully
INFO:spleeter:File ~/other.wav written succesfully
This also took very little time, but that was less suspicious, as the test file I was using is only 1m50s long.
- I loaded both files into Audacity, and they are practically identical.
- I tested the pretrained spleeter:2stems model on the same ~\Testfilm.mp4, and it does output music and vocals (though not very well).
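For anyone wanting to reproduce the data preparation, here is a minimal Python sketch that builds the ffmpeg commands for the three stems and writes the training CSV. It assumes the channel order listed above (FL, FR, FC, LFE, BL, BR, i.e. ffmpeg input channels c0 to c5), a 44.1 kHz sample rate, and an instrument_list of ["vocals", "other"] in the config. The function names, the 0.2 gain and the “duration” column are my assumptions, not something taken from the post. As far as I can tell, Spleeter looks for one “<instrument>_path” column per entry in instrument_list, so the column names have to match the instrument names in the config.

```python
import csv
from pathlib import Path

# Assumed channel order from the post: [FL][FR][FC][LFE][BL][BR] -> c0..c5.
# Numbered input channels work whether ffmpeg labels the rear pair BL/BR or SL/SR.
GAIN = 0.2  # illustrative: 5 channels * 0.2 keeps the summed mix within range

# Hypothetical pan expressions: mix = vocals + other exactly; LFE (c3) is left out.
PAN = {
    "mix": f"pan=mono|c0={GAIN}*c0+{GAIN}*c1+{GAIN}*c2+{GAIN}*c4+{GAIN}*c5",
    "vocals": f"pan=mono|c0={GAIN}*c2",
    "other": f"pan=mono|c0={GAIN}*c0+{GAIN}*c1+{GAIN}*c4+{GAIN}*c5",
}


def ffmpeg_commands(source: str, out_dir: str, sample_rate: int = 44100) -> dict:
    """Build (but do not run) one ffmpeg command per stem for a 5.1 source."""
    stem = Path(source).stem
    commands = {}
    for name, pan in PAN.items():
        target = str(Path(out_dir) / f"{stem}_{name}.wav")
        commands[name] = [
            "ffmpeg", "-y", "-i", source,
            "-vn",                    # drop the video stream
            "-af", pan,               # select / sum channels into one mono track
            "-ar", str(sample_rate),  # resample to the rate the config expects
            target,
        ]
    return commands


def write_training_csv(csv_path: str, rows: list) -> None:
    """Write one CSV row per episode/film (column names are assumptions, see above)."""
    fields = ["mix_path", "vocals_path", "other_path", "duration"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Usage, assuming ffmpeg is on the PATH: run each command with subprocess.run(cmd, check=True), then pass one dict per episode (paths plus duration in seconds) to write_training_csv. Scaling all three stems by the same gain is a design choice so the mix is exactly the sum of the two targets.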
Am I training correctly? I can’t figure out why my model doesn’t work! Does anyone have any advice, tips or things to try?
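Before digging into training, it may help to put a number on “practically identical”. The sketch below, using only the Python standard library, compares two output stems sample by sample. It assumes 16-bit PCM WAV files (ffmpeg's usual default for .wav; I have not confirmed Spleeter's exact output format), and read_pcm16 and relative_difference are hypothetical helper names.

```python
import array
import math
import sys
import wave


def read_pcm16(path: str) -> array.array:
    """Read a 16-bit PCM WAV file into a flat array of (interleaved) samples."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * w.getsampwidth()}-bit")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def relative_difference(path_a: str, path_b: str) -> float:
    """RMS of (a - b) divided by RMS of a, over the overlapping length.

    Close to 0.0 means the two stems are essentially the same signal.
    """
    a, b = read_pcm16(path_a), read_pcm16(path_b)
    n = min(len(a), len(b))
    diff = math.fsum((a[i] - b[i]) ** 2 for i in range(n))
    ref = math.fsum(a[i] ** 2 for i in range(n))
    if ref == 0:
        return 0.0 if diff == 0 else float("inf")
    return math.sqrt(diff / ref)
```

For example, relative_difference("Test/Testfilm/vocals.wav", "Test/Testfilm/other.wav") with placeholder paths: a value near 0.0 confirms the model is passing the same signal to both stems, while genuinely separated dialogue and music should land well away from 0.0 (around 1.0 or more for uncorrelated signals).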
One thing that came to mind is that these files might be too long for Spleeter. I ran into issues trying to use spleeter separate on lengthy files: Spleeter threw tons of errors. I currently use a 1m50s sample of a film that is not in my training data; it throws no errors and works fine with spleeter:2stems. No errors are thrown when training my custom model, but could this be a possible issue?
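If file length does turn out to be the problem, one workaround is to cut long inputs into fixed-length chunks with ffmpeg's segment muxer and separate each chunk. This is a sketch, not a tested pipeline: the 600-second chunk length and the helper names are arbitrary, and separate_command mirrors the -i/-o/-p form used in the post, which is the older Spleeter CLI (newer releases take the input file as a positional argument instead).

```python
import subprocess
from pathlib import Path


def segment_command(source: str, chunk_dir: str, segment_seconds: int = 600) -> list:
    """ffmpeg command that cuts the audio of `source` into numbered WAV chunks."""
    pattern = str(Path(chunk_dir) / f"{Path(source).stem}_%03d.wav")
    return [
        "ffmpeg", "-y", "-i", source,
        "-vn",                                  # audio only
        "-f", "segment",                        # ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),  # target chunk length in seconds
        pattern,
    ]


def separate_command(chunk: str, out_dir: str, config: str) -> list:
    """Same shape as the spleeter call in the post (older -i style CLI)."""
    return ["spleeter", "separate", "-i", chunk, "-o", out_dir, "-p", config]


def separate_long_file(source: str, out_dir: str, config: str, segment_seconds: int = 600) -> list:
    """Split `source`, then run spleeter on each chunk; returns the chunk paths."""
    chunk_dir = Path(out_dir) / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(segment_command(source, str(chunk_dir), segment_seconds), check=True)
    chunks = sorted(chunk_dir.glob(f"{Path(source).stem}_*.wav"))
    for chunk in chunks:
        subprocess.run(separate_command(str(chunk), str(Path(out_dir) / "separated"), config), check=True)
    return chunks
```

Since subprocess does not go through a shell, pass full paths rather than ones starting with ~.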
Issue analytics: created 3 years ago, 14 comments.
This feature already works without any special tuning for specific scenarios. Please try the default models. See https://github.com/deezer/spleeter/issues/411#issuecomment-638096967
If your training seems to be skipping, that is by design, because it uses the cache:
INFO:spleeter:Start model training
INFO:spleeter:End model training
As a sanity check, to redo training, do some clean-up, specifically removing this cache folder!
clean.sh // chmod +x
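The clean.sh script itself isn't included here. As a rough Python equivalent, the hypothetical helper below reads the training config and deletes the caches named by its “training_cache” and “validation_cache” keys (the key names used in Spleeter's example configs). Those values are passed to tf.data's cache, which, as I understand it, writes files such as training.index and training.data-00000-of-00001 next to the given path rather than a single folder, so the sketch handles both cases. Relative paths resolve from the current working directory, so run it from wherever spleeter train was run.

```python
import json
import shutil
from pathlib import Path


def _remove_cache(prefix: Path) -> list:
    """Remove a tf.data cache given its path prefix; returns what was deleted."""
    if prefix.is_dir():  # the cache path may simply be a folder
        shutil.rmtree(prefix)
        return [str(prefix)]
    removed = []
    if prefix.parent.is_dir():
        # Files assumed to be written by tf.data for cache("<prefix>")
        for pattern in (prefix.name + ".index", prefix.name + ".data-*"):
            for path in sorted(prefix.parent.glob(pattern)):
                path.unlink()
                removed.append(str(path))
    return removed


def clean_training_state(config_path: str, also_model_dir: bool = False) -> list:
    """Delete the training/validation caches named in a Spleeter config (hypothetical helper).

    With also_model_dir=True, model_dir is deleted too. TensorFlow estimators resume
    from the latest checkpoint in model_dir, so a checkpoint that has already reached
    the step limit can also make a new run end almost immediately.
    """
    config = json.loads(Path(config_path).expanduser().read_text())
    removed = []
    for key in ("training_cache", "validation_cache"):
        if config.get(key):
            removed += _remove_cache(Path(config[key]).expanduser())
    if also_model_dir and config.get("model_dir"):
        model_dir = Path(config["model_dir"]).expanduser()
        if model_dir.is_dir():
            shutil.rmtree(model_dir)  # throws away all checkpoints
            removed.append(str(model_dir))
    return removed
```

For example, clean_training_state("/home/me/Configs/filmModel.json") before re-running spleeter train; the path is a placeholder.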
For some unknown reason, I have to move the trained model jp_model from the root folder into pretrained_models; otherwise, it attempts to download the model from the internet.
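A small sketch of that move, for anyone hitting the same download attempt. It rests on my assumption, based on the comment above, that Spleeter resolves a relative model directory against pretrained_models (or against the MODEL_PATH environment variable, if set). install_model is a hypothetical helper, and after the move the config's model_dir would presumably be just the folder name, e.g. jp_model.

```python
import os
import shutil
from pathlib import Path


def install_model(trained_dir: str, models_root=None) -> Path:
    """Move a trained model folder under the directory Spleeter is assumed to search."""
    # Assumption: MODEL_PATH overrides the default "pretrained_models" folder.
    root = Path(models_root or os.environ.get("MODEL_PATH", "pretrained_models"))
    source = Path(trained_dir)
    target = root / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")
    root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

For example, install_model("jp_model") run from the root folder moves it to pretrained_models/jp_model.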