DB migration on BIG databases will take several days...
I understand that having 35k entries is going to take a while, but is it really efficient to completely rewrite the file to disk in 4 KB steps?
I take it every single item gets individually saved by writing the entire new db file anew? (idk, that's how it looks if you inspect the file in Windows Explorer: it goes from the old size to 0 KB and then back up to the new size in 4 KB increments.)
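Just to illustrate what I mean, here's a rough sketch (not the actual project code, just the pattern I think I'm seeing) of why per-item full-file rewrites blow up compared to one batched write:

```typescript
// Rough sketch (not the actual project code): why saving each item by
// rewriting the whole JSON file gets slow as the library grows.
import { writeFileSync } from "fs";

interface Item { id: string; title: string; }

// Naive migration: push one item, then serialize and rewrite the ENTIRE
// db file. For n items the total bytes written grow roughly with n^2.
function migrateOneByOne(items: Item[], dbPath: string): void {
  const db: { files: Item[] } = { files: [] };
  for (const item of items) {
    db.files.push(item);
    writeFileSync(dbPath, JSON.stringify(db)); // full rewrite per item
  }
}

// Batched migration: build the whole structure in memory, write once.
function migrateBatched(items: Item[], dbPath: string): void {
  writeFileSync(dbPath, JSON.stringify({ files: items })); // single write
}
```

With 35k items the difference between the two approaches is hours vs. seconds, which matches what I'm seeing on disk.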
Also, I noticed that the file doesn't get any new writes after I accidentally updated the Docker container before the migration was done. I can tell because only 30k of the expected 35k items are in the file.
Another thing I wonder: since we have to expect instances to grow in size quite quickly, wouldn't the flexibility of a relational database (relations could become complex, like "is a second-part video to video ID xyz" and so on), together with its better scalability, make a lot of sense for this application at this point? Also, mass-adding fields would not require rewriting x chunks of text to one file, where x = the number of items. This is, as I'm currently witnessing, a scalability nightmare.
For debugging info: my db.json ends with this info:
"files_to_db_migration_complete": false,
"categories": [],
"simplified_db_migration_complete": true
So it knows the migration is not done yet, but the simplified migration is marked complete? (What's the difference between the two, by the way?)
Thanks for the kind words 😃
The simple answer as to why I chose MongoDB is that I don’t have to do any work to define the fields within a table. I can’t really speak for MariaDB but I’ve worked with Postgres and SQL Server. Half of this is pure laziness and the other half is that there’s still a table with nested properties, so it helps maintain some level of portability with the existing code. That, and I’ve already messed around with Firebase and MongoDB seemed really similar.
This affects performance a bit, but with indexing that can apparently be mitigated (just found this out!). As you can probably tell, I'm not super experienced with MongoDB either; I just know that it plays really well with JSON-type data structures.
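For what it's worth, the indexing would look roughly like this with the Node.js MongoDB driver (sketch only; the database, collection, and field names are placeholders, not necessarily what the app ends up using):

```typescript
import { MongoClient } from "mongodb";

// Indexing sketch with the Node.js MongoDB driver. The database, collection,
// and field names ("files", "uid", "sub_id") are examples only.
async function ensureIndexes(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db("youtubedl_material"); // example db name

  // Index the fields used for lookups so queries don't scan every document.
  await db.collection("files").createIndex({ uid: 1 }, { unique: true });
  await db.collection("files").createIndex({ sub_id: 1 });

  await client.close();
}
```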
The way I set up the DB, it's still relational in an abstract sense: subscription videos aren't stored in the subscription anymore, they just have a "foreign key" called sub_id. Playlist videos are still stored as an array in a playlist object called uids, so this structure isn't universal, mostly due to the fact that playlist videos are ordered and it's a many-to-many relationship.

So to avoid any problems with a true relational DB, I just went with Mongo. I already noticed a speedup vs. using a local DB, but I'll do some actual testing and get some numbers to prove it. I'll work on the indexing stuff to make it even more performant; hopefully I can merge this in by Wednesday or so and you can let me know how it works for you. Let me know if you have any other questions! Having a second set of eyes on the backend stuff is always helpful.
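To make the structure I described above a bit more concrete, here's a rough sketch of the document shapes (only sub_id and uids are real names from this thread; every other field is illustrative):

```typescript
// Sketch of the document shapes described above. Only sub_id and uids come
// from the comment; every other field name is just illustrative.
interface VideoDoc {
  uid: string;      // unique video id
  title: string;
  sub_id?: string;  // "foreign key" to the subscription that owns the video
}

interface SubscriptionDoc {
  id: string;
  name: string;
  // videos are no longer embedded here; they are looked up by sub_id
}

interface PlaylistDoc {
  id: string;
  name: string;
  uids: string[];   // ordered video uids (many-to-many, order matters)
}
```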
Good news @GlassedSilver, I’ve spent the last several weeks porting over all the code to support MongoDB, as well as a local DB solution which will be the default. I’ll let you know when this PR is merged but it should fix all your issues and make your instance wayyy quicker. Took a while and involved a lot of cleanup but it’s worth it!
I’m going to include it in my concurrent streams PR (#378). I’ll comment here when that’s merged. Want to do some final testing and update the PR (since it’s way bigger now), and it does involve another DB migration but it should be the last for a while.