DB migration on BIG databases will take several days...

I understand that having 35k entries is gonna take a while, but is it really efficient to completely rewrite the file to disk in 4 KB steps?

I take it every single item gets individually saved by writing the entire new db file anew? (idk, that’s how it looks if you inspect the file in Windows Explorer: it goes from the old size to 0 KB, then back up to the new size in 4 KB increments.)
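
To illustrate what I suspect is happening (a hypothetical sketch, not the app’s actual code; saveItemNaive, migrateBatched, and the db.files shape are made up for illustration):

    // Per-item saves that rewrite the whole JSON file: with n items,
    // a migration performs n full-file writes, i.e. O(n^2) total bytes.
    const fs = require('fs');

    function saveItemNaive(dbPath, item) {
      const db = JSON.parse(fs.readFileSync(dbPath, 'utf8'));
      db.files.push(item);
      // Serializes and rewrites the entire file for a single item:
      fs.writeFileSync(dbPath, JSON.stringify(db, null, 2));
    }

    // Batched alternative: apply all updates in memory, write once.
    function migrateBatched(dbPath, items) {
      const db = JSON.parse(fs.readFileSync(dbPath, 'utf8'));
      db.files.push(...items);
      fs.writeFileSync(dbPath, JSON.stringify(db, null, 2)); // one write total
    }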

Also, I noticed that the file doesn’t get any new writes after I accidentally updated the Docker container before the migration was done. I can tell because the file only has 30k out of an expected 35k items.

Another thing I wonder about: since we have to expect instances to grow in size quite quickly, wouldn’t the flexibility and better scalability of a relational database make a lot of sense for this application at this point? Relations could become big, like “is a second-part video to video ID xyz” and stuff like that. Also, mass-adding fields would not require rewriting x chunks of text to one file, where x = the number of items. This is, as I’m currently witnessing, a scalability nightmare.
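
To make the “mass-adding fields” point concrete (again a hypothetical sketch; addFieldToAll and the db.files shape are made up):

    const fs = require('fs');

    // JSON-file approach: adding one field touches every record and
    // forces a full rewrite of the file.
    function addFieldToAll(dbPath, field, defaultValue) {
      const db = JSON.parse(fs.readFileSync(dbPath, 'utf8'));
      for (const item of db.files) item[field] = defaultValue; // n records touched
      fs.writeFileSync(dbPath, JSON.stringify(db, null, 2));   // full rewrite
    }

    // Relational approach: a single schema change, no per-row rewrite
    // needed for a nullable column, e.g.:
    //   ALTER TABLE videos ADD COLUMN part_of_video_id TEXT;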

For debugging purposes, my db.json ends with this:

  "files_to_db_migration_complete": false,
  "categories": [],
  "simplified_db_migration_complete": true

So it knows the migration is not done yet, but it has completed a simplified migration? (What’s the difference between the two, btw?)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 20 (6 by maintainers)

Top GitHub Comments

1 reaction
Tzahi12345 commented, Jul 19, 2021

Thanks for the kind words 😃

The simple answer as to why I chose MongoDB is that I don’t have to do any work to define the fields within a table. I can’t really speak for MariaDB, but I’ve worked with Postgres and SQL Server. Half of this is pure laziness, and the other half is that there’s still a table with nested properties, so it helps maintain some level of portability with the existing code. That, and I’ve already messed around with Firebase, and MongoDB seemed really similar.

This affects performance a bit, but with indexing this can apparently be mitigated (just found this out!). As you can probably tell, I’m not super experienced with MongoDB either; I just know that it plays really well with JSON-type data structures.
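
For reference, a minimal sketch of what that indexing could look like with the official MongoDB Node.js driver (the URI, database, and collection names are placeholders; sub_id is the field mentioned below):

    const { MongoClient } = require('mongodb');

    async function ensureIndexes(uri) {
      const client = new MongoClient(uri);
      await client.connect();
      const videos = client.db('ytdl').collection('videos');
      // Index the lookup field so "videos by subscription" queries
      // don't scan the whole collection:
      await videos.createIndex({ sub_id: 1 });
      await client.close();
    }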

The way I set up the DB, it’s still relational in an abstract sense: subscription videos aren’t stored in the subscription anymore; now they just have a “foreign key” called sub_id. Playlist videos are still stored as an array called uids in the playlist object, so this structure isn’t universal, mostly because playlist videos are ordered and it’s a many-to-many relationship.
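
Concretely, query sketches for the two structures just described (the collection and field names other than sub_id and uids are hypothetical):

    // Subscription videos carry a sub_id "foreign key", so fetching a
    // subscription's videos is a plain query:
    async function getSubscriptionVideos(videos, subId) {
      return videos.find({ sub_id: subId }).toArray();
    }

    // Playlists embed an ordered uids array (many-to-many, order matters).
    // $in does not preserve order, so reorder to match the playlist:
    async function getPlaylistVideos(playlists, videos, playlistId) {
      const playlist = await playlists.findOne({ id: playlistId });
      const docs = await videos.find({ uid: { $in: playlist.uids } }).toArray();
      const byUid = new Map(docs.map(v => [v.uid, v]));
      return playlist.uids.map(uid => byUid.get(uid));
    }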

So to avoid any problems with a true relational DB, I just went with Mongo. I already noticed a speedup vs. using a local DB, but I’ll do some actual testing and get some numbers to prove it. I’ll work on the indexing stuff to make it even more performant, hopefully I can merge this in by Wednesday or so and you can let me know how it works for you. Let me know if you have any other questions! Having a second set of eyes on the backend stuff is always helpful.

1 reaction
Tzahi12345 commented, Jul 19, 2021

Good news @GlassedSilver, I’ve spent the last several weeks porting over all the code to support MongoDB, as well as a local DB solution, which will be the default. I’ll let you know when this PR is merged, but it should fix all your issues and make your instance wayyy quicker. Took a while and involved a lot of cleanup, but it’s worth it!

I’m going to include it in my concurrent streams PR (#378). I’ll comment here when that’s merged. I want to do some final testing and update the PR (since it’s way bigger now), and it does involve another DB migration, but it should be the last for a while.
