Reliable backups for high-activity pad

Hi,

We run a production instance of etherpad-lite 1.6.1 for a nonprofit. Besides many pads that are used normally, it hosts one pad with a lot of activity and history: it serves as a kind of board of things to do / in progress / recently done, is updated many times every day by several members of the org, and is hardly ever deleted and recreated.

As a result, this pad accumulates tens of thousands of revisions within a few months. I’m quite sure this is not really what etherpad-lite was designed for, but “unfortunately” the org members like this way of working very much, are very used to it, and we have not yet found a better tool.

We’ve already had several catastrophes with this specific pad. Kernel panics, unclean shutdowns, and mysql restarts (mostly for security upgrades) without stopping etherpad first have left corrupt changesets that make the high-activity pad unreadable (it fails client-side with Failed assertion: Invalid changeset (checkRep failed)). Other, lower-activity pads on the instance seem to cope with those events quite nicely, though. Additionally, some members sometimes have a faulty connection that causes their browser to reconnect very often, and I wonder whether each reconnect generates even more revisions to fade their author color. That’s a secondary problem, but if so, it makes the pad history grow even faster and, as I see it, increases the chances of failure.

Restoration attempts for this pad usually include:

  • asking a member who still has the pad open not to close the browser tab, and to copy-paste the HTML somewhere
  • running checkPad.js / repairPad.js, which takes ages due to the huge history and does not change the situation
  • trying to get a proper backup using the getText/getHTML API methods (the calls always fail)
  • trying to restore a working version using the restoreRevision/copyPad API methods (both calls succeed, but copyPad takes ages and just creates a second nonworking pad)
  • trying to call deletePad via the API (takes ages and fails)
  • removing the pad rows from the database manually and recreating the pad from a copy-paste of the HTML, losing both history and authorship
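
For readers unfamiliar with the API methods named in the list above, here is a minimal sketch of how such calls are shaped over Etherpad's HTTP API. The base URL, the API key value, and the API version string are assumptions for illustration (querying `/api` on your own instance should report the version it supports); the helper names `build_api_url` and `parse_api_response` are hypothetical and not part of Etherpad.

```python
import json
from urllib.parse import urlencode

# Assumption: adjust to the API version your instance reports at /api.
API_VERSION = "1.2.13"


class EtherpadApiError(Exception):
    """Raised when the API reply carries a non-zero result code."""


def build_api_url(base_url, method, apikey, **params):
    """Build the URL for one HTTP API call, e.g. getHTML or copyPad."""
    query = urlencode({"apikey": apikey, **params})
    return f"{base_url.rstrip('/')}/api/{API_VERSION}/{method}?{query}"


def parse_api_response(body):
    """Return the 'data' field of a reply, or raise if the call failed.

    Replies are JSON objects with 'code', 'message' and 'data';
    code 0 means success, any other code is treated as a failure here.
    """
    reply = json.loads(body)
    if reply.get("code") != 0:
        raise EtherpadApiError(f"code {reply.get('code')}: {reply.get('message')}")
    return reply.get("data")


if __name__ == "__main__":
    # Hypothetical host and key; the pad ID is the one from this report.
    print(build_api_url("http://localhost:9001", "getHTML", "secret",
                        padID="affaires-courantes"))
```

Fetching the URL with any HTTP client that supports a timeout and passing the body to `parse_api_response` makes a failed or hung call visible instead of silently producing an empty backup.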

That situation led me to try to set up frequent backups of the whole instance. I’m currently doing hourly mysql dumps (at least for the last 24h). Unfortunately, I discovered that restoring those backups also leads to a nonworking checkRep failed pad, which led me to believe that running mysqldump produces a faulty database image unless etherpad is stopped.
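
As an illustration of the hourly dump setup described above, here is a minimal sketch that asks mysqldump for a consistent snapshot and keeps one file per hour of the day, so that 24 files rotate. It assumes the tables use InnoDB (the `--single-transaction` option only gives a consistent snapshot for transactional engines), that credentials come from a client option file such as `~/.my.cnf`, and that the database is named `etherpad`; the function names and the backup directory are hypothetical. Whether this avoids the checkRep failure is not established in this thread: a dump can only contain what etherpad has already written to the database.

```python
import subprocess
from datetime import datetime


def build_dump_command(database, out_path):
    """Return the mysqldump argv for a single consistent snapshot."""
    return [
        "mysqldump",
        "--single-transaction",  # dump inside one transaction (InnoDB only)
        "--quick",               # stream rows instead of buffering whole tables
        f"--result-file={out_path}",
        database,
    ]


def hourly_dump_path(directory, database, now):
    """Name the dump after the hour of day, so 24 files rotate."""
    return f"{directory}/{database}-{now:%H}.sql"


def run_dump(database, directory):
    """Run one dump, overwriting the file written at this hour yesterday."""
    path = hourly_dump_path(directory, database, datetime.now())
    subprocess.run(build_dump_command(database, path), check=True)
    return path


if __name__ == "__main__":
    # Hypothetical database name and directory.
    print(" ".join(build_dump_command("etherpad", "/var/backups/etherpad-00.sql")))
```

Calling `run_dump("etherpad", "/var/backups")` from an hourly cron job would give the 24-hour window mentioned above without any separate cleanup step.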

I would have used the API to make backups, but after a few weeks or months of activity the API calls simply take longer than the backup interval. And stopping the instance every hour to run mysqldump would be quite disruptive.

So here are my questions:

  • Is there a way to tell a running etherpad to flush everything to the database so that a mysqldump has a higher chance of being usable?
  • Is there a better way of backing up pads that is automatable and preserves authorship upon restore (history would be nice but is not mandatory here)?
  • Is there something I can do to “fix” the faulty mysql dumps?

Here is a faulty mysqldump for reference. The high-activity pad ID is “affaires-courantes”. All our activity and pads are public so there’s no risk of disclosing personal/secret information here.

Thanks 😃

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 2
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

JohnMcLear commented, Jun 9, 2020 (1 reaction)

As #3991 is merged, this gives us an awesome tool for recoveries, so I can go ahead and close this. If we get another report, we should be able to recover upon request; the tools are now available to debug, diagnose, and recover.

JohnMcLear commented, May 11, 2020 (0 reactions)

As per #3991, I think that to do a restoration/rebuild you need these values, else changeset ops won't work. Once the merge is complete I can continue work on my script/branch, but it looks like any pads with revs(@100) edited before the merge is complete and in place won't be able to actually be rebuilt, rendering both existing methods pointless.
