Improve LockService functionality and flexibility

Description

Because concurrent update/rollback/etc. operations against a database interfere with each other, Liquibase has a LockService that coordinates access so only one instance can run against a database at a time.

Examples of when you can run into concurrency trouble:

  • Liquibase is embedded on startup of an application server, and you have a cluster of machines that may be starting at once
  • Liquibase runs as part of a build job, and concurrent builds can happen at once

The StandardLockService is the normal implementation that uses a databasechangeloglock table. Liquibase inserts a record and commits it at startup time, and if other instances see that record they wait until it is cleared.

The big problem with this standard implementation is that if the instance doing the update is killed before it commits the clearing of the record, the lock stays stuck until liquibase clearLocks is run.

This problem has been around forever in Liquibase, but has been exacerbated in recent years by systems like AWS that auto-kill processes that seem slow to start up. We wrap everything in try/catch blocks, but process killing happens too far out-of-band for us to deal with.

Related issues/PRs

Descriptions and proposed fixes:

Moving the overall discussion and design to this ticket.

Requirements (New Configuration Options)

  1. Add a liquibase.lockservice.enabled global configuration value that can be used to disable lock service completely. This will allow easier “BYO Locking” (see options section below)
  2. Add a liquibase.lockservice.heartbeat.enabled global configuration value that can be used to enable the heartbeat functionality. For the first release, the default is “false”. A future release will change the default to “true”.
  3. Add a liquibase.lockservice.heartbeat.rate global configuration value that can be used to control how often to update the heartbeat column. Value is in seconds, with a default of “10”.
  4. Add a liquibase.lockservice.heartbeat.timeout global configuration value that can be used to control how long to wait until a separate process can take the lock. Value is in seconds, with a default of “30”.
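
As a rough illustration of how the four proposed settings and their defaults fit together, here is a minimal Java sketch that reads them from a java.util.Properties object. The LockServiceSettings class is hypothetical and not part of Liquibase. The “true” default for liquibase.lockservice.enabled and the check that the timeout exceeds the rate are assumptions that the requirements above do not state.

```java
import java.util.Properties;

/** Hypothetical holder for the four proposed settings; not an actual Liquibase class. */
final class LockServiceSettings {
    final boolean enabled;
    final boolean heartbeatEnabled;
    final int heartbeatRateSeconds;
    final int heartbeatTimeoutSeconds;

    private LockServiceSettings(boolean enabled, boolean heartbeatEnabled,
                                int heartbeatRateSeconds, int heartbeatTimeoutSeconds) {
        this.enabled = enabled;
        this.heartbeatEnabled = heartbeatEnabled;
        this.heartbeatRateSeconds = heartbeatRateSeconds;
        this.heartbeatTimeoutSeconds = heartbeatTimeoutSeconds;
    }

    /** Reads the proposed keys, falling back to the defaults from the requirements. */
    static LockServiceSettings from(Properties props) {
        // Assumption: the lock service stays enabled unless explicitly turned off.
        boolean enabled = Boolean.parseBoolean(
                props.getProperty("liquibase.lockservice.enabled", "true"));
        boolean heartbeatEnabled = Boolean.parseBoolean(
                props.getProperty("liquibase.lockservice.heartbeat.enabled", "false"));
        int rate = Integer.parseInt(
                props.getProperty("liquibase.lockservice.heartbeat.rate", "10"));
        int timeout = Integer.parseInt(
                props.getProperty("liquibase.lockservice.heartbeat.timeout", "30"));
        // Assumption (not in the requirements): a timeout no longer than the rate
        // would let a waiter take the lock from a healthy holder.
        if (rate <= 0 || timeout <= rate) {
            throw new IllegalArgumentException(
                    "heartbeat.timeout must exceed heartbeat.rate, and both must be positive");
        }
        return new LockServiceSettings(enabled, heartbeatEnabled, rate, timeout);
    }
}
```

With an empty Properties object this yields the defaults listed above: heartbeat disabled, a 10 second rate, and a 30 second timeout.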

Requirements (Add Heartbeat support)

Update StandardLockService to use a heartbeat thread to update the heartbeat column. (See “options” section below)

  1. Existing databasechangeloglock tables should be auto-updated to have an additional, nullable “heartbeat” varchar(255) column.
  2. Heartbeat thread must gracefully handle errors, and if the thread stops unexpectedly it should cause the overall liquibase operation to stop.
  3. The heartbeat will update the column to a random number/string based on the rate. The waiting process will watch for changes to this value rather than comparing dates in the heartbeat column, to avoid date arithmetic and/or system time issues.
  4. The SQL to take the lock should take the last heartbeat value the waiting process knew of into account and check how many rows are updated. This will handle the case where multiple waiting processes try to force an unlock concurrently.
  5. The waiting process cannot take the lock if the heartbeat column is null. This supports the “heartbeat.enabled=false” configuration as well as older versions of Liquibase.
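
The takeover rules in the list above amount to a compare-and-set on the heartbeat value. The following is a minimal in-memory sketch of that idea, not the real implementation: LockRowSketch is a hypothetical stand-in for the single lock row, its method names are made up for illustration, and the SQL in the comment only indicates the shape of the statement.

```java
import java.util.UUID;

/** In-memory stand-in for the single lock row (hypothetical, for illustration only). */
final class LockRowSketch {
    private boolean locked;
    private String lockedBy;
    private String heartbeat; // stays null when the holder does not write heartbeats

    /** Normal acquire: succeeds only when nobody holds the lock. Returns "rows updated". */
    synchronized int acquire(String owner, boolean heartbeatEnabled) {
        if (locked) {
            return 0;
        }
        locked = true;
        lockedBy = owner;
        heartbeat = heartbeatEnabled ? UUID.randomUUID().toString() : null;
        return 1;
    }

    /** The holder replaces the value with a fresh random token on every beat. */
    synchronized void beat(String owner) {
        if (locked && owner.equals(lockedBy) && heartbeat != null) {
            heartbeat = UUID.randomUUID().toString();
        }
    }

    synchronized String readHeartbeat() {
        return heartbeat;
    }

    synchronized String owner() {
        return lockedBy;
    }

    /**
     * Forced takeover, roughly the shape of:
     *   UPDATE lock_table SET lockedby = ?, heartbeat = ?
     *   WHERE locked = true AND heartbeat = ?   -- last value this waiter saw
     * A null heartbeat never matches, so such locks cannot be taken over.
     */
    synchronized int takeOver(String newOwner, String lastSeenHeartbeat) {
        if (!locked || heartbeat == null || !heartbeat.equals(lastSeenHeartbeat)) {
            return 0; // someone else got there first, or the holder is still beating
        }
        lockedBy = newOwner;
        heartbeat = UUID.randomUUID().toString();
        return 1;
    }
}
```

When two waiters both saw the same stale value and both call takeOver, only the first gets 1 back. The second sees 0 rows updated because the winner already replaced the heartbeat value, so it goes back to waiting.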

Requirements (General refactoring)

While we are adding this feature, we will also do our 4.x work of reviewing the LockService interface in general to make it more extensible and manageable. This makes it easier to have a single release note of “We changed the lock service to support heartbeats, and it is also easier to work with”. The changes will be API-breaking for extensions that wrote their own LockService, but we will include documentation on how to update their code.

  1. Refactor LockService API to be in line with new 4.x standards (use Scope, extend PluginService, etc.)
  2. Refactor/centralize the databasechangeloglock table management into the StandardLockService rather than being scattered across special SqlGenerators etc.

Test Criteria

  1. Running liquibase concurrently with a running update/rollback will wait until the other process is done.
  2. If the original liquibase process is killed halfway through, the second process will take the lock and do the update
  3. Lock logic works with liquibase.lockservice.enabled=false on both processes
  4. Lock logic works with liquibase.lockservice.enabled=false on only one process
  5. Running update on concurrent processes with liquibase.lockservice.enabled=false will not cause one to wait for the other

Not doing

  1. Not doing anything database specific. If we find that there are cases not handled with the heartbeat logic, we can consider database-specific syntax as a future enhancement.

Options Considered

BYO Locking

One option is to leave the locking to the users. There are many situations where Liquibase is run within a system where the user knows only one instance can be run at a time. For example, cloud providers have an “init” phase where the platform ensures just one container runs, and Liquibase can be moved there. Or, updates are done via a managed deployment script or build process that ensures only one version of an app is being deployed at once.

In these cases, the lock service is just getting in the way. Ideally, the default lock service should work just fine, and whether it runs unnecessarily or not isn’t worth the hassle of turning it off. But it’s a fallback option to consider. We do have the nodatabaselock extension, which could be folded into the main Liquibase code with an easy flag to switch to that implementation.

Database Session Based Locking

Databases have ways to lock tables for the duration of a connection that automatically unlocks them when the connection closes.

By using that functionality, we can rely on the database to clean up the lock. We don’t even necessarily need the databasechangeloglock table anymore, because we can lock the databasechangelog table itself.

However, this approach is going to be specific to each database since the syntax is different for each database. The semantics of what a “lock” means and the visibility of it also varies by database, so it will require extensive testing to ensure it works as expected everywhere.

A native lock system may break the “lock has been held by X since Y” tracking we can do with the databasechangeloglock table, since that native lock ownership may not be visible to other connections. But, we could still preserve the databasechangelog table for reporting purposes even if we don’t use it for locking purposes anymore.

Examples of lock syntax:

Heartbeat

Rather than relying on a native database approach, we could have a thread within the LockService update the databasechangeloglock table every 30 seconds. If a separate process sees that the heartbeat is over 30 seconds old, it can take the lock.
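
From the waiting process's side, the staleness check could look roughly like the sketch below. It assumes the heartbeat is a value that changes on every beat and measures "no change for N seconds" on the waiter's own monotonic clock, so no timestamps are compared across machines. HeartbeatWatcher and its injected clock are hypothetical names used only for this illustration.

```java
import java.util.Objects;
import java.util.function.LongSupplier;

/** Hypothetical waiter-side check: has the heartbeat value stopped changing? */
final class HeartbeatWatcher {
    private final long timeoutNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
    private boolean started;
    private String lastSeen;
    private long lastChangeAt;

    HeartbeatWatcher(long timeoutSeconds, LongSupplier nanoClock) {
        this.timeoutNanos = timeoutSeconds * 1_000_000_000L;
        this.nanoClock = nanoClock;
    }

    /**
     * Feed in the value just read from the heartbeat column.
     * Returns true once the same non-null value has been seen for the full timeout.
     */
    boolean observe(String value) {
        long now = nanoClock.getAsLong();
        if (!started || !Objects.equals(value, lastSeen)) {
            // First reading, or the holder wrote a new value: restart the countdown.
            started = true;
            lastSeen = value;
            lastChangeAt = now;
            return false;
        }
        // A null value means the holder writes no heartbeat, so it is never stale.
        return value != null && now - lastChangeAt >= timeoutNanos;
    }
}
```

A waiter would poll the lock row, pass each reading to observe, and attempt a takeover only after it returns true.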

The heartbeat approach avoids database-specific logic, but it also requires threads, which some application servers do not like creating. But maybe that is still OK?
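
To give a sense of how small the threading footprint could be, here is a sketch of a heartbeat writer built on a single daemon thread from java.util.concurrent. It is an illustration under assumptions, not the proposed implementation: HeartbeatWriter, the writeToken callback (a stand-in for the UPDATE against the lock table), and checkAlive are all hypothetical names.

```java
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical heartbeat writer; not an actual Liquibase class. */
final class HeartbeatWriter implements AutoCloseable {
    private final AtomicReference<RuntimeException> failure = new AtomicReference<>();
    private final ScheduledExecutorService scheduler;

    /** writeToken stands in for the UPDATE that stores a new heartbeat value. */
    HeartbeatWriter(Consumer<String> writeToken, long rate, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for heartbeats
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                writeToken.accept(UUID.randomUUID().toString());
            } catch (RuntimeException e) {
                failure.compareAndSet(null, e); // remember the first failure
                throw e; // a thrown exception cancels all further scheduled runs
            }
        }, 0, rate, unit);
    }

    /** Called by the main operation between steps; aborts if the heartbeat has died. */
    void checkAlive() {
        RuntimeException e = failure.get();
        if (e != null) {
            throw new IllegalStateException("Heartbeat stopped; aborting the operation", e);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The main update would open the writer in a try-with-resources block after taking the lock and call checkAlive between changesets, so a dead heartbeat stops the operation instead of letting it continue while another process takes over.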

Max Lock Time

There have been some suggestions to have a setting where the lock attempt succeeds if the old lock is older than X seconds. I tend to not like this because sometimes updates (especially the first run, or when creating indexes on a large table) can just take a long time. To avoid taking the lock from a process that is still working, the “take lock” setting has to be long. But a long enough take-lock setting doesn’t help the use case of cloud providers auto-killing processes and wanting to restart a new one right away. For example, a 30 minute setting may sometimes not be long enough, so the lock can be incorrectly taken, while at the same time being way too long for people to wait for their auto-restart logic to take the lock.

Other options?

Any other ideas on how we could implement the logic?

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 5
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

RichardBradley commented, Nov 4, 2021 (6 reactions)

The plugin https://github.com/blagerweij/liquibase-sessionlock (released mid 2020) now implements your option “Database Session Based Locking”, and does it very well. It works for MySQL, PostgreSQL and Oracle. Personally, I think it is a much better solution than a timeout-based approach, and the price of writing database-specific adapters for it is worth paying. (In a way, isn’t one of the primary goals of Liquibase to write database-specific adapters for common operations?) In my opinion it would be nice to bring that into core, as it is strictly better than the standard lock.

If you are looking at locking issues, I think double checked locking is also worth pursuing. See my comments on #829 and #2105

Thanks for Liquibase, I think it’s a great project.

kataggart commented, Sep 15, 2022 (1 reaction)

@stdmitry check out https://github.com/liquibase/liquibase/pull/2190; people are actively working on a resolution that should address your use case (and sorry your application has to get restarted over and over again… sounds a bit painful)
