Improve LockService functionality and flexibility

Description

Because concurrent update/rollback/etc. operations against a database will interfere with each other, Liquibase has a LockService which we use to coordinate access so that only one instance can run against a database at a time.

Examples of when you can run into concurrency trouble:

Liquibase is embedded on startup of an application server, and you have a cluster of machines that may be starting at once
Liquibase runs as part of a build job, and concurrent builds can happen at once

The StandardLockService is the normal implementation that uses a databasechangeloglock table. Liquibase inserts a record and commits it at startup time, and if other instances see that record they wait until it is cleared.

The big problem with this standard implementation is that if the instance doing the update is killed before it commits the clearing of the record, that value is stuck until liquibase clearLocks is run.

This problem has been around forever in Liquibase, but has been exacerbated in recent years by systems like AWS that auto-kill processes that seem slow to start up. We wrap everything in try/catch blocks, but process killing happens too out-of-band for us to deal with.

Related issues/PRs

Descriptions and proposed fixes:

#829
#1417
#1311
Others?

Moving the overall discussion and design to this ticket.

Requirements (New Configuration Options)

Add a liquibase.lockservice.enabled global configuration value that can be used to disable lock service completely. This will allow easier “BYO Locking” (see options section below)
Add a liquibase.lockservice.heartbeat.enabled global configuration value that can be used to enable the heartbeat functionality. For the first release, set this to “false” as a default. A future release will change the default to “true”.
Add a liquibase.lockservice.heartbeat.rate global configuration value that can be used to control how often to update the heartbeat column. Value is in seconds, with a default of “10”.
Add a liquibase.lockservice.heartbeat.timeout global configuration value that can be used to control how long to wait until a separate process can take the lock. Value is in seconds, with a default of “30”.
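As a rough illustration of how these four settings could be read, here is a minimal sketch using plain java.util.Properties. The class name LockServiceSettingsSketch is hypothetical and this is not Liquibase's actual configuration API. The “true” default for liquibase.lockservice.enabled and the check that the timeout exceeds the rate are assumptions, not part of the requirements above.

```java
import java.util.Properties;

/** Hypothetical holder for the proposed settings; not Liquibase's real configuration API. */
final class LockServiceSettingsSketch {
    final boolean enabled;
    final boolean heartbeatEnabled;
    final int heartbeatRateSeconds;
    final int heartbeatTimeoutSeconds;

    private LockServiceSettingsSketch(boolean enabled, boolean heartbeatEnabled, int rate, int timeout) {
        this.enabled = enabled;
        this.heartbeatEnabled = heartbeatEnabled;
        this.heartbeatRateSeconds = rate;
        this.heartbeatTimeoutSeconds = timeout;
    }

    static LockServiceSettingsSketch from(Properties p) {
        // Assumption: the lock service stays enabled unless explicitly turned off.
        boolean enabled = Boolean.parseBoolean(p.getProperty("liquibase.lockservice.enabled", "true"));
        // Defaults below follow the requirements: heartbeat off, rate 10s, timeout 30s.
        boolean heartbeat = Boolean.parseBoolean(p.getProperty("liquibase.lockservice.heartbeat.enabled", "false"));
        int rate = Integer.parseInt(p.getProperty("liquibase.lockservice.heartbeat.rate", "10"));
        int timeout = Integer.parseInt(p.getProperty("liquibase.lockservice.heartbeat.timeout", "30"));
        // Assumption: a timeout shorter than the rate would let waiters steal a healthy lock.
        if (rate <= 0 || timeout <= rate) {
            throw new IllegalArgumentException("heartbeat.timeout must be greater than heartbeat.rate");
        }
        return new LockServiceSettingsSketch(enabled, heartbeat, rate, timeout);
    }
}
```

Calling LockServiceSettingsSketch.from(new Properties()) yields the defaults listed above; setting liquibase.lockservice.enabled=false is what a “BYO Locking” setup would do.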

Requirements (Add Heartbeat support)

Update StandardLockService to use a heartbeat thread to update the heartbeat column. (See “options” section below)

Existing databasechangeloglock tables should be auto-updated to have an additional, nullable “heartbeat” varchar(255) column.
Heartbeat thread must gracefully handle errors, and if the thread stops unexpectedly it should cause the overall liquibase operation to stop.
The heartbeat will update the column to a random number/string at the configured rate. The waiting process will watch for changes to this value rather than comparing dates in the heartbeat column, to avoid date arithmetic and/or system time issues.
The SQL to take the lock should take the last heartbeat value the waiting process knew of into account and check how many rows are updated. This will handle the case where multiple waiting processes try to force an unlock concurrently.
The waiting process cannot take the lock if the heartbeat column is null. This supports the “heartbeat.enabled=false” configuration as well as older versions of Liquibase.
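The “take the last known heartbeat into account and count the updated rows” rule is essentially a compare-and-set. Below is a minimal in-memory sketch of that rule. LockRowSketch is a hypothetical stand-in for the single lock row, and the SQL in the comments is illustrative only: the heartbeat column is the proposed one, and the statements are not what Liquibase actually generates.

```java
import java.util.UUID;

/** Hypothetical in-memory model of the lock row; each method returns "rows updated" like a JDBC update. */
final class LockRowSketch {
    private boolean locked;
    private String lockedBy;
    private String heartbeat; // null when the holder sends no heartbeats (disabled or older version)

    /** The requirement asks for a random number/string; a UUID is one way to get one. */
    static String newToken() {
        return UUID.randomUUID().toString();
    }

    /** Illustrative SQL: UPDATE databasechangeloglock SET locked=true, lockedby=?, heartbeat=? WHERE locked=false */
    synchronized int acquire(String owner, String token) {
        if (locked) return 0;
        locked = true;
        lockedBy = owner;
        heartbeat = token;
        return 1;
    }

    /** Illustrative SQL: UPDATE databasechangeloglock SET lockedby=?, heartbeat=? WHERE locked=true AND heartbeat=? */
    synchronized int takeOver(String newOwner, String lastSeenToken, String newToken) {
        // A null heartbeat never matches, so locks without a heartbeat cannot be stolen.
        if (!locked || heartbeat == null || !heartbeat.equals(lastSeenToken)) return 0;
        lockedBy = newOwner;
        heartbeat = newToken;
        return 1;
    }

    synchronized String heartbeat() { return heartbeat; }
    synchronized String lockedBy() { return lockedBy; }
}
```

If two waiters both saw the same stale token, only the first takeOver call changes a row. The second finds a different heartbeat value, gets 0 rows updated, and goes back to waiting.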

Requirements (General refactoring)

While we are adding this feature, we will also do our 4.x work of reviewing the LockService interface in general to make it more extensible and manageable. This makes it easier to have a single release note of “We changed the lock service to support heartbeats and made it easier to work with”. The changes will be API-breaking for extensions that wrote their own LockService, but we will include documentation on how to update their code.

Refactor LockService API to be in line with new 4.x standards (use Scope, extend PluginService, etc.)
Refactor/centralize the databasechangeloglock table management into the StandardLockService rather than being scattered across special SqlGenerators etc.

Test Criteria

Running liquibase concurrently with a running update/rollback will wait until the other process is done.
If the original liquibase process is killed halfway through, the 2nd process will take the lock and do the update.
Lock logic works with liquibase.lockservice.enabled=false on both processes
Lock logic works with liquibase.lockservice.enabled=false on only one process
Running update on concurrent processes with liquibase.lockservice.enabled=false will not cause one to wait for the other

Not doing

Not doing anything database specific. If we find that there are cases not handled with the heartbeat logic, we can consider database-specific syntax as a future enhancement.

Options Considered

BYO Locking

One option is to leave the locking to the users. There are many situations where Liquibase is run within a system where the user knows only one instance can be run at a time. For example, cloud providers have an “init” phase where the platform ensures just one container runs, and Liquibase can be moved there. Or, updates are done via a managed deployment script or build process which ensures only one version of an app is being deployed at once.

In these cases, the lock service is just getting in the way. Ideally, the default lock service should work just fine, and whether it runs unnecessarily or not isn’t worth the hassle of turning it off. But, it’s a fallback option to consider. We do have the nodatabaselock extension, which could be folded into the main liquibase code with an easy flag to switch to that implementation.

Database Session Based Locking

Databases have ways to lock tables for the duration of a connection that automatically unlocks them when the connection closes.

By using that functionality, we can rely on the database to clean up the lock. We don’t even necessarily need the databasechangeloglock table anymore, because we can lock the databasechangelog table itself.

However, this approach is going to be specific to each database since the syntax is different for each database. The semantics of what a “lock” means and the visibility of it also varies by database, so it will require extensive testing to ensure it works as expected everywhere.

A native lock system may break the “lock has been held by X since Y” tracking we can do with the databasechangeloglock table, since that native lock ownership may not be visible to other connections. But, we could still preserve the databasechangeloglock table for reporting purposes even if we don’t use it for locking purposes anymore.

Examples of lock syntax:

Postgresql: pg_try_advisory_lock https://www.postgresql.org/docs/9.1/functions-admin.html
Mysql: lock/unlock tables https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html
Oracle: lock/unlock tables: https://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj40506.html
MSSQL: select with tablock https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-ver15
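To make the session-based idea concrete for one database, here is a hedged JDBC sketch built on PostgreSQL's pg_try_advisory_lock and pg_advisory_unlock functions. PgSessionLockSketch and lockKey are hypothetical names, and deriving the key from a string hash is an assumption made for illustration. It is not how Liquibase or the liquibase-sessionlock plugin is known to do it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper; requires a live PostgreSQL connection to do anything useful. */
final class PgSessionLockSketch {
    static final String TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(?)";
    static final String UNLOCK_SQL = "SELECT pg_advisory_unlock(?)";

    /** Assumption: derive the bigint lock key from a name; String.hashCode is stable across JVMs. */
    static long lockKey(String name) {
        return name.hashCode();
    }

    /** Returns true if this session now holds the lock; returns false immediately otherwise. */
    static boolean tryLock(Connection connection, long key) throws SQLException {
        return callBoolean(connection, TRY_LOCK_SQL, key);
    }

    /** Explicit release; PostgreSQL also releases session-level advisory locks when the session ends. */
    static boolean unlock(Connection connection, long key) throws SQLException {
        return callBoolean(connection, UNLOCK_SQL, key);
    }

    private static boolean callBoolean(Connection connection, String sql, long key) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setLong(1, key);
            try (ResultSet result = statement.executeQuery()) {
                return result.next() && result.getBoolean(1);
            }
        }
    }
}
```

The appeal is that a killed process drops its connection, so the database releases the lock without any cleanup step. The cost, as noted above, is that each database needs its own equivalent.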

Heartbeat

Rather than relying on a native database approach, we could have a thread within the LockService update the databasechangeloglock table every 30 seconds. If a separate process sees that the heartbeat is over 30 seconds old, it can take the lock.

This avoids the database specific logic, but also requires threads which some application servers do not like creating. But, maybe that is OK still?
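The waiting side of this option can be sketched without any threads or database access. HeartbeatWatcherSketch is a hypothetical helper that measures, on the waiter's own clock, how long the observed heartbeat value has gone unchanged. Passing the current time in as a parameter is a choice made here to keep the example deterministic, not something the proposal specifies.

```java
/** Hypothetical waiter-side check: has the heartbeat value stopped changing for longer than the timeout? */
final class HeartbeatWatcherSketch {
    private final long timeoutMillis;
    private String lastToken;
    private long lastChangeMillis;
    private boolean tracking;

    HeartbeatWatcherSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Feed in the heartbeat value just read from the lock table and the waiter's current time.
     * Returns true only when the same non-null value has been seen for more than the timeout.
     */
    boolean observe(String token, long nowMillis) {
        if (token == null) {
            // No heartbeat at all: the holder may simply not support it, so never report it as dead.
            tracking = false;
            lastToken = null;
            return false;
        }
        if (!tracking || !token.equals(lastToken)) {
            // First sighting or a fresh beat: restart the local stopwatch.
            tracking = true;
            lastToken = token;
            lastChangeMillis = nowMillis;
            return false;
        }
        return nowMillis - lastChangeMillis > timeoutMillis;
    }
}
```

Because only the waiter's own elapsed time is compared, clock differences between the lock holder, the waiter, and the database server do not matter.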

Max Lock Time

There have been some suggestions to have a setting where the lock attempt succeeds if the old lock is older than X seconds. I tend to not like this because sometimes updates (especially the first run or when creating indexes on a large table) can just take a long time. To avoid this, the “take lock” has to be long. But, a long enough take-lock setting doesn’t help the use case of cloud providers auto-killing processes and wanting to restart a new one right away. For example, a 30 minute setting may sometimes not be long enough and the lock can be incorrectly taken while at the same time way too long for people to wait for their auto-restart logic to take the lock.

Other options?

Any other ideas on how we could implement the logic?

Issue Analytics

State:
Created 3 years ago
Reactions:5
Comments:10 (7 by maintainers)

Top GitHub Comments

6 reactions

RichardBradley commented, Nov 4, 2021

The plugin https://github.com/blagerweij/liquibase-sessionlock (released mid 2020) now implements your option “Database Session Based Locking”, and does it very well. It works for MySQL, Postgresql and Oracle. Personally, I think it is a much better solution than a timeout based approach, and the price of writing database specific adapters for it is worth paying. (In a way, isn’t one of the primary goals of Liquibase to write database specific adapters for common operations?). In my opinion it would be nice to bring that into core as it is strictly better than the standard lock.

If you are looking at locking issues, I think double checked locking is also worth pursuing. See my comments on #829 and #2105

Thanks for Liquibase, I think it’s a great project.

1 reaction

kataggart commented, Sep 15, 2022

@stdmitry check out https://github.com/liquibase/liquibase/pull/2190; people are actively working on a resolution that should address your use case (and sorry your application has to get restarted over and over again… sounds a bit painful)
