MySQL Persistence should retry on Deadlock

Should the MySQL operations in withTransaction be wrapped in RetryUtil.retryOnException, so that an issue like the one below can be retried on the spot rather than bubbling up all the way to WorkflowExecutor?

	at org.eclipse.jetty.server.Server.handle(Server.java:524) 
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319) 
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253) 
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) 
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) 
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) 
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) 
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
	at java.lang.Thread.run(Thread.java:748) 
Caused by: com.netflix.conductor.core.execution.ApplicationException: BACKEND_ERROR - Deadlock found when trying to get lock; try restarting transaction 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.getWithTransaction(MySQLBaseDAO.java:103) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.withTransaction(MySQLBaseDAO.java:152) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.updateTask(MySQLExecutionDAO.java:137) 
	at com.netflix.conductor.core.orchestration.ExecutionDAOFacade.updateTask(ExecutionDAOFacade.java:250) 
	... 51 more 
Caused by: com.netflix.conductor.core.execution.ApplicationException: Deadlock found when trying to get lock; try restarting transaction 
	at com.netflix.conductor.dao.mysql.Query.executeUpdate(Query.java:276) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.lambda$addWorkflowToTaskMapping$37(MySQLExecutionDAO.java:584) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.execute(MySQLBaseDAO.java:197) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.addWorkflowToTaskMapping(MySQLExecutionDAO.java:583) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.updateTask(MySQLExecutionDAO.java:523) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.lambda$updateTask$2(MySQLExecutionDAO.java:137) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.lambda$withTransaction$3(MySQLBaseDAO.java:153) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.getWithTransaction(MySQLBaseDAO.java:98) 
	... 54 more

I think we wrap ES operations but not MySQL for some reason. The exception above was recorded on v2.3.15. A specific case of deadlock came up before in https://github.com/Netflix/conductor/issues/576 (where we prefer not to synchronize), but I think deadlocks will still happen occasionally with MySQL persistence, so it is best to wrap transactions in retries.

“Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.” - per https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

Should we wrap MySQL withTransaction() the way we did for Elasticsearch?
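To make the proposal concrete, here is a minimal sketch of what retrying a transaction on the spot could look like. It does not use Conductor's RetryUtil, whose signature is not shown in this issue; `DeadlockRetry`, `retry`, and all of its parameters are hypothetical names chosen for illustration, and the supplier stands in for whatever body would be passed to withTransaction().

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Conductor: re-runs a whole transaction
// when it fails with an error that the caller classifies as retryable.
final class DeadlockRetry {
    private DeadlockRetry() {}

    /**
     * Runs the transaction and re-issues it from the start on retryable failures.
     *
     * @param transaction   the complete unit of work (must be safe to run again)
     * @param isRetryable   decides whether a failure is worth another attempt
     * @param maxAttempts   total attempts, including the first one
     * @param backoffMillis base delay; attempt n waits n * backoffMillis
     */
    static <T> T retry(Supplier<T> transaction,
                       Predicate<Throwable> isRetryable,
                       int maxAttempts,
                       long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.get();
            } catch (RuntimeException e) {
                // Give up on the last attempt or on errors that are not retryable.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                try {
                    // Linear backoff so the competing transaction can finish.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

The important design point is that the retry wraps the entire transaction, not the single failing statement: InnoDB has already rolled back the victim, so everything has to be re-issued from the beginning.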

Sample deadlock captured:

2019-08-19 06:27:20 0x7f979bce7700
*** (1) TRANSACTION:
TRANSACTION 55204, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 22 lock struct(s), heap size 3520, 20 row lock(s), undo log entries 9
MySQL thread id 476, OS thread handle 140289201673984, query id 448134 172.17.0.1 conductor update
INSERT IGNORE INTO workflow_to_task (workflow_id, task_id) VALUES ('9c3e5781-0a7c-41e0-aced-422d0bcf9f59', '4d275114-b7c2-4740-9cdb-31a76758d645')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 38 page no 9748 n bits 168 index PRIMARY of table `conductor`.`workflow_to_task` trx id 55204 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 55081, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
31 lock struct(s), heap size 8400, 27 row lock(s), undo log entries 14
MySQL thread id 460, OS thread handle 140289130788608, query id 448141 172.17.0.1 conductor update
INSERT INTO task (task_id, json_data, modified_on) VALUES ('d02c997e-4e55-46a6-baf7-eed265211304', '{"taskType":"someTask","status":"SCHEDULED","inputData":{"media_metadata":{"segments":[{"segType":1,"title":"First Segment","startOfMessageHours":0,"startOfMessageMinutes":1,"startOfMessageSeconds":0,"startOfMessageFrames":0,"endOfMessageHours":1,"endOfMessageMinutes":21,"endOfMessageSeconds":34,"endOfMessageFrames":0},{"segType":12,"title":null,"startOfMessageHours":0,"startOfMessageMinutes":1,"startOfMessageSeconds":30,"startOfMessageFrames":0,"endOfMessageHours":1,"endOfMessageMinutes":20,"endOfMessageSeconds":34,"endOfMessageFrames":0}],"identifier":"P262391","title":"Material Title"}},"referenceTaskName":"P262391","retryCount":0,"seq":3,"pollCount":0,"taskDefName":"someTask","scheduledTime":1566196040354,"startTime":0,"endTime":0,"updateTime":0,"startDelayInS
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 38 page no 9748 n bits 168 index PRIMARY of table `conductor`.`workflow_to_task` trx id 55081 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 32 page no 16046 n bits 80 index PRIMARY of table `conductor`.`task` trx id 55081 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (1)
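
For retries to kick in, the rolled-back transaction has to be recognized as a deadlock victim. MySQL reports this as error 1213 (ER_LOCK_DEADLOCK, SQLSTATE 40001). Below is a rough sketch of such a check; `DeadlockDetector` and `isDeadlock` are hypothetical names, not Conductor code. The message fallback is an assumption based on the stack trace above, where the innermost exception shown is an ApplicationException carrying only the MySQL message text.

```java
import java.sql.SQLException;

// Hypothetical classifier, not part of Conductor.
final class DeadlockDetector {
    // MySQL vendor error code for ER_LOCK_DEADLOCK (SQLSTATE 40001).
    static final int ER_LOCK_DEADLOCK = 1213;
    // Message text taken from the stack trace in this issue.
    static final String DEADLOCK_MESSAGE = "Deadlock found when trying to get lock";
    // Guard against pathological (cyclic) cause chains.
    private static final int MAX_DEPTH = 32;

    private DeadlockDetector() {}

    /** Returns true if the error, or any of its causes, looks like a MySQL deadlock. */
    static boolean isDeadlock(Throwable error) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            // Preferred signal: the JDBC vendor error code.
            if (current instanceof SQLException
                    && ((SQLException) current).getErrorCode() == ER_LOCK_DEADLOCK) {
                return true;
            }
            // Fallback: the wrapper may have kept only the message text.
            String message = current.getMessage();
            if (message != null && message.contains(DEADLOCK_MESSAGE)) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}
```

Matching on the error code is more robust than matching on text, so the string check is best treated as a last resort for wrappers that drop the original SQLException.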

Top GitHub Comments

kishorebanala commented, Aug 28, 2019

@s50600822 Here are the guidelines for contributions: https://github.com/Netflix/conductor/blob/master/CONTRIBUTING.md, Thank you.

kishorebanala commented, Aug 20, 2019

@s50600822 Definitely makes sense to add retries here. Please feel free to submit a PR when you have a chance.