MySQL Persistence should retry on Deadlock

Should the MySQL operations in withTransaction be wrapped inside RetryUtil.retryOnException, so that a failure like the one below can be retried on the spot rather than bubbling all the way up to WorkflowExecutor?

	at org.eclipse.jetty.server.Server.handle(Server.java:524) 
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319) 
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253) 
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) 
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) 
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) 
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) 
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) 
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) 
	at java.lang.Thread.run(Thread.java:748) 
Caused by: com.netflix.conductor.core.execution.ApplicationException: BACKEND_ERROR - Deadlock found when trying to get lock; try restarting transaction 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.getWithTransaction(MySQLBaseDAO.java:103) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.withTransaction(MySQLBaseDAO.java:152) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.updateTask(MySQLExecutionDAO.java:137) 
	at com.netflix.conductor.core.orchestration.ExecutionDAOFacade.updateTask(ExecutionDAOFacade.java:250) 
	... 51 more 
Caused by: com.netflix.conductor.core.execution.ApplicationException: Deadlock found when trying to get lock; try restarting transaction 
	at com.netflix.conductor.dao.mysql.Query.executeUpdate(Query.java:276) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.lambda$addWorkflowToTaskMapping$37(MySQLExecutionDAO.java:584) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.execute(MySQLBaseDAO.java:197) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.addWorkflowToTaskMapping(MySQLExecutionDAO.java:583) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.updateTask(MySQLExecutionDAO.java:523) 
	at com.netflix.conductor.dao.mysql.MySQLExecutionDAO.lambda$updateTask$2(MySQLExecutionDAO.java:137) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.lambda$withTransaction$3(MySQLBaseDAO.java:153) 
	at com.netflix.conductor.dao.mysql.MySQLBaseDAO.getWithTransaction(MySQLBaseDAO.java:98) 
	... 54 more 

I think we wrap ES operations in retries but, for some reason, not MySQL operations. The exception above was recorded on v2.3.15. A specific deadlock case came up before in https://github.com/Netflix/conductor/issues/576 (where we prefer not to synchronize), but I think deadlocks will still happen occasionally with MySQL persistence, so it is best to wrap transactions in retries.

“Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.” - per https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

(Screenshot: Screen Shot 2019-08-19 at 11 08 12 am)

Should we wrap MySQL withTransaction() the way we did for Elasticsearch?
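
A minimal sketch of the retry idea, assuming plain JDBC rather than Conductor's own classes. The names `DeadlockRetry`, `SqlWork`, and `withRetry`, along with the attempt count and backoff parameters, are hypothetical and for illustration only; this is not the signature of RetryUtil.retryOnException or of anything in MySQLBaseDAO. MySQL reports a deadlock victim as error 1213 (ER_LOCK_DEADLOCK) with SQLSTATE 40001, and the sketch retries only in that case. Judging from the trace above, the failure surfaces in Conductor as an ApplicationException, so a real change would likely have to detect the deadlock differently.

```java
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of Conductor. */
final class DeadlockRetry {

    /** A unit of transactional work that may fail with an SQLException. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** MySQL signals a deadlock victim with error 1213 / SQLSTATE 40001. */
    static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
    }

    /**
     * Runs the work, re-running it when it fails with a deadlock.
     * Any other SQLException, or a deadlock on the last attempt, is rethrown.
     */
    static <T> T withRetry(SqlWork<T> work, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Linear backoff so the competing transaction can finish first.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

Because InnoDB rolls back the whole victim transaction, the work passed to `withRetry` would need to cover the full transaction from begin to commit, not just the statement that failed.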

Sample deadlock captured:

2019-08-19 06:27:20 0x7f979bce7700
*** (1) TRANSACTION:
TRANSACTION 55204, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 22 lock struct(s), heap size 3520, 20 row lock(s), undo log entries 9
MySQL thread id 476, OS thread handle 140289201673984, query id 448134 172.17.0.1 conductor update
INSERT IGNORE INTO workflow_to_task (workflow_id, task_id) VALUES ('9c3e5781-0a7c-41e0-aced-422d0bcf9f59', '4d275114-b7c2-4740-9cdb-31a76758d645')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 38 page no 9748 n bits 168 index PRIMARY of table `conductor`.`workflow_to_task` trx id 55204 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 55081, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
31 lock struct(s), heap size 8400, 27 row lock(s), undo log entries 14
MySQL thread id 460, OS thread handle 140289130788608, query id 448141 172.17.0.1 conductor update
INSERT INTO task (task_id, json_data, modified_on) VALUES ('d02c997e-4e55-46a6-baf7-eed265211304', '{"taskType":"someTask","status":"SCHEDULED","inputData":{"media_metadata":{"segments":[{"segType":1,"title":"First Segment","startOfMessageHours":0,"startOfMessageMinutes":1,"startOfMessageSeconds":0,"startOfMessageFrames":0,"endOfMessageHours":1,"endOfMessageMinutes":21,"endOfMessageSeconds":34,"endOfMessageFrames":0},{"segType":12,"title":null,"startOfMessageHours":0,"startOfMessageMinutes":1,"startOfMessageSeconds":30,"startOfMessageFrames":0,"endOfMessageHours":1,"endOfMessageMinutes":20,"endOfMessageSeconds":34,"endOfMessageFrames":0}],"identifier":"P262391","title":"Material Title"}},"referenceTaskName":"P262391","retryCount":0,"seq":3,"pollCount":0,"taskDefName":"someTask","scheduledTime":1566196040354,"startTime":0,"endTime":0,"updateTime":0,"startDelayInS
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 38 page no 9748 n bits 168 index PRIMARY of table `conductor`.`workflow_to_task` trx id 55081 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 32 page no 16046 n bits 80 index PRIMARY of table `conductor`.`task` trx id 55081 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (1)

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
kishorebanala commented, Aug 28, 2019

@s50600822 Here are the guidelines for contributions: https://github.com/Netflix/conductor/blob/master/CONTRIBUTING.md, Thank you.

1 reaction
kishorebanala commented, Aug 20, 2019

@s50600822 Definitely makes sense to add retries here. Please feel free to submit a PR when you have a chance.
