Timer job never unlocked after an exception
I have a simple process, very fast to execute, with this timer start event running each minute:
<startEvent id="StartEvent_17bxin3" name="Each minute">
  <outgoing>SequenceFlow_17asw5l</outgoing>
  <timerEventDefinition>
    <timeCycle><![CDATA["0 0/1 * 1/1 * ? *"]]></timeCycle>
  </timerEventDefinition>
</startEvent>
Then I run that BPMN and everything goes fine, but sometimes I can see that this timer doesn’t fire anymore until I restart the engine (I clean the Flowable DBs at startup). It’s not related to the retries getting to 0, because I use a very high value.
To reproduce, I go into the run() method of AcquireTimerJobsRunnable and, once there is one value in the acquiredJobs array, I make the code throw an exception before the job is actually launched, so that the job in the acquiredJobs array never works again. To do that, I set a breakpoint on the commandExecutor.execute method. At this point I can see in the database, in the ACT_RU_TIMER_JOB table, that there’s an entry whose LOCK_EXP_TIME_ field is not null; it’s a date a couple of minutes in the future. LOCK_OWNER_ is also not null and REV_ is 2. Then in Eclipse, I generate an exception by stepping into the code until I reach a logger statement where the code calls “config.getTransactionPropagation()”. config is a local variable, so I just need to set its value to null to cause a NullPointerException. After that, if I let the code run, the entry in ACT_RU_TIMER_JOB never disappears and the timer never starts again. Its field values do not change. Nothing appears in ACT_RU_DEADLETTER_JOB or ACT_RU_JOB for this timer. Other timers continue to work normally.
Here I simulated the exception with a NullPointerException, but in production I saw a MySQL timeout in the logs.
I’ve found that there’s a flowable-reset-expired-jobs thread, and while debugging it I found that it searches for expired jobs, correctly I think, but in the ACT_RU_JOB table, not in the ACT_RU_TIMER_JOB table. I didn’t find a class in the package org.flowable.job.service.impl.asyncexecutor that runs a job to watch for expired timer jobs.
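To make the expected behavior concrete, here is a minimal plain-Java sketch of the situation described above. It is only a conceptual model: TimerLockSketch, TimerJob, acquire and resetExpired are hypothetical names and are not Flowable classes or APIs. It shows a job that stays locked when a failure happens right after acquisition, and a reset pass that releases it once the lock expiration time has passed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Conceptual model only: these types are hypothetical and are NOT Flowable classes.
public class TimerLockSketch {

    // Mirrors the LOCK_OWNER_ and LOCK_EXP_TIME_ columns mentioned in the report.
    static final class TimerJob {
        final String id;
        String lockOwner;            // null when the job is unlocked
        Instant lockExpirationTime;  // null when the job is unlocked

        TimerJob(String id) {
            this.id = id;
        }

        boolean isLocked() {
            return lockOwner != null;
        }
    }

    // Acquisition: mark the job as owned by one executor for a limited time.
    static void acquire(TimerJob job, String owner, Instant now, Duration lockTime) {
        job.lockOwner = owner;
        job.lockExpirationTime = now.plus(lockTime);
    }

    // Reset pass: unlock every job whose lock has expired and return the ids that were reset.
    static List<String> resetExpired(List<TimerJob> jobs, Instant now) {
        List<String> reset = new ArrayList<>();
        for (TimerJob job : jobs) {
            if (job.isLocked() && job.lockExpirationTime.isBefore(now)) {
                job.lockOwner = null;
                job.lockExpirationTime = null;
                reset.add(job.id);
            }
        }
        return reset;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        TimerJob job = new TimerJob("timer-1");
        List<TimerJob> jobs = List.of(job);

        acquire(job, "executor-A", t0, Duration.ofMinutes(5));
        try {
            // Simulate a failure between acquisition and execution.
            throw new IllegalStateException("simulated failure after acquisition");
        } catch (IllegalStateException e) {
            // Nothing unlocks the job on this path, so it stays locked.
        }
        System.out.println("locked after failure: " + job.isLocked());

        // Before the lock expires nothing is reset; afterwards the job is released.
        System.out.println("reset at +1 min: " + resetExpired(jobs, t0.plus(Duration.ofMinutes(1))));
        System.out.println("reset at +10 min: " + resetExpired(jobs, t0.plus(Duration.ofMinutes(10))));
        System.out.println("locked after reset: " + job.isLocked());
    }
}
```

In this model, the behavior I observe corresponds to the reset pass never being run against the timer jobs, so a job whose acquisition failed keeps its lock owner and expiration time forever.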
I think something is missing, help would be appreciated. Thank you.
I’m using Flowable 6.4.1, but I saw the same behavior with Flowable 6.5.0. I use it with MySQL and Spring Boot.
Issue Analytics
- Created 3 years ago
- Comments:14 (7 by maintainers)
Top GitHub Comments
As I said at least 20. The Flowable REST Application has it set to 100. It is not linked to the number of running bpmn, nor the running instances. It is a mix of things and it is different for everyone. It is dependent on how you use it.
You are right. Thanks for spotting it. We’ve added https://github.com/flowable/flowable-engine/commit/8d7c0fff6a0be060958fc20dd43074d004c5002c to solve the potential lock. We also expanded the reset expired runnable so that timer jobs are reset as well.