[BUG] - apscheduler skipping alerts

Firstly, thanks for maintaining the project.

Elastalert version - latest
Python version - Python 3.8.5
OS - Ubuntu 20.04.1 LTS

Problem description - This problem comes from the original ElastAlert. We noticed that the number of rules actually run by ElastAlert differed from run to run; this was visible in the ElastAlert Elasticsearch index. We never had this issue with a “small” number of rules and only noticed it once a large set of rules was loaded.

In the Elastalert logs you would see this intermittently:

May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:03 UTC)" was missed by 0:00:02.944895
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:05 UTC)" was missed by 0:00:02.912215
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:06 UTC)" was missed by 0:00:02.827846
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:05 UTC)" was missed by 0:00:02.758194
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:07 UTC)" was missed by 0:00:02.758226
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:08 UTC)" was missed by 0:00:02.617983
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:06 UTC)" was missed by 0:00:02.407513
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:07 UTC)" was missed by 0:00:02.351592
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:05 UTC)" was missed by 0:00:02.262315
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:05 UTC)" was missed by 0:00:02.244299
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:04 UTC)" was missed by 0:00:02.242278
May 24 09:27:06 minimdr python[2498368]: WARNING:apscheduler.executors.default:Run time of job "ElastAlerter.handle_rule_execution (trigger: interval[0:08:00], next run at: 2021-05-24 09:35:04 UTC)" was missed by 0:00:02.237550

We modified elastalert.py and added misfire_grace_time to the job as a hack to ensure all the rules run. The parameter is documented here: https://apscheduler.readthedocs.io/en/stable/modules/job.html
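
As a rough illustration of why a grace time helps: in APScheduler 3.x, a run that starts later than its scheduled time by more than misfire_grace_time (1 second unless overridden) is skipped and logged as "missed". The sketch below imitates that check in plain Python. Note that is_misfire is a hypothetical helper, not an APScheduler function, and the 30-second grace value is an assumed example, not the value used in the hack above.

```python
from datetime import datetime, timedelta, timezone


def is_misfire(scheduled_run, actual_start, misfire_grace_time=1):
    """Return True if a run starting at actual_start would be skipped.

    misfire_grace_time is in seconds; None disables the check entirely.
    Hypothetical helper that mirrors the idea, not APScheduler's own code.
    """
    if misfire_grace_time is None:
        return False
    lateness = actual_start - scheduled_run
    return lateness > timedelta(seconds=misfire_grace_time)


# First warning in the log: the next run is 09:35:03 with an 8-minute
# interval, so the missed run was due at 09:27:03 and started ~2.94 s late.
scheduled = datetime(2021, 5, 24, 9, 27, 3, tzinfo=timezone.utc)
started = scheduled + timedelta(seconds=2.944895)

# With the 1-second default the run is dropped.
skipped_with_default = is_misfire(scheduled, started)

# With an assumed 30-second grace time the same run would go ahead.
skipped_with_grace = is_misfire(scheduled, started, misfire_grace_time=30)

print(skipped_with_default, skipped_with_grace)
```

In APScheduler itself the value is supplied as the misfire_grace_time argument when a job is added, or as a scheduler-wide job default.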

This is the result of the change:

Top GitHub Comments

jertel commented, May 25, 2021

Yes, I think both options would be useful:

Customize number of scheduler task threads
Customize misfire grace time (seconds)
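
To see how these two settings interact, here is a small simulation, offered as a sketch under simplifying assumptions rather than a model of ElastAlert itself. It assumes every rule falls due at the same instant, that a fixed pool of workers picks rules up in order, and that a rule whose start is delayed beyond the grace time is skipped without using any worker time. The function name count_skipped, the rule counts, and the durations are all made up for illustration. In APScheduler 3.x the corresponding knobs would be the max_workers of the thread pool executor and the misfire_grace_time job default; how ElastAlert would expose them is not specified here.

```python
import heapq


def count_skipped(durations, workers, grace_seconds):
    """Simulate jobs that all fall due at t=0 on a fixed-size worker pool.

    Each job starts as soon as a worker is free. A job whose start delay
    exceeds grace_seconds is counted as skipped and occupies no worker time.
    """
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    skipped = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        if start > grace_seconds:
            skipped += 1
            heapq.heappush(free_at, start)  # worker is free again at once
        else:
            heapq.heappush(free_at, start + duration)
    return skipped


# Hypothetical workload: 15 slow rules (30 s each) queued ahead of
# 200 fast rules (0.3 s each).
rules = [30.0] * 15 + [0.3] * 200

for workers, grace in [(10, 1), (20, 1), (20, 30)]:
    skipped = count_skipped(rules, workers, grace)
    print(f"workers={workers} grace={grace}s skipped={skipped}")
```

In this toy model, adding threads alone still leaves many fast rules waiting longer than a short grace time, which is one reason exposing both options could be useful.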

markus-nclose commented, May 25, 2021

Yeah, like I mentioned initially, 95% of our queries are very basic. Unfortunately, we have a few (~15 alerts) that run regex and take 20-40 seconds, which skews the numbers; all the rest run in about 0.1-0.6 seconds. I will quickly do a test with an Elastic instance with no data to ensure search time is not the issue.