Scheduler heartbeat adds one second if it is configured to be 5 or less

Apache Airflow version: 1.10.12

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment:

  • Cloud provider or hardware configuration: Amazon r5.4xlarge EC2 instance.
  • OS (e.g. from /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.4.58-37.125.amzn2int.x86_64 (via uname -r)
  • Install tools: N/A
  • Others: N/A

What happened:

I am experiencing a strange issue with the scheduler heartbeat. If I configure it to 5 seconds, the heartbeat metric is somehow received every 6 seconds. I tried different values and noticed that with values of 5 or less, the heartbeat is received one second later than expected. With 6 or more, the heartbeat is received at the expected time (though for 2 and 6 I saw an even stranger fluctuating behaviour). Below is a table of the values I tried and the frequencies, in seconds, at which I received the metric:

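scheduler_heartbeat_sec value | frequency of scheduler_heartbeat metric (seconds)
------------------------------|--------------------------------------------------
0                             | 2
1                             | 2
2                             | Fluctuates between 2 and 4
3                             | 4
5                             | 6
6                             | Fluctuates between 6 and 8
10                            | 10
30                            | 30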
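
Every frequency in the table above is a multiple of 2, which would be consistent with a loop that only gets a chance to emit the heartbeat about once every 2 seconds. The sketch below simulates that idea. It is a hypothetical model, not Airflow's scheduler code: the function name, the `tick_sec=2` default, and the `>=` comparison are all assumptions made for illustration, and the issue does not confirm that this is the actual cause.

```python
def simulated_heartbeat_intervals(heartbeat_sec, tick_sec=2, count=5):
    """Return the gaps between heartbeats emitted by a loop that wakes every tick_sec.

    Hypothetical model, not Airflow's code: tick_sec=2 is a guess based on the
    table above, where every observed frequency is a multiple of 2.
    """
    now = 0
    last_heartbeat = 0
    intervals = []
    while len(intervals) < count:
        now += tick_sec  # the loop only gets a chance to act once per tick
        if now - last_heartbeat >= heartbeat_sec:
            intervals.append(now - last_heartbeat)
            last_heartbeat = now
    return intervals


# Print the simulated gaps for some of the configured values from the table.
for configured in (0, 1, 3, 5, 10, 30):
    print(configured, simulated_heartbeat_intervals(configured, count=3))
```

Under these assumptions the configured value is rounded up to the next multiple of the tick, so 3 becomes 4 and 5 becomes 6. Values that sit exactly on a tick boundary (2 and 6) come out unchanged in this idealised model; small timing jitter in a real loop could push them to the next tick, which might explain the fluctuation, but the model does not explain why 10 and 30 stay stable.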

What you expected to happen:

I expect the airflow.scheduler_heartbeat metric to be received at the same frequency specified by the scheduler_heartbeat_sec configuration.

How to reproduce it:

  • Install Airflow locally via pip
  • Enable StatsD metrics via the airflow.cfg configurations:
# Statsd (https://github.com/etsy/statsd) integration settings
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
  • Run Airflow webserver and scheduler and enable some DAGs.
  • Execute the following Python code and watch the timestamp of the received metrics:
import socket
from datetime import datetime

UDP_IP = "127.0.0.1"
UDP_PORT = 8125

sock = socket.socket(socket.AF_INET, # Internet
                     socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))

prev_iter_time = None
while True:
    data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
    message = str(data)
    if 'airflow.scheduler_heartbeat' in message:
        this_iter_time = datetime.now()
        diff = (this_iter_time - prev_iter_time).total_seconds() if prev_iter_time is not None else 0
        print(f"{this_iter_time} [{diff} seconds] received message: {message}.")
        prev_iter_time = this_iter_time
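
The listener above matches on the string form of the raw bytes. As an optional refinement, here is a minimal sketch of parsing the payload instead, assuming the plain StatsD line format `name:value|type` with an optional `|@rate` suffix. The helper names `parse_statsd_line` and `parse_datagram` are hypothetical and not part of Airflow or any StatsD library.

```python
def parse_statsd_line(line):
    """Split one StatsD line of the form name:value|type into its parts."""
    name, _, rest = line.partition(":")
    value, _, metric_type = rest.partition("|")
    # Drop an optional sample-rate suffix such as "|@0.1".
    metric_type = metric_type.split("|", 1)[0]
    return name, float(value), metric_type


def parse_datagram(data):
    """Decode a UDP payload and parse every newline-separated metric in it."""
    text = data.decode("utf-8", errors="replace")
    return [parse_statsd_line(line) for line in text.splitlines() if line]


# Example with the payload shown in the outputs of this issue.
print(parse_datagram(b"airflow.scheduler_heartbeat:1|c"))
```

With this, the check in the loop could compare the parsed name to `airflow.scheduler_heartbeat` exactly rather than searching for a substring. The sketch does no error handling, so a malformed line would raise `ValueError` from `float`.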

This is an example output when I set scheduler_heartbeat_sec to 5:

2020-12-11 22:18:34.747872 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:40.760118 [6.012246 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:46.765760 [6.005642 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:52.771790 [6.00603 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:58.778018 [6.006228 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:19:04.784721 [6.006703 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:19:10.789910 [6.005189 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.

And this is another example when I set scheduler_heartbeat_sec to 6:

2020-12-11 22:19:56.306896 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:02.317303 [6.010407 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:10.324169 [8.006866 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:18.331488 [8.007319 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:24.338565 [6.007077 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:30.345710 [6.007145 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:38.347814 [8.002104 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:44.358943 [6.011129 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:52.368052 [8.009109 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.

Notice that this time it fluctuates between 6 and 8 for some reason.
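
To quantify the fluctuation rather than eyeball it, the listener output can be tallied by rounded interval. This is a small sketch that assumes the exact line format printed by the script above; the names `INTERVAL_RE` and `tally_intervals` are made up for this example.

```python
import re
from collections import Counter

# Matches the "[6.010407 seconds]" part of each line printed by the listener.
INTERVAL_RE = re.compile(r"\[([\d.]+) seconds\]")


def tally_intervals(lines):
    """Count how often each interval, rounded to whole seconds, occurs."""
    counts = Counter()
    for line in lines:
        match = INTERVAL_RE.search(line)
        if match is None:
            continue  # not a listener output line
        seconds = float(match.group(1))
        if seconds == 0:
            continue  # the first message has no previous heartbeat to compare to
        counts[round(seconds)] += 1
    return counts


# The first four lines of the output above (scheduler_heartbeat_sec = 6).
sample = [
    "2020-12-11 22:19:56.306896 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.",
    "2020-12-11 22:20:02.317303 [6.010407 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.",
    "2020-12-11 22:20:10.324169 [8.006866 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.",
    "2020-12-11 22:20:18.331488 [8.007319 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.",
]
print(tally_intervals(sample))
```

Feeding it the complete output for a given setting gives a quick histogram of the observed intervals.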

Now, with scheduler_heartbeat_sec set to 10, the output is stable and matches the expected frequency:

python3 listen_to_statsd.py
2020-12-11 22:22:39.028670 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:22:49.041901 [10.013231 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:22:59.052456 [10.010555 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:09.062201 [10.009745 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:19.071650 [10.009449 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:29.084219 [10.012569 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:39.093269 [10.00905 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:49.106148 [10.012879 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:59.113249 [10.007101 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:09.124079 [10.01083 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:19.135419 [10.01134 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:29.143558 [10.008139 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:39.154385 [10.010827 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:49.164710 [10.010325 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.

Anything else we need to know:

Yes, you are awesome, but you might already know this.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
ashb commented, Dec 14, 2020

pip install 'apache-airflow[statsd]==2.0.0rc2' (there’s an rc2 out now, soon to be an rc3)

0 reactions
rafidka commented, Sep 8, 2022

I am not sure; I only checked the versions mentioned above.
