Scheduler heartbeat adds one second if it is is configured to be 5 or less
See original GitHub issueApache Airflow version: 1.10.12
Kubernetes version (if you are using kubernetes) (use kubectl version
): N/A
Environment:
- Cloud provider or hardware configuration: Amazon
r5.4xlarge
EC2 instance. - OS (e.g. from /etc/os-release): Amazon Linux 2
- Kernel (e.g.
uname -a
): 5.4.58-37.125.amzn2int.x86_64 (viauname -r
) - Install tools: N/A
- Others: N/A
What happened:
I am experiencing some strange issue with scheduler heartbeat. If I configure it to 5 seconds, somehow the heartbeat metric is received every 6 seconds. I tried different values and what I noticed is that with values of 5 or less, the heartbeat is received one second later than expected. But with 6 or more, the heartbeat is received at the expected time (though for 2 and 6 I experienced an even stranger fluctuating behaviour). Below is a table of the values I tried and the frequencies I received:
scheduler_heartbeat_sec value | frequency of scheduler_heartbeat metric |
---|---|
0 | 2 |
1 | 2 |
2 | Fluctuates between 2 and 4 |
3 | 4 |
5 | 6 |
6 | Fluctuates between 6 and 8 |
10 | 10 |
30 | 30 |
What you expected to happen:
I expect the airflow.scheduler_heartbeat
metric to be received at the same frequency specified by the scheduler_heartbeat_sec
configuration.
How to reproduce it:
- Install Airflow locally via pip
- Enable StatsD metrics via the airflow.cfg configurations:
# Statsd (https://github.com/etsy/statsd) integration settings
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
- Run Airflow webserver and scheduler and enable some DAGs.
- Execute the following Python code and watch the timestamp of the received metrics:
import socket
from datetime import datetime
UDP_IP = "127.0.0.1"
UDP_PORT = 8125
sock = socket.socket(socket.AF_INET, # Internet
socket.SOCK_DGRAM) # UDP
sock.bind((UDP_IP, UDP_PORT))
prev_iter_time = None
while True:
data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
message = str(data)
if 'airflow.scheduler_heartbeat' in message:
this_iter_time = datetime.now()
diff = (this_iter_time - prev_iter_time).total_seconds() if prev_iter_time is not None else 0
print(f"{this_iter_time} [{diff} seconds] received message: {message}.")
prev_iter_time = this_iter_time
This is an example output when I set scheduler_heartbeat_sec
to 5:
2020-12-11 22:18:34.747872 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:40.760118 [6.012246 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:46.765760 [6.005642 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:52.771790 [6.00603 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:18:58.778018 [6.006228 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:19:04.784721 [6.006703 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:19:10.789910 [6.005189 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
And this is another example when I set scheduler_heartbeat_sec
to 6:
2020-12-11 22:19:56.306896 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:02.317303 [6.010407 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:10.324169 [8.006866 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:18.331488 [8.007319 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:24.338565 [6.007077 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:30.345710 [6.007145 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:38.347814 [8.002104 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:44.358943 [6.011129 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:20:52.368052 [8.009109 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
Notice that this time it fluctuates between 6 and 8 for some reason.
Now, setting scheduler_heartbeat_sec
to 10, here is a much better stable output with the expected frequency:
python3 listen_to_statsd.py
2020-12-11 22:22:39.028670 [0 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:22:49.041901 [10.013231 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:22:59.052456 [10.010555 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:09.062201 [10.009745 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:19.071650 [10.009449 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:29.084219 [10.012569 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:39.093269 [10.00905 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:49.106148 [10.012879 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:23:59.113249 [10.007101 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:09.124079 [10.01083 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:19.135419 [10.01134 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:29.143558 [10.008139 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:39.154385 [10.010827 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
2020-12-11 22:24:49.164710 [10.010325 seconds] received message: b'airflow.scheduler_heartbeat:1|c'.
Anything else we need to know:
Yes, you are awesome, but you might already know this.
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (5 by maintainers)
pip install 'apache-airflow[statsd]==2.0.0rc2'
(there’s an rc2 out now, soon to be an rc3)I am not sure, I only checked on the versions mentioned above.