Heartbeat does not detect zombie processes when using Local Agent
Description
I have been testing situations where a task fails due to external causes (for example, killing the task process with kill -9). I discovered that with a Local Agent the zombie process is never detected, whether I use Prefect Cloud or run locally on my laptop.
However, if I stop the Local Agent and restart it, the zombie process is detected and handled correctly; with Prefect Cloud the flow run is even rescheduled, thanks to the Lazarus process.
For comparison, with the Docker Agent, killing the running flow with docker kill <container_id> works correctly: after a few minutes the flow is retried, with no need to restart the agent.
Expected Behavior
I expect everything that happens when the Local Agent is restarted (zombie detection and rescheduling) to happen without needing a restart.
Reproduction
Here I give you the flow definition that I used to test this:
import datetime
import os
import time

import prefect
from prefect import task, Flow


def append_result(result):
    # Append one line to the output file so progress is visible from outside.
    with open("/tmp/file.txt", "a") as out:
        out.write(result)
        out.write("\n")


@task
def delete_file():
    # Start from a clean file; ignore the error if it does not exist yet.
    try:
        os.remove("/tmp/file.txt")
    except FileNotFoundError:
        pass


@task(max_retries=5, retry_delay=datetime.timedelta(seconds=2), timeout=60)
def generate_file_simple():
    # Write one line per second, including the PID of the process to kill.
    for i in range(10):
        time.sleep(1)
        append_result(f"{datetime.datetime.now()}: {i}. I am PID: {os.getpid()}")


with Flow("be-killed") as f:
    t1 = delete_file()
    t2 = generate_file_simple()
    # set dependency
    t2.set_upstream(t1)

# register flow in prefect cloud
with open("../prefect-cloud-user-token") as token_file:
    user_api_token = token_file.read().strip()
client = prefect.Client(api_token=user_api_token)
client.login_to_tenant(tenant_slug="XXXXX")
# NOTE: flow_id is the ID of the registered flow; the registration call itself is not shown in this snippet.
flow_run_id = client.create_flow_run(flow_id=flow_id)
Once I see that the agent is running the task and verify that the file is being written, I kill the task process by running the following, where XXXX is the PID written on each line of the file:
import os
os.kill(XXXX, 9)
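For anyone who wants to see what the kill looks like from the parent's side without involving Prefect, here is a minimal sketch. It assumes a POSIX system (Linux or macOS); the helper name kill_and_get_returncode is made up for illustration and is not part of Prefect. It starts a sleeping child process, sends it SIGKILL (signal 9, the same signal as above), and returns the exit status that the parent observes.

```python
import signal
import subprocess
import sys


def kill_and_get_returncode(sleep_seconds: float = 30.0) -> int:
    """Start a sleeping child process, SIGKILL it, and return its exit status.

    Hypothetical helper for illustration only; POSIX systems are assumed.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({sleep_seconds})"]
    )
    # Same signal as os.kill(child.pid, 9) in the report above.
    child.send_signal(signal.SIGKILL)
    # wait() reaps the child; on POSIX, death by signal N is reported as -N.
    return child.wait(timeout=10)


if __name__ == "__main__":
    print("child exit status:", kill_and_get_returncode())
```

The point is that a parent which still holds the Popen handle can tell that its child was killed by a signal, because the return code is the negative signal number.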
Environment
{
"config_overrides": {},
"env_vars": [],
"system_information": {
"platform": "Darwin-19.4.0-x86_64-i386-64bit",
"prefect_version": "0.11.2",
"python_version": "3.7.7"
}
}
Issue Analytics
- Created 3 years ago
- Comments: 7 (2 by maintainers)

Oh I see! Thank you very much! I guess that, since the local agent submits flow runs to run in a subprocess, the heartbeat should check the subprocess status as well.
However, I find Docker (and the Docker Agent) more reliable in production. Therefore, since the issue is entirely explainable, I will keep using Prefect with confidence 😃
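To make the idea in the comment above concrete, here is a rough sketch of what "checking the subprocess status" could look like. This is not Prefect's implementation: the function find_dead_flow_runs and the dictionary mapping flow run IDs to Popen handles are assumptions made up for this example. It only shows that a process which launched the subprocesses can poll them without blocking and pick out the ones that exited abnormally.

```python
import subprocess
import sys
from typing import Dict, List


def find_dead_flow_runs(running: Dict[str, subprocess.Popen]) -> List[str]:
    """Return the IDs whose subprocess has exited with a non-zero status.

    Hypothetical helper: `running` maps a flow run ID to the Popen handle
    of the subprocess that executes it.
    """
    dead = []
    for flow_run_id, proc in running.items():
        returncode = proc.poll()  # None while the process is still alive
        if returncode is not None and returncode != 0:
            dead.append(flow_run_id)
    return dead


if __name__ == "__main__":
    # One child exits cleanly, the other fails with exit code 3.
    ok = subprocess.Popen([sys.executable, "-c", "pass"])
    bad = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    ok.wait()
    bad.wait()
    print(find_dead_flow_runs({"run-ok": ok, "run-bad": bad}))
```

A periodic loop could call such a check and mark the affected runs as failed, which is roughly what restarting the agent seems to achieve today. A process killed with SIGKILL would also show up here, since its return code is non-zero.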
@kevin868 There is a different issue #7239 for v2. There are no heartbeats in v2 at this time, but we would like to figure out a way to get this working.