Scheduler Dying/Hanging v2.0.0: WARNING - Killing DAGFileProcessorProcess

See original GitHub issue

Apache Airflow version: 2.0.0

Environment: apache/airflow:2.0.0 docker image, Docker Desktop for Mac 3.0.4

What happened: The scheduler runs fine for a bit, then after a few minutes it starts spitting the following out every second (and the container appears to be stuck as it needs to be force killed):

scheduler_1  | [2021-01-12 01:45:48,149] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:49,153] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:49,159] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:50,163] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:50,165] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:51,169] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)
scheduler_1  | [2021-01-12 01:45:51,172] {scheduler_job.py:262} WARNING - Killing DAGFileProcessorProcess (PID=1112)

Note that there is only a single DAG enabled, with a single task, as I’m just trying to get this off the ground. That DAG is scheduled to run daily, so it’s almost never running aside from when I’m manually testing it.
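
For context on where this warning comes from: the {scheduler_job.py:262} prefix points at the DAG-file processing code that ships inside scheduler_job.py in Airflow 2.0.x, which kills a parser subprocess it considers stuck, for example after the parse exceeds [core] dag_file_processor_timeout (the Codecov result under Top Results below shows the exact self.log.warning("Killing DAGFileProcessorProcess (PID=%d)", ...) call). Seeing the same PID re-killed every second suggests the child never actually exits, or is never reaped, so each pass of the supervising loop re-detects it and re-issues the kill. Below is a minimal, runnable toy sketch of that second failure pattern; it is not Airflow’s code, and the timeout value, the sleep stand-in for a hung DAG parse, and the zombie-based liveness check are all illustrative assumptions.

import logging
import os
import signal
import subprocess
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toy-supervisor")

# Stand-in for a DAG-file parse that never finishes.
child = subprocess.Popen(["sleep", "3600"])
started = time.monotonic()
TIMEOUT = 3  # seconds; loosely analogous to [core] dag_file_processor_timeout

# Loops forever by design (Ctrl-C to stop), mimicking the repeated warnings.
while True:
    time.sleep(1)
    if time.monotonic() - started <= TIMEOUT:
        continue
    try:
        os.kill(child.pid, 0)  # signal 0: is this PID still in the process table?
    except ProcessLookupError:
        break  # child is gone *and* has been reaped, so the loop can stop
    # Once SIGKILL has been delivered the child is dead, but because this
    # parent never wait()s on it, the PID lingers as a zombie, the check
    # above keeps succeeding, and this warning fires on every pass.
    log.warning("Killing DAGFileProcessorProcess (PID=%d)", child.pid)
    os.kill(child.pid, signal.SIGKILL)

A child stuck in uninterruptible I/O (for example on a hung network mount) produces the same endless stream even though SIGKILL is delivered, because the signal cannot take effect until the blocking call returns; the comments below do not establish which of these applies here.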

What you expected to happen: The scheduler to stay idle without issues.

How to reproduce it: Unclear, as it seems to happen almost randomly after a few minutes. Below is the scheduler section of my airflow.cfg. The scheduler is using LocalExecutor. I have the scheduler and webserver running in separate containers, which may or may not be related. Let me know what other information might be helpful.

[scheduler]
# Task instances listen for external kill signal (when you clear tasks
# from the CLI or the UI), this defines the frequency at which they should
# listen (in seconds).
job_heartbeat_sec = 10

# How often (in seconds) to check and tidy up 'running' TaskInstances
# that no longer have a matching DagRun
clean_tis_without_dagrun_interval = 15.0

# The scheduler constantly tries to trigger new tasks (look at the
# scheduler section in the docs for more information). This defines
# how often the scheduler should run (in seconds).
scheduler_heartbeat_sec = 10

# The number of times to try to schedule each DAG file
# -1 indicates unlimited number
num_runs = -1

# The number of seconds to wait between consecutive DAG file processing
processor_poll_interval = 10

# after how much time (in seconds) a new DAG should be picked up from the filesystem
min_file_process_interval = 30

# How often (in seconds) to scan the DAGs directory for new files. Defaults to 5 minutes.
dag_dir_list_interval = 300

# How often should stats be printed to the logs. Setting to 0 will disable printing stats
print_stats_interval = 30

# How often (in seconds) should pool usage stats be sent to statsd (if statsd_on is enabled)
pool_metrics_interval = 5.0

# If the last scheduler heartbeat happened more than scheduler_health_check_threshold
# ago (in seconds), scheduler is considered unhealthy.
# This is used by the health check in the "/health" endpoint
scheduler_health_check_threshold = 30

# How often (in seconds) should the scheduler check for orphaned tasks and SchedulerJobs
orphaned_tasks_check_interval = 300.0
child_process_log_directory = /opt/airflow/logs/scheduler

# Local task jobs periodically heartbeat to the DB. If the job has
# not heartbeat in this many seconds, the scheduler will mark the
# associated task instance as failed and will re-schedule the task.
scheduler_zombie_task_threshold = 300

# Turn off scheduler catchup by setting this to ``False``.
# Default behavior is unchanged and
# Command Line Backfills still work, but the scheduler
# will not do scheduler catchup if this is ``False``,
# however it can be set on a per DAG basis in the
# DAG definition (catchup)
catchup_by_default = False

# This changes the batch size of queries in the scheduling main loop.
# If this is too high, SQL query performance may be impacted by one
# or more of the following:
# - reversion to full table scan
# - complexity of query predicate
# - excessive locking
# Additionally, you may hit the maximum allowable query length for your db.
# Set this to 0 for no limit (not advised)
max_tis_per_query = 512

# Should the scheduler issue ``SELECT ... FOR UPDATE`` in relevant queries.
# If this is set to False then you should not run more than a single
# scheduler at once
use_row_level_locking = True

# Max number of DAGs to create DagRuns for per scheduler loop
#
# Default: 10
# max_dagruns_to_create_per_loop =

# How many DagRuns should a scheduler examine (and lock) when scheduling
# and queuing tasks.
#
# Default: 20
# max_dagruns_per_loop_to_schedule =

# Should the Task supervisor process perform a "mini scheduler" to attempt to schedule more tasks of the
# same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other
# dags in some circumstances
#
# Default: True
# schedule_after_task_execution =

# The scheduler can run multiple processes in parallel to parse dags.
# This defines how many processes will run.
parsing_processes = 1

# Turn off scheduler use of cron intervals by setting this to False.
# DAGs submitted manually in the web UI or with trigger_dag will still run.
use_job_schedule = True

# Allow externally triggered DagRuns for Execution Dates in the future
# Only has effect if schedule_interval is set to None in DAG
allow_trigger_in_future = False

max_threads = 1
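
For context: the timeout that governs when the manager kills a parser subprocess is not part of the [scheduler] section pasted above; in Airflow 2.0.x it lives under [core] as dag_file_processor_timeout, with a default of 50 seconds (see the Configuration Reference link under Top Results below). A minimal airflow.cfg sketch, with the default value shown for illustration only:

[core]
# Seconds a single DAG-file parse may run before the manager times it out
# and kills the DAGFileProcessorProcess
dag_file_processor_timeout = 50

The same setting can be supplied to the scheduler container as the AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT environment variable. Raising it only helps if the parse is genuinely slow; it will not fix a child that is stuck and cannot be killed.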

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

6 reactions
steveipkis commented, Jan 21, 2022

I’m also facing the exact same issue: Airflow 2.1.4

Number of Schedulers: 1

[Screenshot attached: "Screen Shot 2022-01-21 at 12 38 51 PM"]

Any luck on resolving this?

1 reaction
imamdigmi commented, Nov 24, 2021

Not yet @DuyHV20150601, I still get the same error. Maybe this https://github.com/apache/airflow/issues/17507#issuecomment-973177410 can solve the issue, but I haven’t tested it yet.

Read more comments on GitHub >

Top Results From Across the Web

  • [GitHub] [airflow] totalhack opened a new issue #13625: "Scheduler Dying/Hanging v2.0.0: WARNING - Killing DAGFileProcessorProcess."
  • Airflow scheduler stuck (Stack Overflow): "After waiting for it to finish killing whatever processes he is killing, he starts executing all of the tasks properly. I don't even..."
  • Tasks stop working after 5 minutes (Astronomer Forum): "The scheduler is stuck and in the logs I permanently get the message '{scheduler_job.py:214} WARNING - Killing PID 5866'."
  • Configuration Reference — Airflow Documentation: "New in version 2.0.0. Celery task will report its status as 'started' when the task is executed by a worker. This is used..."
  • Branch Context (Codecov): "63, from airflow.utils.event_scheduler import EventScheduler ... 264, self.log.warning("Killing DAGFileProcessorProcess (PID=%d)", self._process.pid)."
