Task exited with return code 1 without any warning/error message after rebooting the server and restarting the services
Apache Airflow version: 1.10.10
Kubernetes version (if you are using kubernetes) (use kubectl version): Not using Kubernetes or Docker
Environment: CentOS Linux release 7.7.1908 (Core), kernel Linux 3.10.0-1062.el7.x86_64
Python Version: 3.7.6
Executor: LocalExecutor
What happened:
I wrote a simple DAG to clean up Airflow logs. Everything is fine when I test it with the `airflow test` command, and it also works when I trigger it manually in the WebUI, which uses the `airflow run` command to start the task.
But after I rebooted my server and restarted my webserver & scheduler services (in daemon mode), every time I trigger the exact same DAG it still gets scheduled as usual, but it exits with return code 1 immediately after starting a new process to run the task.
I used the `airflow test` command again to check whether something is now wrong with my code, but everything still looks fine with `airflow test`; the task only exits silently with `airflow run`. It is really weird.
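For context, here is a minimal sketch of the kind of log-cleanup DAG involved. The DAG id, task id, operator type, retry count, and log path are taken from the task log below; everything else (schedule, bash command, retention policy) is assumed for illustration and is not the reporter's actual code.

```python
# Hypothetical reconstruction of the log-cleanup DAG; only dag_id, task_id,
# operator type, and the /root/airflow/logs path come from the log below.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,                       # matches "Starting attempt 1 of 2" in the log
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="airflow_log_cleanup",
    default_args=default_args,
    start_date=datetime(2020, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Delete task log files older than 30 days under the Airflow log directory.
    log_cleanup_worker_num_1 = BashOperator(
        task_id="log_cleanup_worker_num_1",
        bash_command="find /root/airflow/logs -type f -mtime +30 -delete",
    )
```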
Here is the task log from a manual trigger in the WebUI (I have changed the log level to DEBUG but still can't find anything useful); you can also read the attached log file: task error log.txt
```
Reading local file: /root/airflow/logs/airflow_log_cleanup/log_cleanup_worker_num_1/2020-04-29T13:51:44.071744+00:00/1.log
[2020-04-29 21:51:53,744] {base_task_runner.py:61} DEBUG - Planning to run as the user
[2020-04-29 21:51:53,750] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
[2020-04-29 21:51:53,754] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]>
[2020-04-29 21:51:53,757] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,760] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
[2020-04-29 21:51:53,766] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,768] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
[2020-04-29 21:51:53,768] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]>
[2020-04-29 21:51:53,768] {taskinstance.py:879} INFO -
[2020-04-29 21:51:53,768] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-04-29 21:51:53,768] {taskinstance.py:881} INFO -
[2020-04-29 21:51:53,779] {taskinstance.py:900} INFO - Executing <Task(BashOperator): log_cleanup_worker_num_1> on 2020-04-29T13:51:44.071744+00:00
[2020-04-29 21:51:53,781] {standard_task_runner.py:53} INFO - Started process 29718 to run task
[2020-04-29 21:51:53,805] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,805] {cli_action_loggers.py:68} DEBUG - Calling callbacks: [<function default_action_log at 0x7fc9a62513b0>]
[2020-04-29 21:51:53,818] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,817] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {base_job.py:200} DEBUG - [heartbeat]
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {local_task_job.py:124} DEBUG - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.98824 s
[2020-04-29 21:52:03,753] {logging_mixin.py:112} INFO - [2020-04-29 21:52:03,753] {local_task_job.py:103} INFO - Task exited with return code 1
```
How to reproduce it:
I really don't know how to reproduce it, because it happened suddenly and the problem seems to be permanent.
Anything else we need to know:
I tried to figure out the difference between `airflow test` and `airflow run`; I guess it might have something to do with process forking?
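As a rough sketch of that difference (based on a reading of the Airflow 1.10 sources, so treat the details as assumptions): `airflow test` runs the task instance directly in the CLI process, while `airflow run` wraps it in a `LocalTaskJob` whose task runner starts a separate process (the "Started process 29718 to run task" line in the log) and only reports the child's exit code.

```python
# Hedged approximation of what the two commands do internally in Airflow 1.10;
# exact arguments differ from the real CLI code and are simplified here.
from airflow.jobs import LocalTaskJob
from airflow.models import DagBag, TaskInstance
from airflow.utils import timezone

dag = DagBag().get_dag("airflow_log_cleanup")
task = dag.get_task("log_cleanup_worker_num_1")
ti = TaskInstance(task, execution_date=timezone.utcnow())

# Roughly what `airflow test` does: execute in the current process,
# so any exception surfaces directly in this terminal.
ti.run(ignore_all_deps=True, test_mode=True)

# Roughly what `airflow run` does: hand the task instance to a LocalTaskJob,
# whose task runner executes the task in a child process; a crash in the child
# shows up in the parent only as "Task exited with return code 1".
LocalTaskJob(task_instance=ti, ignore_all_deps=True).run()
```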
What I've tried to solve this problem, all of which failed:
- clearing all DAG / DAG run / task instance info, removing all files under /root/airflow except for the config file, and restarting my services
- rebooting my server again
- uninstalling Airflow and installing it again
Top GitHub Comments
I finally figured out how to reproduce this bug.
When you configure email in airflow.cfg and your DAG contains an email operator or otherwise uses the SMTP service, and your SMTP password contains a character like "^", the first task of your DAG will always exit with return code 1 without any error information. In my case the first task was merely a PythonOperator.
Although I think it's my own fault for misconfiguring the SMTP service, there should be some reasonable hint. It actually took me a whole week to debug this: I had to reset everything in my Airflow environment and slowly change the configuration to see when the bug occurs.
Hope this information is helpful.
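If you hit something similar, one quick check is to read back the [smtp] values Airflow actually parsed and attempt a login outside of Airflow, so a bad or mangled password fails loudly instead of silently inside the task runner. This is a hedged debugging sketch, not part of the original report; it assumes `smtp_starttls = True` and skips SSL handling.

```python
# Hedged debugging sketch: print the SMTP settings Airflow parsed from
# airflow.cfg and try to log in with plain smtplib.
import smtplib

from airflow.configuration import conf

host = conf.get("smtp", "smtp_host")
port = conf.getint("smtp", "smtp_port")
user = conf.get("smtp", "smtp_user")
password = conf.get("smtp", "smtp_password")

# Watch for mangled special characters such as '^' in the parsed value.
print("smtp_password as parsed by Airflow: %r" % password)

server = smtplib.SMTP(host, port)
server.starttls()            # assumes smtp_starttls = True; adjust for your setup
server.login(user, password)
server.quit()
print("SMTP login OK")
```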
This issue seems related to https://github.com/apache/airflow/issues/15133.