
Task exited with return code 1 without any warning/error message after rebooting the server and restarting the service


Apache Airflow version: 1.10.10

Kubernetes version (if you are using Kubernetes) (use kubectl version): Not using Kubernetes or Docker

Environment: CentOS Linux release 7.7.1908 (Core) Linux 3.10.0-1062.el7.x86_64

Python Version: 3.7.6

Executor: LocalExecutor

What happened:

I wrote a simple DAG to clean up Airflow logs. Everything is OK when I use the 'airflow test' command to test it; I also triggered it manually in the WebUI, which uses the 'airflow run' command to start my task, and it was still OK.
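
The DAG itself was not posted; below is a minimal sketch of what such a log-cleanup DAG might look like on Airflow 1.10. The dag_id and task_id are taken from the task log further down, while the schedule, retention window, and bash command are assumptions.

    # Minimal sketch of a log-cleanup DAG (Airflow 1.10 import paths).
    # dag_id/task_id match the task log below; everything else is assumed.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        "owner": "airflow",
        "retries": 1,  # matches "Starting attempt 1 of 2" in the log
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="airflow_log_cleanup",
        default_args=default_args,
        start_date=datetime(2020, 4, 1),
        schedule_interval="@daily",
    ) as dag:
        BashOperator(
            task_id="log_cleanup_worker_num_1",
            # Delete task log files older than 30 days.
            bash_command="find /root/airflow/logs -type f -mtime +30 -delete",
        )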

But after I rebooted my server and restarted my webserver & scheduler services (in daemon mode), every time I trigger the exact same DAG, it still gets scheduled as usual but exits with code 1 immediately after starting a new process to run the task.

I also used the 'airflow test' command again to check whether there is something wrong with my code now, but everything seems OK under 'airflow test' while exiting silently under 'airflow run'. It is really weird.

Here's the task log when the DAG is manually triggered in the WebUI (I've changed the log level to DEBUG but still can't find anything useful), or you can read the attached log file: task error log.txt

Reading local file: /root/airflow/logs/airflow_log_cleanup/log_cleanup_worker_num_1/2020-04-29T13:51:44.071744+00:00/1.log
[2020-04-29 21:51:53,744] {base_task_runner.py:61} DEBUG - Planning to run as the user
[2020-04-29 21:51:53,750] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
[2020-04-29 21:51:53,754] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]>
[2020-04-29 21:51:53,757] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,760] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
[2020-04-29 21:51:53,766] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,768] {taskinstance.py:686} DEBUG - <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
[2020-04-29 21:51:53,768] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: airflow_log_cleanup.log_cleanup_worker_num_1 2020-04-29T13:51:44.071744+00:00 [queued]>
[2020-04-29 21:51:53,768] {taskinstance.py:879} INFO -
[2020-04-29 21:51:53,768] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-04-29 21:51:53,768] {taskinstance.py:881} INFO -
[2020-04-29 21:51:53,779] {taskinstance.py:900} INFO - Executing <Task(BashOperator): log_cleanup_worker_num_1> on 2020-04-29T13:51:44.071744+00:00
[2020-04-29 21:51:53,781] {standard_task_runner.py:53} INFO - Started process 29718 to run task
[2020-04-29 21:51:53,805] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,805] {cli_action_loggers.py:68} DEBUG - Calling callbacks: [<function default_action_log at 0x7fc9a62513b0>]
[2020-04-29 21:51:53,818] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,817] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {base_job.py:200} DEBUG - [heartbeat]
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {local_task_job.py:124} DEBUG - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.98824 s
[2020-04-29 21:52:03,753] {logging_mixin.py:112} INFO - [2020-04-29 21:52:03,753] {local_task_job.py:103} INFO - Task exited with return code 1

How to reproduce it:

I really don't know how to reproduce it, because it happened suddenly, and the failure seems to be permanent.

Anything else we need to know:

I tried to figure out the difference between 'airflow test' and 'airflow run'; I guess it might have something to do with how the process is forked?
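
That guess is plausible: 'airflow test' executes the task in the current process, while 'airflow run' starts a separate child process to run it (the "Started process 29718 to run task" line in the log above). As a toy sketch (not Airflow code), here is why a forked child that crashes before its logging is wired up leaves nothing behind but an exit code:

    # Toy sketch (not Airflow code): a forked child that dies before any
    # logging is set up gives the parent nothing but a nonzero exit status.
    # POSIX-only, since os.fork() is unavailable on Windows.
    import os

    pid = os.fork()
    if pid == 0:
        # Child: simulate a crash during early setup (for example, while
        # parsing configuration), before stdout/stderr reach the task log.
        os._exit(1)
    else:
        _, status = os.waitpid(pid, 0)
        # The parent only sees the exit code, mirroring the bare
        # "Task exited with return code 1" message in the log above.
        print("child exit code:", os.WEXITSTATUS(status))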

What I've tried to solve this problem, all of which failed:

  • clearing all DAG/DAG run/task instance info, removing all files under /root/airflow except for the config file, and restarting my services

  • rebooting my server again

  • uninstalling Airflow and installing it again

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

9 reactions
zsmeijin commented, May 9, 2020

I finally figured out how to reproduce this bug.

When you configure email in airflow.cfg and your DAG contains an email operator or uses the SMTP service, if your SMTP password contains a character like "^", the first task of your DAG will exit with return code 1 without any error information 100% of the time; in my case the first task is merely a Python operator.

Although I think it's my own fault for messing up the SMTP configuration, there should be some reasonable hint. It actually took me a whole week to debug this; I had to reset everything in my Airflow environment and slowly change the configuration to see when the bug happens.

Hope this information is helpful.
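
If you hit something like this, one way to sanity-check it is to confirm that airflow.cfg survives a round trip through Python's ConfigParser, which Airflow's own config loader is built on. A minimal diagnostic sketch follows; the config path is an assumption, and note that interpolation errors (for example, from a bare '%' in a password) only surface when the value is read, not when the file is parsed:

    # Diagnostic sketch: check whether airflow.cfg parses cleanly and the
    # smtp section's values survive ConfigParser interpolation.
    # The config path is an assumption; adjust to your AIRFLOW_HOME.
    import configparser
    import os

    cfg_path = os.path.join(
        os.environ.get("AIRFLOW_HOME", "/root/airflow"), "airflow.cfg"
    )

    parser = configparser.ConfigParser()
    parser.read(cfg_path)

    try:
        # Interpolation errors are raised here, on access, not on read.
        value = parser.get("smtp", "smtp_password")
        print("smtp_password read OK (%d chars)" % len(value))
    except configparser.Error as exc:
        print("airflow.cfg did not parse cleanly:", repr(exc))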


