Improvements to the "Executor reports task instance finished (failed) although the task says its queued" exception messages
Description
One day our production Airflow (1.10.9) environment started to raise the following exception for multiple tasks in different DAGs:
Try 0 out of 1
Exception:
Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?
If the tasks were allowed to retry, they would eventually succeed. However, this increased the execution time of many of our ETLs by roughly 400%.
Later, thanks to an obscure Stack Overflow answer, we discovered that this was happening because the Airflow database was overloaded. The Airflow Scheduler probably started to receive multiple timeouts and assumed that healthy tasks had been killed, or something similar. We moved the database to a larger instance and the exceptions stopped, without any other change.
Use case / motivation
Firstly, the start of the error message, Try 0 out of 1, doesn't seem natural. The task didn't even run; I believe the message should flag that, showing something like Try 0 out of 1 (failed to start the task).
The rest of the exception message is pretty unhelpful. I am far from an Airflow expert, but I believe the message could be more specific and change according to the exception context. Here are some ideas based on problems that could happen:
- Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?
- Executor reports task instance finished (failed) although the task says its queued. Is the Scheduler task healthy?
- Executor reports task instance finished (failed) although the task says its queued. Is the Airflow connection with the Database OK?
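A minimal sketch of what such context-aware messages might look like. Everything here is hypothetical: the `FailureContext` enum and `build_message` function are illustrations of the idea, not part of Airflow's code or API:

```python
# Hypothetical sketch: map a suspected failure context to a more
# specific hint appended to the existing executor error message.
# None of these names exist in Airflow; they only illustrate the idea.
from enum import Enum, auto


class FailureContext(Enum):
    UNKNOWN = auto()
    SCHEDULER_UNHEALTHY = auto()
    DB_CONNECTION_ISSUE = auto()


# One hint per suspected cause, instead of a single catch-all question.
HINTS = {
    FailureContext.UNKNOWN: "Was the task killed externally?",
    FailureContext.SCHEDULER_UNHEALTHY: "Is the Scheduler healthy?",
    FailureContext.DB_CONNECTION_ISSUE: (
        "Is the Airflow connection with the database OK?"
    ),
}


def build_message(context: FailureContext) -> str:
    """Build the executor error message with a context-specific hint."""
    base = (
        "Executor reports task instance finished (failed) "
        "although the task says its queued."
    )
    return f"{base} {HINTS.get(context, HINTS[FailureContext.UNKNOWN])}"
```

For example, `build_message(FailureContext.DB_CONNECTION_ISSUE)` would have pointed us at the overloaded database much sooner than the generic "killed externally" question did.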
That is it. If possible, small changes like these would spare other people the problems I ran into these days.
Thanks!
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 8
- Comments: 7 (3 by maintainers)
@ephraimbuddy Here is another case where Celery can fail to even run the task
Hello, we changed the RDS instance type from a t3.micro to a t3.small. Check the CPU credit balance on the database's Monitoring tab to see if it is zeroed (meaning the instance is overloaded). That is how we noticed the database was the cause of this issue.
Good luck!
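For anyone who wants to check this programmatically rather than in the AWS console, a small sketch of the interpretation step. The CloudWatch metric `CPUCreditBalance` is real for burstable (t2/t3) RDS instances; the fetching call is left as a comment because it needs AWS credentials, and the `threshold` value here is an assumption, not an official cutoff:

```python
# Sketch: decide whether a burstable RDS instance looks credit-starved
# from a series of CloudWatch CPUCreditBalance datapoints.
#
# Fetching the datapoints is environment-specific; with boto3 you would
# call cloudwatch.get_metric_statistics(Namespace="AWS/RDS",
# MetricName="CPUCreditBalance", ...) -- omitted here to keep the
# example self-contained.

def is_credit_starved(balances, threshold=5.0):
    """Return True if the credit balance dropped to (near) zero,
    which matches the overload described in the comment above.

    `threshold` is a hypothetical safety margin, not an AWS value.
    """
    return bool(balances) and min(balances) <= threshold
```

For example, a balance series like `[120.0, 40.0, 0.0]` (credits draining to zero) would be flagged, while a steady `[140.0, 138.0]` would not.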