Improvements to the "Executor reports task instance finished (failed) although the task says its queued" exception messages

Description

One day our production Airflow (1.10.9) environment started to raise the following exception for multiple tasks in different DAGs:

Try 0 out of 1

Exception:

Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?

If you let the tasks retry, they would eventually finish. However, this increased the execution time of many ETLs by roughly 400%.

Later, thanks to an obscure Stack Overflow answer, we discovered that this was happening because the Airflow database was overloaded. The Airflow Scheduler probably started to hit multiple timeouts and concluded that healthy tasks had been killed, or something like that. We moved the database to a better instance and the exceptions stopped without any other change.

Use case / motivation

First, the start of the error message, Try 0 out of 1, doesn’t read naturally. The task didn’t even run, so I believe the message should flag that, showing something like Try 0 out of 1 (failed to start the task).

The rest of the exception message is not very helpful. I am far from an Airflow expert, but I believe the message could be more specific and change according to the context of the exception. Some ideas, based on problems that could happen:

  • Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?
  • Executor reports task instance finished (failed) although the task says its queued. Is the Scheduler task healthy?
  • Executor reports task instance finished (failed) although the task says its queued. Is the Airflow connection with the Database OK?

That is it. If possible, small changes like these would save other people a lot of the trouble I had these past few days.
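To make the suggestion concrete, here is a minimal sketch of how the message variants proposed above could be assembled. It is not Airflow code: `FailureHint` and `build_message` are hypothetical names made up for illustration, and how the Scheduler would actually decide which hint applies is left open.

```python
from enum import Enum


class FailureHint(Enum):
    """Hypothetical categories; the values are the question suffixes proposed in this issue."""
    KILLED_EXTERNALLY = "Was the task killed externally?"
    SCHEDULER_UNHEALTHY = "Is the Scheduler task healthy?"
    DATABASE_UNREACHABLE = "Is the Airflow connection with the Database OK?"


# The fixed part of the current exception message, copied as-is.
BASE_MESSAGE = (
    "Executor reports task instance finished (failed) "
    "although the task says its queued."
)


def build_message(try_number, max_tries, hint):
    """Build the full message (hypothetical helper, not part of Airflow)."""
    header = f"Try {try_number} out of {max_tries}"
    if try_number == 0:
        # The task never ran, so say so explicitly.
        header += " (failed to start the task)"
    return f"{header}\nException:\n{BASE_MESSAGE} {hint.value}"


if __name__ == "__main__":
    print(build_message(0, 1, FailureHint.DATABASE_UNREACHABLE))
```

Keeping the base sentence unchanged and varying only the trailing question means existing searches for the message text would still match.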

Thanks!

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 8
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
kaxil commented, Sep 14, 2021

@ephraimbuddy Here is another case where Celery can fail to even run the task

2 reactions
wfaria commented, Sep 3, 2020

Hi @wfaria, could you please elaborate on what you mean by "We changed it to a better instance"? I’m curious, as I am seeing this issue and would like to know what you changed on the database. Any help would be very much appreciated.

Thank you!

Hello, we changed the RDS instance type from t3.micro to t3.small. Check the CPU Credit Balance on the database Monitoring tab to see whether it has dropped to zero, which means the instance is overloaded. That is how we noticed that the database was the cause of the issue.

Good luck!
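If you would rather check this from exported numbers than by eye, the idea can be sketched as below. This is only an illustration: `credits_exhausted`, the `threshold`, and the `min_fraction` cutoff are assumptions of mine, not anything from AWS or Airflow, and the input is assumed to be a plain list of CPU credit balance samples that you have already pulled from the monitoring data.

```python
def credits_exhausted(balances, threshold=1.0, min_fraction=0.5):
    """Return True if enough samples sit at or below the threshold.

    balances: CPU credit balance samples (assumed already exported).
    threshold: balance treated as "zeroed" (assumed value).
    min_fraction: share of depleted samples that counts as overloaded (assumed value).
    """
    if not balances:
        # No data means no evidence of depletion.
        return False
    depleted = sum(1 for b in balances if b <= threshold)
    return depleted / len(balances) >= min_fraction


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(credits_exhausted([0.0, 0.0, 0.3, 12.0]))
    print(credits_exhausted([50.0, 48.2, 47.9]))
```

Both defaults are arbitrary starting points; tune them to how spiky your own workload is.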
