Confusing log for long running tasks: "dependency 'Task Instance Not Running' FAILED: Task is in the running state"

See original GitHub issue

Apache Airflow version: 1.10.* / 2.0.* / 2.1.*

Kubernetes version (if you are using kubernetes) (use kubectl version): Any

Environment:

  • Cloud provider or hardware configuration: Any
  • OS (e.g. from /etc/os-release): Any
  • Kernel (e.g. uname -a): Any
  • Install tools: Any
  • Others: N/A

What happened:

This line in the TaskInstance log is very misleading. It seems to happen for tasks that take longer than one hour. When users are waiting for tasks to finish and see this in the log, they often get confused. They may think something is wrong with their task or with Airflow. In fact, this line is harmless. It’s simply saying “the TaskInstance is already running so it cannot be run again”.

{taskinstance.py:874} INFO - Dependencies not met for <TaskInstance: ... [running]>, dependency 'Task Instance Not Running' FAILED: Task is in the running state
{taskinstance.py:874} INFO - Dependencies not met for <TaskInstance: ... [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.

What you expected to happen:

This confusion is unnecessary. The line should either be silenced in the log or replaced with something clearer.
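
As a user-side stopgap until the message itself is changed, the specific line can be hidden with a standard logging.Filter. This is only a minimal sketch, not anything shipped by the Airflow project, and the "airflow.task" logger name is an assumption that may need adjusting to whichever logger or handler actually emits the line in a given deployment:

import logging


class SuppressNotRunningDep(logging.Filter):
    """Drop the harmless 'Task Instance Not Running' dependency message."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "dependency 'Task Instance Not Running' FAILED" not in record.getMessage()


# Attach to the handlers of the task logger so that propagated records are
# filtered as well; "airflow.task" is an assumed logger name.
for handler in logging.getLogger("airflow.task").handlers:
    handler.addFilter(SuppressNotRunningDep())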

How to reproduce it:

Any task that takes more than an hour to run has this line in the log.
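
For example, a minimal Airflow 2.x DAG along these lines (CeleryExecutor assumed, all names hypothetical) reproduces the line roughly an hour after the task starts:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical reproduction DAG: the only requirement is a single task that
# keeps running for longer than one hour.
with DAG(
    dag_id="long_running_repro",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    BashOperator(
        task_id="sleep_90_minutes",
        bash_command="sleep 5400",
    )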

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 12
  • Comments: 18 (13 by maintainers)

Top GitHub Comments

3 reactions
yuqian90 commented, Jun 27, 2021

After some more investigation, it’s very likely this log appears an hour after a long-running task starts because of the default visibility_timeout setting in Celery. This code in default_celery.py sets visibility_timeout to 21600 only if the broker_url starts with redis or sqs. In our case we are using Redis sentinels, so the backend is still Redis even though the URL starts with sentinel. Therefore visibility_timeout is left at 3600, which is the default according to the Celery documentation. The strange thing is that after I manually changed visibility_timeout to a very large integer in airflow.cfg, the same log still showed up exactly an hour after a task started, so changing visibility_timeout in this case does not seem to make any difference. Not sure if anyone has experienced the same.

@david30907d maybe try changing visibility_timeout to a large number in your setup and see if the log still appears after an hour. If it stops for you, visibility_timeout is probably the cause, and there may be something wrong in our own setup that prevents the visibility_timeout change from taking effect.

import logging

from airflow.configuration import conf


def _broker_supports_visibility_timeout(url):
    # Only plain redis:// and sqs:// broker URLs get the override below;
    # a sentinel:// URL (Redis behind Sentinel) does not match.
    return url.startswith("redis://") or url.startswith("sqs://")


log = logging.getLogger(__name__)

broker_url = conf.get('celery', 'BROKER_URL')

broker_transport_options = conf.getsection('celery_broker_transport_options') or {}
if 'visibility_timeout' not in broker_transport_options:
    if _broker_supports_visibility_timeout(broker_url):
        # 6 hours; otherwise Celery's own 3600-second default applies.
        broker_transport_options['visibility_timeout'] = 21600
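
A quick check with hypothetical broker URLs, reusing _broker_supports_visibility_timeout from the excerpt above, illustrates why a Sentinel-fronted Redis broker misses the 21600 override and stays on Celery's 3600-second default:

# Hypothetical broker URLs, for illustration only.
for url in (
    "redis://redis-host:6379/0",        # matches  -> visibility_timeout set to 21600
    "sqs://my-queue",                   # matches  -> visibility_timeout set to 21600
    "sentinel://sentinel-host:26379/0", # no match -> Celery's 3600-second default remains
):
    print(url, _broker_supports_visibility_timeout(url))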

1 reaction
malthe commented, Jul 2, 2021

When the visibility timeout is reached, it’s confusing that there is no clear log line saying the task was killed for taking too long to complete.

(If that’s indeed what is happening.)

@potiuk is it the case that the Celery task is killed, or is it simply no longer streaming logs into Airflow at that point?

