task instance is scheduled repeatedly while dag file is pretty large

Apache Airflow version: 1.10.10

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment: VM

  • Cloud provider or hardware configuration: 8core, 16GB
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.2.1511 (Core)
  • Kernel (e.g. uname -a): 3.10.0-514.26.2.el7.x86_64
  • Install tools:
  • Others:

What happened:

The DAG has 477 tasks. Some of the tasks were rescheduled shortly after they reached the SUCCESS state, over and over again, until I manually marked them as success. I found the following in logs/scheduler/latest/ff8f0e3e490d46ac8f4f933d4e28ab52.log:

{2020-07-24 14:54:28,736} {{logging_mixin.py:112}} INFO - {2020-07-24 14:54:28,736} {{dagrun.py:374}} WARNING - Failed to get task '<TaskInstance: ff8f0e3e490d46ac8f4f933d4e28ab52.task_ead3259258bb4d64a7a0207232886ee6 2020-07-24 06:48:43+00:00 [None]>' for dag '<DAG: ff8f0e3e490d46ac8f4f933d4e28ab52>'. Marking it as removed.
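
To see how many tasks are affected and how often, the scheduler log can be scanned for this warning. The following is a minimal sketch, not part of Airflow: the function name `find_removed_tasks` is hypothetical, and the pattern assumes the message text looks exactly like the line quoted above. It matches only the message itself, so the layout of the timestamp prefix does not matter.

```python
import re
from typing import Dict, Iterable, List

# Matches the warning text quoted above (assumed format, taken from that one line).
REMOVED_WARNING = re.compile(
    r"Failed to get task '<TaskInstance: (?P<key>\S+) "
    r"(?P<execution_date>\S+ \S+) \[(?P<state>[^\]]*)\]>' "
    r"for dag '<DAG: (?P<dag_id>[^>]+)>'\. Marking it as removed\."
)


def find_removed_tasks(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per 'Marking it as removed' warning found in `lines`."""
    hits = []
    for line in lines:
        match = REMOVED_WARNING.search(line)
        if match is None:
            continue
        hit = match.groupdict()
        # The TaskInstance key is "<dag_id>.<task_id>"; strip the DAG prefix
        # reported at the end of the message to recover the task id.
        key = hit.pop("key")
        prefix = hit["dag_id"] + "."
        hit["task_id"] = key[len(prefix):] if key.startswith(prefix) else key
        hits.append(hit)
    return hits
```

Calling `find_removed_tasks(open(path))` on a scheduler log file and counting the results per `task_id` shows whether the same few tasks are being removed repeatedly.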

[Screenshot: WeCom screenshot 15955845821476]

[Screenshot: WeCom screenshot 1595584869655]

What you expected to happen:

The tasks should be found in the DAG and should not be rescheduled once they have succeeded.

How to reproduce it:

A DAG with a large number of tasks may reproduce it.

Anything else we need to know:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
flaviomax commented, May 5, 2022

So we figured out what the problem was:

The problem had nothing to do with the size of the DAG; it was caused by a faulty Airflow deployment. When we took that deployment down, it left a lock on the serialized_dag table in our Postgres database.

When the correct deployment came up, it was unable to change the values in that table. Hence, every time the scheduler tried to validate the tasks, it could not find them in the database and marked them as removed.

Simply restarting the database fixed our problem.
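
Before restarting, it may be worth checking whether a session is still holding a lock on the table. The following is a minimal sketch under a few assumptions: `conn` is any DB-API connection to the Airflow metadata database that uses `%s` placeholders (psycopg2 does), and the helper name `find_table_locks` is hypothetical. It relies only on the standard PostgreSQL views `pg_locks` and `pg_stat_activity`.

```python
from typing import Any, Dict, List

# Sessions holding or waiting for a lock on one table in the current database.
LOCKS_ON_TABLE_SQL = """
SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, a.query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.relation = %s::regclass
  AND l.database = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
ORDER BY a.xact_start
"""


def find_table_locks(conn: Any, table: str = "serialized_dag") -> List[Dict[str, Any]]:
    """Return one dict per lock entry on `table`, oldest transaction first."""
    cur = conn.cursor()
    try:
        cur.execute(LOCKS_ON_TABLE_SQL, (table,))
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

A row whose `state` is `idle in transaction` with an old `xact_start` would be a likely candidate for the leftover lock described above. In that case, ending just that session with PostgreSQL's `pg_terminate_backend(pid)` might be a narrower alternative to restarting the whole database, although the comment above only reports that a restart worked.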

1 reaction
flaviomax commented, Apr 27, 2022

I am using version 2.2.5 and I am facing the exact same problem.

I believe it has something to do with communication with the DB. We also have a pretty big DAG, and we are only seeing this behavior on the 3 to 5 newest tasks.

@doowhtron's fix probably works, but it doesn't address the root cause (whatever is causing get_task to throw AirflowException). I will try to investigate further.
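
One plausible reading of why a failing get_task leads to repeated scheduling is sketched below. This is a toy model, not Airflow's code: the class `ToyTaskInstance` and the function `check_integrity` are hypothetical, and the logic is only loosely based on the integrity check in dagrun.py that emits the "Marking it as removed" warning. The assumption is that a task instance whose task cannot be found is marked removed, and that a removed instance whose task is found again is reset to no state, which makes it eligible to run again.

```python
from typing import List, Optional, Set


class ToyTaskInstance:
    """Stand-in for a task instance: just an id and a state string."""

    def __init__(self, task_id: str, state: Optional[str] = None) -> None:
        self.task_id = task_id
        self.state = state


def check_integrity(task_instances: List[ToyTaskInstance],
                    visible_task_ids: Set[str]) -> None:
    """Mimic one scheduler pass over the task instances of a DAG run.

    `visible_task_ids` is the set of tasks the scheduler can currently see
    (for example, from a stale or unreadable serialized DAG).
    """
    for ti in task_instances:
        if ti.task_id not in visible_task_ids:
            # Equivalent of get_task failing: mark the instance as removed.
            ti.state = "removed"
        elif ti.state == "removed":
            # Task is visible again: restore it with no state, so the
            # scheduler treats it as never having run.
            ti.state = None


# A task that already succeeded...
ti = ToyTaskInstance("task_a", state="success")
# ...is not found in one pass (the DAG seen by the scheduler lacks it)...
check_integrity([ti], visible_task_ids=set())
state_after_miss = ti.state          # "removed"
# ...and is found again in a later pass.
check_integrity([ti], visible_task_ids={"task_a"})
state_after_restore = ti.state       # None, so it would be scheduled again
```

Under this model, every miss followed by a hit throws away the SUCCESS state, which would match both the warning in the log and the tasks running again after they had succeeded.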

