
Airflow 1.10.10 + DAG SERIALIZATION = fails to start manually the DAG's operators


Apache Airflow version: 1.10.10

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a): Linux mid1-t029nifi-1 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: pip, yum

  • Others:

What happened:

When DAG serialisation is enabled and I start the operators manually, the 1st one runs fine, but the next one fails with this error:

Could not queue task instance for execution, dependencies not met: Trigger Rule: Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'skipped': Decimal('0'), 'successes': Decimal('0'), 'failed': Decimal('0'), 'upstream_failed': Decimal('0'), 'done': 0L, 'total': 1}, upstream_task_ids=set([u'query'])
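To make the numbers in this message easier to read, here is a minimal sketch of an `all_success` style check. It is not Airflow's actual implementation; the function name `all_success_met` is hypothetical, and the only inputs are the `successes` and `total` counters printed in the error above.

```python
def all_success_met(successes, total):
    """Hypothetical helper: an 'all_success' rule passes only when every
    upstream task is counted as successful."""
    non_successes = total - successes
    return non_successes == 0, non_successes


# Counters copied from the reported error: one upstream task ('query'),
# but zero recorded successes, even though the reporter ran it successfully.
reported = {"successes": 0, "total": 1}
ok, missing = all_success_met(reported["successes"], reported["total"])
print(ok, missing)  # False 1

# What the reporter expected once the upstream task had succeeded.
expected = {"successes": 1, "total": 1}
print(all_success_met(expected["successes"], expected["total"]))  # (True, 0)
```

Read this way, the message says that the upstream task is known (`total` is 1) but its successful run was not counted (`successes` and `done` are both 0).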

With DAG serialisation set to false, the problem does not occur.

Please note: the scheduler works fine.

What you expected to happen:

I expected to be able to start all of the DAG’s tasks manually, from the 1st one to the last.

The code does not seem to find the correct status of the task upstream of the one I’m starting. If I start the 1st operator, everything works fine.

You can reproduce it following these steps:

  1. Enable DAG serialisation.
  2. Pause the DAG (so that the scheduler won’t touch it).
  3. Start the 1st operator and wait until it completes successfully.
  4. Start the 2nd operator…

op1 >> op2
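Step 1 does not say how serialisation was switched on. In Airflow 1.10.x the documented switch is the `store_serialized_dags` option in the `[core]` section, which can also be supplied through an `AIRFLOW__SECTION__KEY` environment variable. The sketch below only builds and sets that variable; the helper names are hypothetical, and it assumes the variable is exported before the Airflow processes start.

```python
import os


def airflow_env_name(section, key):
    """Build the environment-variable name Airflow reads for a config option
    (AIRFLOW__<SECTION>__<KEY>, upper-cased)."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# Option name taken from the Airflow 1.10.x DAG Serialization documentation.
SERIALIZATION_FLAG = airflow_env_name("core", "store_serialized_dags")


def set_dag_serialization(enabled, environ=os.environ):
    """Hypothetical helper: write the flag into the given environment mapping."""
    environ[SERIALIZATION_FLAG] = "True" if enabled else "False"
    return SERIALIZATION_FLAG


# Use a plain dict here so the example does not touch the real environment.
env = {}
print(set_dag_serialization(True, env))  # AIRFLOW__CORE__STORE_SERIALIZED_DAGS
print(env[SERIALIZATION_FLAG])           # True
```

Calling `set_dag_serialization(False, env)` gives the configuration under which, according to the report, the problem does not occur.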

Anything else we need to know:

This happens every time. MySQL 5.7.x, Python 2.7

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

2 reactions
kaxil commented, Aug 4, 2020

And also I would strongly suggest upgrading to Python 3

1 reaction
ozw1z5rd commented, Aug 4, 2020

And also I would strongly suggest upgrading to Python 3

Yes, I agree. However, we need to convert our customization code to Python 3… So for the next few months, whether we like it or not, Python 2.7 will still be with us.


