
How to trigger a DAG manually in version 2.0.1 with pickled DAGs

See original GitHub issue

Apache Airflow version: 2.0.1

Kubernetes version (if you are using kubernetes) (use kubectl version): No

Environment:

  • Cloud provider or hardware configuration: Local Docker 7GB RAM, 2 CPU
  • OS (e.g. from /etc/os-release): Debian GNU/Linux
  • Kernel (e.g. uname -a): Linux 97bb49fdb749 4.19.121-linuxkit
  • Install tools: pip
  • Others:

What happened:

I tried to manually trigger, from the UI, a DAG that had been pickled into the metadata DB, and the run crashed.

What you expected to happen:

All tasks run and complete successfully.

How to reproduce it:

  1. Pickle the DAG with the DagPickle model. For example, given an instance of an already-created DAG, execute dag.pickle() (a sketch follows this list).
  2. Trigger the pickled DAG from the UI.
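
For context, here is a minimal sketch of a dag_generator.py that matches the task names in the logs below (dag_id dummy with init, df_now1 and df_now2). The no-op callables and the schedule are placeholders of mine, not the original code:

# dag_generator.py -- minimal repro sketch; the task names match the
# logs below, the callables are placeholder no-ops.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def _noop():
    pass


with DAG(
    dag_id="dummy",
    start_date=datetime(2021, 3, 1),
    schedule_interval=None,  # the failing run is triggered manually
) as dag:
    init = PythonOperator(task_id="init", python_callable=_noop)
    df_now1 = PythonOperator(task_id="df_now1", python_callable=_noop)
    df_now2 = PythonOperator(task_id="df_now2", python_callable=_noop)
    init >> [df_now1, df_now2]

# Step 1: write the DAG into the dag_pickle table of the metadata DB.
dag.pickle()

Step 2 can also be performed from the CLI with airflow dags trigger dummy; the logs below come from a UI trigger.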

Using docker logs -f name_of_container, I see the following logs:

[2021-03-19 09:34:54,735] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 09:34:54,736] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 09:34:54,737] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
	<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:34:54,743] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue default
[2021-03-19 09:34:54,744] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:34:54,749] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]> on host 49442bde5388
[2021-03-19 09:34:58,081] {scheduler_job.py:1206} INFO - Executor reports execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited with status success for try_number 1
[2021-03-19 09:34:58 +0000] [550] [INFO] Handling signal: ttin
[2021-03-19 09:34:58 +0000] [37648] [INFO] Booting worker with pid: 37648
[2021-03-19 09:35:05 +0000] [550] [INFO] Handling signal: ttou
[2021-03-19 09:35:05 +0000] [32497] [INFO] Worker exiting (pid: 32497)
[2021-03-19 09:35:06,926] {scheduler_job.py:941} INFO - 1 tasks up for execution:
	<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,929] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 09:35:06,929] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 09:35:06,929] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
	<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,931] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867, tzinfo=Timezone('UTC')), try_number=2) to executor with priority 7 and queue default
[2021-03-19 09:35:06,931] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:35:06,935] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]> on host 49442bde5388
[2021-03-19 09:35:09,827] {scheduler_job.py:1206} INFO - Executor reports execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited with status success for try_number 2
[2021-03-19 09:35:18,884] {dagrun.py:430} ERROR - Marking run <DagRun dummy @ 2021-03-19 09:34:53.681867+00:00: manual__2021-03-19T09:34:53.681867+00:00, externally triggered: True> failed

And the following logs for the failed task, as shown in the UI:

*** Reading local file: /usr/local/airflow/logs/dummy/init/4-20210319T093453/2.log
>2021-03-19 09:35:09,215|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1043|Starting attempt 2 of 2
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,247|INFO|taskinstance.py:1063|Executing <Task(PythonOperator): init> on 2021-03-19T09:34:53.681867+00:00
>2021-03-19 09:35:09,252|INFO|standard_task_runner.py:52|Started process 38115 to run task
>2021-03-19 09:35:09,257|INFO|standard_task_runner.py:76|Running: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--job-id', '25', '--pool', 'default_pool', '--raw', '--cfg-path', '/tmp/tmpo6temkwx', '--error-file', '/tmp/tmpv5y3gphw']
>2021-03-19 09:35:09,258|INFO|standard_task_runner.py:77|Job 25: Subtask init
>2021-03-19 09:35:09,284|INFO|local_task_job.py:146|Task exited with return code 1

Anything else we need to know:

When I emptied the dag_pickle table in the metadata DB, cleared the last_pickle column in the dag table, and stopped pickling DAGs altogether, everything worked well.
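
For reference, a minimal sketch of that cleanup through Airflow's ORM session (an assumption on my part; equivalent direct SQL against the metadata DB works too, and in 2.0.1 the pickle reference on the dag table is held in the pickle_id and last_pickled columns):

# Cleanup sketch: remove stored pickles and detach DAGs from them.
# Assumes it runs inside the Airflow 2.0.1 environment.
from airflow.models import DagModel, DagPickle
from airflow.utils.session import create_session

with create_session() as session:
    # Empty the dag_pickle table.
    session.query(DagPickle).delete(synchronize_session=False)
    # Clear the pickle reference on every DAG so the scheduler stops
    # passing --pickle to the task-run command.
    session.query(DagModel).update(
        {DagModel.pickle_id: None, DagModel.last_pickled: None},
        synchronize_session=False,
    )

To back this up, here are logs from a run where the --pickle flag was not used and everything worked successfully: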

[2021-03-19 10:22:15,074] {scheduler_job.py:941} INFO - 1 tasks up for execution:
	<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,080] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 10:22:15,082] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 10:22:15,085] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
	<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,089] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 10, 21, 11, 696790, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue default
[2021-03-19 10:22:15,090] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,096] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,320] {dagbag.py:448} INFO - Filling up the DagBag from /usr/local/airflow/dags/dag_generator.py
Running <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]> on host 49442bde5388
[2021-03-19 10:22:18,384] {scheduler_job.py:941} INFO - 2 tasks up for execution:
	<TaskInstance: dummy.df_now1 2021-03-19 10:21:11.696790+00:00 [scheduled]>
	<TaskInstance: dummy.df_now2 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:18,386] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 2 task instances ready to be queued
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 1/16 running and queued tasks

And the corresponding logs for this task from the UI:

*** Reading local file: /usr/local/airflow/logs/dummy/init/5-20210319T102111/1.log
>2021-03-19 10:22:17,638|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1043|Starting attempt 1 of 2
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,669|INFO|taskinstance.py:1063|Executing <Task(PythonOperator): init> on 2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,674|INFO|standard_task_runner.py:52|Started process 6413 to run task
>2021-03-19 10:22:17,682|INFO|standard_task_runner.py:76|Running: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--job-id', '26', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dag_generator.py', '--cfg-path', '/tmp/tmpqkmkygyb', '--error-file', '/tmp/tmp2k6izpaa']
>2021-03-19 10:22:17,683|INFO|standard_task_runner.py:77|Job 26: Subtask init
>2021-03-19 10:22:17,733|INFO|logging_mixin.py:104|Running <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [running]> on host 49442bde5388
>2021-03-19 10:22:17,777|INFO|taskinstance.py:1257|Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dummy
AIRFLOW_CTX_TASK_ID=init
AIRFLOW_CTX_EXECUTION_DATE=2021-03-19T10:21:11.696790+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,778|INFO|python.py:118|Done. Returned value was: None
>2021-03-19 10:22:17,787|INFO|taskinstance.py:1166|Marking task as SUCCESS. dag_id=dummy, task_id=init, execution_date=20210319T102111, start_date=20210319T102217, end_date=20210319T102217
>2021-03-19 10:22:17,818|INFO|taskinstance.py:1220|2 downstream tasks scheduled from follow-on schedule check
>2021-03-19 10:22:17,853|INFO|local_task_job.py:146|Task exited with return code 0

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
koliankolin commented, Mar 28, 2021

@turbaszek yes, of course. I'll try to do my best to fix it as soon as possible.

0 reactions
eladkal commented, Sep 26, 2021

@koliankolin are you still working on it?
