How to trigger a DAG manually in version 2.0.1 with pickled DAGs
Apache Airflow version: 2.0.1
Kubernetes version (if you are using kubernetes) (use kubectl version): No
Environment:
- Cloud provider or hardware configuration: Local Docker 7GB RAM, 2 CPU
- OS (e.g. from /etc/os-release): Debian GNU/Linux
- Kernel (e.g. uname -a): Linux 97bb49fdb749 4.19.121-linuxkit
- Install tools: pip
- Others:
What happened:
I tried to trigger a DAG manually from the UI after it had been pickled into the metadata DB, and the run crashed.
What you expected to happen:
All tasks run and complete successfully.
How to reproduce it:
- Pickle the DAG with the DagPickle model. For example, given an instance of an already created DAG, call dag.pickle() (see the sketch after this list).
- Trigger the pickled DAG from the UI.
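For context, a minimal sketch of the pickling step. The DAG body here is only an illustrative stand-in for the real dag_generator.py, which is not shown in this report; the dag_id "dummy" and the "init" PythonOperator task match the names that appear in the logs below.

```python
# Minimal sketch, assuming a DAG similar to the "dummy" DAG from the logs;
# the actual dag_generator.py may differ.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="dummy",
    start_date=datetime(2021, 3, 1),
    schedule_interval=None,
) as dag:
    init = PythonOperator(task_id="init", python_callable=lambda: None)

# Persist the serialized DAG into the dag_pickle table via the DagPickle model.
dag.pickle()
```

Once the pickle exists, the DAG is triggered from the UI ("Trigger DAG" button); the `airflow dags trigger dummy` CLI command should behave the same way, although this report only covers the UI.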
Using docker logs -f name_of_container, I see the following logs:
[2021-03-19 09:34:54,735] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 09:34:54,736] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 09:34:54,737] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:34:54,743] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue default
[2021-03-19 09:34:54,744] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:34:54,749] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]> on host 49442bde5388
[2021-03-19 09:34:58,081] {scheduler_job.py:1206} INFO - Executor reports execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited with status success for try_number 1
[2021-03-19 09:34:58 +0000] [550] [INFO] Handling signal: ttin
[2021-03-19 09:34:58 +0000] [37648] [INFO] Booting worker with pid: 37648
[2021-03-19 09:35:05 +0000] [550] [INFO] Handling signal: ttou
[2021-03-19 09:35:05 +0000] [32497] [INFO] Worker exiting (pid: 32497)
[2021-03-19 09:35:06,926] {scheduler_job.py:941} INFO - 1 tasks up for execution:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,929] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 09:35:06,929] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 09:35:06,929] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 09:34:53.681867+00:00 [scheduled]>
[2021-03-19 09:35:06,931] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 9, 34, 53, 681867, tzinfo=Timezone('UTC')), try_number=2) to executor with priority 7 and queue default
[2021-03-19 09:35:06,931] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 09:35:06,935] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
Loading pickle id: 141
Running <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]> on host 49442bde5388
[2021-03-19 09:35:09,827] {scheduler_job.py:1206} INFO - Executor reports execution of dummy.init execution_date=2021-03-19 09:34:53.681867+00:00 exited with status success for try_number 2
[2021-03-19 09:35:18,884] {dagrun.py:430} ERROR - Marking run <DagRun dummy @ 2021-03-19 09:34:53.681867+00:00: manual__2021-03-19T09:34:53.681867+00:00, externally triggered: True> failed
And these are the logs of the failed task from the UI:
*** Reading local file: /usr/local/airflow/logs/dummy/init/4-20210319T093453/2.log
>2021-03-19 09:35:09,215|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T09:34:53.681867+00:00 [queued]>
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1043|Starting attempt 2 of 2
>2021-03-19 09:35:09,231|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 09:35:09,247|INFO|taskinstance.py:1063|Executing <Task(PythonOperator): init> on 2021-03-19T09:34:53.681867+00:00
>2021-03-19 09:35:09,252|INFO|standard_task_runner.py:52|Started process 38115 to run task
>2021-03-19 09:35:09,257|INFO|standard_task_runner.py:76|Running: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T09:34:53.681867+00:00', '--pickle', '141', '--job-id', '25', '--pool', 'default_pool', '--raw', '--cfg-path', '/tmp/tmpo6temkwx', '--error-file', '/tmp/tmpv5y3gphw']
>2021-03-19 09:35:09,258|INFO|standard_task_runner.py:77|Job 25: Subtask init
>2021-03-19 09:35:09,284|INFO|local_task_job.py:146|Task exited with return code 1
Anything else we need to know:
When I cleared the dag_pickle table in the metadata DB, cleared the last_pickled column in the dag table, and stopped pickling DAGs altogether, everything worked fine. To back this up, here are logs from a run where the --pickle flag was not used and everything succeeded (a sketch of that cleanup appears after the UI logs below):
[2021-03-19 10:22:15,074] {scheduler_job.py:941} INFO - 1 tasks up for execution:
<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,080] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2021-03-19 10:22:15,082] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 10:22:15,085] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
<TaskInstance: dummy.init 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:15,089] {scheduler_job.py:1105} INFO - Sending TaskInstanceKey(dag_id='dummy', task_id='init', execution_date=datetime.datetime(2021, 3, 19, 10, 21, 11, 696790, tzinfo=Timezone('UTC')), try_number=1) to executor with priority 7 and queue default
[2021-03-19 10:22:15,090] {base_executor.py:82} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,096] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--local', '--pool', 'default_pool', '--subdir', '/usr/local/airflow/dags/dag_generator.py']
[2021-03-19 10:22:15,320] {dagbag.py:448} INFO - Filling up the DagBag from /usr/local/airflow/dags/dag_generator.py
Running <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]> on host 49442bde5388
[2021-03-19 10:22:18,384] {scheduler_job.py:941} INFO - 2 tasks up for execution:
<TaskInstance: dummy.df_now1 2021-03-19 10:21:11.696790+00:00 [scheduled]>
<TaskInstance: dummy.df_now2 2021-03-19 10:21:11.696790+00:00 [scheduled]>
[2021-03-19 10:22:18,386] {scheduler_job.py:975} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 2 task instances ready to be queued
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 0/16 running and queued tasks
[2021-03-19 10:22:18,387] {scheduler_job.py:1002} INFO - DAG dummy has 1/16 running and queued tasks
And also the logs from the UI for this task:
*** Reading local file: /usr/local/airflow/logs/dummy/init/5-20210319T102111/1.log
>2021-03-19 10:22:17,638|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:851|Dependencies all met for <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [queued]>
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1042|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1043|Starting attempt 1 of 2
>2021-03-19 10:22:17,654|INFO|taskinstance.py:1044|
--------------------------------------------------------------------------------
>2021-03-19 10:22:17,669|INFO|taskinstance.py:1063|Executing <Task(PythonOperator): init> on 2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,674|INFO|standard_task_runner.py:52|Started process 6413 to run task
>2021-03-19 10:22:17,682|INFO|standard_task_runner.py:76|Running: ['airflow', 'tasks', 'run', 'dummy', 'init', '2021-03-19T10:21:11.696790+00:00', '--job-id', '26', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dag_generator.py', '--cfg-path', '/tmp/tmpqkmkygyb', '--error-file', '/tmp/tmp2k6izpaa']
>2021-03-19 10:22:17,683|INFO|standard_task_runner.py:77|Job 26: Subtask init
>2021-03-19 10:22:17,733|INFO|logging_mixin.py:104|Running <TaskInstance: dummy.init 2021-03-19T10:21:11.696790+00:00 [running]> on host 49442bde5388
>2021-03-19 10:22:17,777|INFO|taskinstance.py:1257|Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dummy
AIRFLOW_CTX_TASK_ID=init
AIRFLOW_CTX_EXECUTION_DATE=2021-03-19T10:21:11.696790+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-03-19T10:21:11.696790+00:00
>2021-03-19 10:22:17,778|INFO|python.py:118|Done. Returned value was: None
>2021-03-19 10:22:17,787|INFO|taskinstance.py:1166|Marking task as SUCCESS. dag_id=dummy, task_id=init, execution_date=20210319T102111, start_date=20210319T102217, end_date=20210319T102217
>2021-03-19 10:22:17,818|INFO|taskinstance.py:1220|2 downstream tasks scheduled from follow-on schedule check
>2021-03-19 10:22:17,853|INFO|local_task_job.py:146|Task exited with return code 0
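For reference, a hedged sketch of the cleanup mentioned under "Anything else we need to know". I performed it directly against the metadata DB; the version below uses Airflow's ORM models instead of raw SQL and assumes the 2.0.1 schema, where the dag table keeps the pickle reference in the pickle_id and last_pickled columns.

```python
# Sketch only: clear all stored DAG pickles and detach DAGs from them.
# Back up the metadata DB before running anything like this.
from airflow.models import DagModel, DagPickle
from airflow.utils.session import create_session

with create_session() as session:
    # Empty the dag_pickle table ...
    session.query(DagPickle).delete(synchronize_session=False)
    # ... and reset the pickle-related columns on every DAG row.
    session.query(DagModel).update(
        {DagModel.pickle_id: None, DagModel.last_pickled: None},
        synchronize_session=False,
    )
```

After this cleanup the scheduler stops passing --pickle to the task commands, and manual triggers run as shown in the logs above.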
Top GitHub Comments
@turbaszek yes, of course I’ll try to do my best to fix it as soon as possible
@koliankolin are you still working on it?