When triggering a job via the Airflow UI, the job fails abruptly with Negsignal.SIGSEGV
Apache Airflow version: 2.0
Kubernetes version (if you are using kubernetes) (use kubectl version): NA
Environment: macOS
- Cloud provider or hardware configuration: NA
- OS (e.g. from /etc/os-release): macOS Big Sur
- Kernel (e.g. uname -a): local 20.4.0 Darwin Kernel Version 20.4.0
- Install tools: NA
- Others: NA
What happened:
We are using Airflow jobs to upload data to BigQuery: we created PythonOperator tasks and trigger them via DAGs. When we run a task manually using airflow tasks test <dag_id> <task_id> <date>, everything works fine, but when the same task is triggered via the UI, it fails with the error shown in the task log below.
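For context, a rough sketch of the setup being described (the callable body is a placeholder, not the actual ETL code; per the log, the real task connects over SSH and uploads to BigQuery):

    # Rough sketch of the DAG described above; names mirror the report,
    # the task body is a stand-in for the real BigQuery upload logic.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_main_etl_project(**context):
        # Real task connects over SSH and uploads data to BigQuery.
        print(f"run_id = {context['run_id']}")

    with DAG(
        dag_id="ygrene_etl_process",
        start_date=datetime(2021, 6, 1),
        schedule_interval=None,
    ) as dag:
        PythonOperator(
            task_id="run_main_etl_project",
            python_callable=run_main_etl_project,
        )

The task log from the failed UI-triggered run: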
*** Reading local file: /Users/rdoppalapudi/airflow_project//logs/ygrene_etl_process/run_main_etl_project/2021-06-03T14:55:02.676999+00:00/1.log
[2021-06-03 10:55:07,575] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
[2021-06-03 10:55:07,580] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
[2021-06-03 10:55:07,580] {taskinstance.py:1067} INFO -
--------------------------------------------------------------------------------
[2021-06-03 10:55:07,580] {taskinstance.py:1068} INFO - Starting attempt 1 of 1
[2021-06-03 10:55:07,580] {taskinstance.py:1069} INFO -
--------------------------------------------------------------------------------
[2021-06-03 10:55:07,586] {taskinstance.py:1087} INFO - Executing <Task(PythonOperator): run_main_etl_project> on 2021-06-03T14:55:02.676999+00:00
[2021-06-03 10:55:07,589] {standard_task_runner.py:52} INFO - Started process 9133 to run task
[2021-06-03 10:55:07,595] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'ygrene_etl_process', 'run_main_etl_project', '2021-06-03T14:55:02.676999+00:00', '--job-id', '16', '--pool', 'default_pool', '--raw', '--subdir', '/Users/rdoppalapudi/airflow_project/dags/etl_airflow.py', '--cfg-path', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmp4fd_41fd', '--error-file', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmpn67lg3s9']
[2021-06-03 10:55:07,597] {standard_task_runner.py:77} INFO - Job 16: Subtask run_main_etl_project
[2021-06-03 10:55:07,625] {logging_mixin.py:104} INFO - Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2021-06-03 10:55:07,649] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=ygrene
AIRFLOW_CTX_DAG_ID=ygrene_etl_process
AIRFLOW_CTX_TASK_ID=run_main_etl_project
AIRFLOW_CTX_EXECUTION_DATE=2021-06-03T14:55:02.676999+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-06-03T14:55:02.676999+00:00
[2021-06-03 10:55:07,925] {etl_airflow.py:32} INFO -
run_id = manual__2021-06-03T14:55:02.676999+00:00
dag_id = DAG: ygrene_etl_process
task_id = Task(PythonOperator): run_main_etl_project
[2021-06-03 10:55:08,246] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
[2021-06-03 10:55:08,954] {transport.py:1819} INFO - Authentication (publickey) successful!
[2021-06-03 10:55:14,328] {data_integration.py:29} INFO - Uploading data for projects
[2021-06-03 10:55:14,329] {data_integration.py:31} INFO - Creating bigq obj
[2021-06-03 10:55:26,035] {bigquery_wrapper_apis.py:117} INFO - Got the original json to be uploaded
[2021-06-03 10:55:27,451] {bigquery_wrapper_apis.py:102} INFO - Creating big client obj
[2021-06-03 10:55:27,687] {local_task_job.py:151} INFO - Task exited with return code Negsignal.SIGSEGV
What you expected to happen:
Not sure what is going wrong exactly; the scheduler console shows the following lines along with the error:
Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]> on host 1.0.0.127.in-addr.arpa
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
[2021-06-03 10:55:28,046] {scheduler_job.py:1205} INFO - Executor reports execution of ygrene_etl_process.run_main_etl_project execution_date=2021-06-03 14:55:02.676999+00:00 exited with status success for try_number 1
[2021-06-03 10:55:29,427] {dagrun.py:429} ERROR - Marking run <DagRun ygrene_etl_process @ 2021-06-03 14:55:02.676999+00:00: manual__2021-06-03T14:55:02.676999+00:00, externally triggered: True> failed
[2021-06-03 10:56:10,676] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
[2021-06-03 11:01:10,846] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
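The CoreFoundation message above is macOS's Objective-C fork-safety guard: the LocalExecutor forks the scheduler process to run the task, and something in the forked child (here, most plausibly the gRPC/BigQuery client stack) touches a CoreFoundation API, which macOS forbids after fork() without exec(). Two workarounds commonly suggested for this guard are sketched below as assumptions to test, not confirmed fixes; both variables must be in the environment before the scheduler starts (os.environ is used here only for illustration):

    import os

    # Airflow 2.0 option: exec a fresh Python interpreter for each task
    # instead of forking the parent process, which sidesteps the guard.
    os.environ["AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER"] = "True"

    # macOS escape hatch that disables the fork-safety check itself;
    # use with caution, as it silences a real safety mechanism.
    os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"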
How to reproduce it:
Not really sure how to reproduce, as all of this was working fine until last week.
Anything else we need to know:
The DAG task works fine when run manually; we are not sure why it fails only when triggered from the UI, and there is no clear information about what is happening internally. The issue also looks more generic and related to multiprocessing (this we understood after looking at related information on the web; see the check below).
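The multiprocessing angle can be sanity-checked directly: on macOS, CPython 3.8+ switched the default multiprocessing start method from fork to spawn precisely because forked children can crash in Objective-C frameworks, while Airflow's task runner still forks. A small illustrative check, not taken from the original report:

    import multiprocessing as mp

    # Prints "spawn" on macOS with CPython 3.8+ (the default was changed
    # because forked children can trip the fork-safety guard) and "fork"
    # on Linux, where Airflow's forking task runner is unproblematic.
    print(mp.get_start_method())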
Top GitHub Comments
We have been encountering a similar issue with BigQueryInsertJobOperator since yesterday. The BigQuery job keeps running without issues, while the task on the Airflow side gets killed with SIGSEGV. We are still trying to narrow down the root cause. It is hard to reproduce, as it only happens with specific tasks, which do not differ significantly from the others.
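For reference, a minimal hedged sketch of how such an operator is typically wired up (the task id and SQL are hypothetical; requires the apache-airflow-providers-google package):

    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    # Hypothetical usage mirroring the comment: the submitted BigQuery job
    # completes server-side even when the Airflow task process dies.
    insert_job = BigQueryInsertJobOperator(
        task_id="example_bq_insert",
        configuration={
            "query": {
                "query": "SELECT 1",
                "useLegacySql": False,
            }
        },
    )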
I think the problem is that there is nothing we can do about it in Airflow itself.
What would be your suggestion, then? Do you have any proposal for what more we can do?
We cannot really control the environment variables set for external tools or for Airflow itself. The grpc library might be used in various contexts, and at most it is an optional add-on to Airflow via some providers that use the library. Just as we do not tell people how they should configure their Celery setup, we do not tell them what variables to set for various cases.
I consider this an edge case and something that is more of a deployment issue than an Airflow one. The issue is public and indexed by Google, and after your comment (thanks!) that digested the linked discussion, it now even carries your suggestion to try GRPC_POLL_STRATEGY. So if someone encounters a similar issue, they can find it here and try the different remedies suggested here or in the linked issue. I honestly think that's quite enough 😃.
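For anyone landing here from a search, a hedged sketch of applying that suggestion (the value "poll" is the one commonly suggested for macOS; treat it as something to try, not a confirmed fix):

    import os

    # grpc reads GRPC_POLL_STRATEGY when its core initializes, so it must
    # be set before the first grpc-backed import in the task process.
    os.environ.setdefault("GRPC_POLL_STRATEGY", "poll")

    from google.cloud import bigquery  # imported after the variable is set

    client = bigquery.Client()  # mirrors "Creating big client obj" in the log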