
When triggering a job via the Airflow UI, the job fails abruptly with Negsignal.SIGSEGV

See original GitHub issue

Apache Airflow version: 2.0

Kubernetes version (if you are using kubernetes) (use kubectl version): NA

Environment: macOS

  • Cloud provider or hardware configuration: NA
  • OS (e.g. from /etc/os-release): macOS Big Sur
  • Kernel (e.g. uname -a): local 20.4.0 Darwin Kernel Version 20.4.0
  • Install tools: NA
  • Others: NA

What happened:

We use Airflow jobs to upload data to BigQuery: we created Python operators and trigger them via DAGs. When we run a task manually using airflow tasks test <dag_id> <task_id> <date>, everything works fine, but when the same task is triggered via the UI, it fails with this error:

*** Reading local file: /Users/rdoppalapudi/airflow_project//logs/ygrene_etl_process/run_main_etl_project/2021-06-03T14:55:02.676999+00:00/1.log
[2021-06-03 10:55:07,575] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
[2021-06-03 10:55:07,580] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
[2021-06-03 10:55:07,580] {taskinstance.py:1067} INFO - 
--------------------------------------------------------------------------------
[2021-06-03 10:55:07,580] {taskinstance.py:1068} INFO - Starting attempt 1 of 1
[2021-06-03 10:55:07,580] {taskinstance.py:1069} INFO - 
--------------------------------------------------------------------------------
[2021-06-03 10:55:07,586] {taskinstance.py:1087} INFO - Executing <Task(PythonOperator): run_main_etl_project> on 2021-06-03T14:55:02.676999+00:00
[2021-06-03 10:55:07,589] {standard_task_runner.py:52} INFO - Started process 9133 to run task
[2021-06-03 10:55:07,595] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'ygrene_etl_process', 'run_main_etl_project', '2021-06-03T14:55:02.676999+00:00', '--job-id', '16', '--pool', 'default_pool', '--raw', '--subdir', '/Users/rdoppalapudi/airflow_project/dags/etl_airflow.py', '--cfg-path', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmp4fd_41fd', '--error-file', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmpn67lg3s9']
[2021-06-03 10:55:07,597] {standard_task_runner.py:77} INFO - Job 16: Subtask run_main_etl_project
[2021-06-03 10:55:07,625] {logging_mixin.py:104} INFO - Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2021-06-03 10:55:07,649] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=ygrene
AIRFLOW_CTX_DAG_ID=ygrene_etl_process
AIRFLOW_CTX_TASK_ID=run_main_etl_project
AIRFLOW_CTX_EXECUTION_DATE=2021-06-03T14:55:02.676999+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-06-03T14:55:02.676999+00:00
[2021-06-03 10:55:07,925] {etl_airflow.py:32} INFO - 
run_id = manual__2021-06-03T14:55:02.676999+00:00 
 dag_id = DAG: ygrene_etl_process 
 task_id = Task(PythonOperator): run_main_etl_project
[2021-06-03 10:55:08,246] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
[2021-06-03 10:55:08,954] {transport.py:1819} INFO - Authentication (publickey) successful!
[2021-06-03 10:55:14,328] {data_integration.py:29} INFO - Uploading data for projects 
[2021-06-03 10:55:14,329] {data_integration.py:31} INFO - Creating bigq obj
[2021-06-03 10:55:26,035] {bigquery_wrapper_apis.py:117} INFO - Got the original json to be uploaded
[2021-06-03 10:55:27,451] {bigquery_wrapper_apis.py:102} INFO - Creating big client obj
[2021-06-03 10:55:27,687] {local_task_job.py:151} INFO - Task exited with return code Negsignal.SIGSEGV

What you expected to happen:

Not sure what's going wrong exactly; the scheduler prompt shows some logs with the error:

Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]> on host 1.0.0.127.in-addr.arpa
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
[2021-06-03 10:55:28,046] {scheduler_job.py:1205} INFO - Executor reports execution of ygrene_etl_process.run_main_etl_project execution_date=2021-06-03 14:55:02.676999+00:00 exited with status success for try_number 1
[2021-06-03 10:55:29,427] {dagrun.py:429} ERROR - Marking run <DagRun ygrene_etl_process @ 2021-06-03 14:55:02.676999+00:00: manual__2021-06-03T14:55:02.676999+00:00, externally triggered: True> failed
[2021-06-03 10:56:10,676] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
[2021-06-03 11:01:10,846] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
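The `__THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY__` message above is macOS's Objective-C fork-safety guard: after a `fork()` without `exec()`, touching CoreFoundation-backed code kills the child process. A commonly cited workaround (an assumption here, not something confirmed in this thread) is to relax that guard in the environment the scheduler is started from:

```shell
# Hedged sketch of a common macOS workaround, not confirmed in this thread:
# relax the Objective-C fork-safety check so forked Airflow task processes
# are not killed when a library touches CoreFoundation after fork().
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

# sanity-check that the variable is set in this shell
echo "$OBJC_DISABLE_INITIALIZE_FORK_SAFETY"

# then start the scheduler from this same shell, e.g.:
# airflow scheduler
```

Note that this disables a safety check rather than fixing the underlying fork-unsafety, so it is best treated as a diagnostic step.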

How to reproduce it:

Not sure how to reproduce, as all of this was working fine until last week.

Anything else we need to know:

The DAG task works fine when run manually; not sure why it fails only when run from the UI as a scheduled task, and there is no clear information on what is happening internally. The issue also looks to be more generic and related to multiprocessing (this we understood after looking at related information on the web).
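Since the crash appears tied to forking, one relevant knob is Airflow 2.0's `execute_tasks_new_python_interpreter` option, which starts each task in a fresh Python interpreter instead of forking the parent process. A sketch, assuming the standard `AIRFLOW__<SECTION>__<KEY>` environment-variable override for airflow.cfg options:

```shell
# Sketch: avoid fork-related crashes by having Airflow 2.0 exec a
# brand-new Python interpreter for each task instead of forking.
# Any airflow.cfg option can be overridden via AIRFLOW__<SECTION>__<KEY>.
export AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER=True

# sanity-check the override is visible to child processes
echo "$AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER"
```

The trade-off is slower task startup, since each task pays full interpreter and import costs, but the child no longer inherits forked state from the scheduler.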

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

m1racoli commented, Jul 15, 2021 (4 reactions)

We have encountered a similar issue with BigQueryInsertJobOperator since yesterday. The BQ job keeps running without issues, while the task on the Airflow side gets killed with SIGSEGV.

We’re still trying to narrow down the root cause. It’s hard to reproduce as it only happens with specific tasks, which do not differ significantly from others.

potiuk commented, Jul 21, 2021 (2 reactions)

I think the problem is that there is nothing we can do in Airflow about it.

What would be your suggestion? Do you have any proposal for what more we can do?

We cannot really control environment variables set for external tools or for Airflow itself. The grpc library might be used in various contexts, and at most it is an optional add-on to Airflow via some providers that use the library. Just as we do not tell people how to configure their Celery, we do not tell them what variables to set for various cases.

I consider this an edge case, and more of a deployment issue than an Airflow one. The issue is public and indexed by Google. And after your comment (thanks!) that digested the linked discussion, it now even has your suggestion to try GRPC_POLL_STRATEGY - so if someone encounters a similar issue, they can find it here and try the different remedies suggested here or in the linked issue.

I honestly think it’s quite enough 😃.
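The GRPC_POLL_STRATEGY variable mentioned above is read by the gRPC C core at startup to choose its polling engine. As a sketch (the right value depends on the linked discussion; `poll` is one of the documented strategies):

```shell
# Sketch: override the polling engine used by the gRPC C core,
# one of the remedies referenced in the comment above.
# 'poll' is a documented strategy; others exist (e.g. epoll-based ones on Linux).
export GRPC_POLL_STRATEGY=poll

# sanity-check the variable before starting the affected Airflow process
echo "$GRPC_POLL_STRATEGY"
```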

Read more comments on GitHub >
