Add on_kill method to DataprocSubmitJobOperator
Description
This operator should implement an on_kill method that uses the cancel_job method of DataprocHook, so that a running job is cancelled when the operator is terminated. This option should probably be configurable (for example via cancel_on_kill) because of the request_id parameter in the job definition: https://googleapis.dev/python/dataproc/latest/gapic/v1/api.html#google.cloud.dataproc_v1.JobControllerClient.submit_job
I’m happy to help with system test 👍
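A minimal sketch of the behaviour proposed in this description, not the actual Airflow implementation. StubDataprocHook and SubmitJobOperatorSketch are hypothetical stand-ins so the example runs without Airflow installed; only the names on_kill, cancel_job and cancel_on_kill come from the issue, and the keyword arguments passed to cancel_job are assumed to mirror those of the real hook.

```python
from typing import List, Optional, Tuple


class StubDataprocHook:
    """Hypothetical stand-in for DataprocHook; records cancellations instead of calling the API."""

    def __init__(self) -> None:
        self.cancelled: List[Tuple[str, str, str]] = []

    def cancel_job(self, *, job_id: str, project_id: str, region: str) -> None:
        # The real hook would send a cancel request to Dataproc here.
        self.cancelled.append((job_id, project_id, region))


class SubmitJobOperatorSketch:
    """Hypothetical, simplified operator showing a configurable on_kill."""

    def __init__(
        self,
        *,
        project_id: str,
        region: str,
        hook: StubDataprocHook,
        cancel_on_kill: bool = True,
    ) -> None:
        self.project_id = project_id
        self.region = region
        self.hook = hook
        self.cancel_on_kill = cancel_on_kill
        self.job_id: Optional[str] = None  # set once a job has been submitted

    def execute(self, job_id: str) -> None:
        # Assumption: in a real operator the ID comes from the submit response.
        self.job_id = job_id

    def on_kill(self) -> None:
        # Cancel only if a job was submitted and the user opted in.
        if self.job_id and self.cancel_on_kill:
            self.hook.cancel_job(
                job_id=self.job_id,
                project_id=self.project_id,
                region=self.region,
            )


hook = StubDataprocHook()
op = SubmitJobOperatorSketch(project_id="my-project", region="europe-west1", hook=hook)
op.execute("job-123")
op.on_kill()
print(hook.cancelled)
```

With cancel_on_kill=False, or when on_kill runs before any job has been submitted, the sketch leaves the hook untouched, which is the opt-out the issue asks for.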
Use case / motivation
Remove dangling jobs when operator is terminated.
Related Issues
https://github.com/apache/airflow/pull/6371 https://github.com/apache/airflow/pull/6371#issuecomment-590757917
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (6 by maintainers)
DataprocSubmitJobOperator has no relation to DataprocJobBaseOperator. DataprocJobBaseOperator is used by “old” operators like DataprocSubmitPigJobOperator, DataprocSubmitHiveJobOperator etc. that are deprecated in favor of the generic operator DataprocSubmitJobOperator.
However, you are right that the on_kill logic currently exists in the old operators and may be reused in DataprocSubmitJobOperator.on_kill 😉
Looks good, but let’s remove the request_id as I’m still not sure how it works.
In the package airflow.providers.google.cloud.operators.dataproc each DataprocSubmitJobOperator inherits from DataprocJobBaseOperator. DataprocJobBaseOperator has the following implementation of the on_kill method (lines 984-992):
@turbaszek wrote:
Considering the present implementation of the on_kill method of DataprocJobBaseOperator, isn’t it already done? The code calls the cancel_job method of DataprocHook when the on_kill method is run and the dataproc_job_id property is present.
If I understand this correctly, the only thing that method lacks is:
Thus the following code might do the job:
Did I understand this issue correctly? If not, I guess calling the hook’s cancel_job method is not enough and I will investigate it further.