SageMakerProcessingOperator does not honor action_if_job_exists
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon | 2.4.0
Apache Airflow version
2.2.3 (latest released)
Operating System
Amazon Linux 2
Deployment
MWAA
Deployment details
No response
What happened
SageMakerProcessingOperator no longer honors the action_if_job_exists
parameter and always fails to create a new processing job if a job with the same name already exists.
This happens because in a recent change, the function responsible for executing the job no longer honors the increment setting:
Change that breaks the increment: https://github.com/apache/airflow/commit/96dd70348ad7e31cfeae6d21af70671b41551fe9
What you expected to happen
When SageMakerProcessingOperator is called with a job name that already exists, job creation should succeed with a name that is incremented by 1.
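To make the expected behaviour concrete, here is a minimal sketch of the increment logic in plain Python. It is not the provider's code: `next_job_name` is a hypothetical helper, and the `-N` suffix format (with N being the number of matching jobs plus one) is an assumption made for illustration.

```python
import re
from typing import Iterable


def next_job_name(base_name: str, existing_names: Iterable[str]) -> str:
    """Return a job name that does not collide with existing ones.

    Hypothetical helper: the "-N" suffix scheme is an assumption,
    not necessarily what the Amazon provider produces.
    """
    # Match the base name itself or the base name followed by "-<digits>".
    pattern = re.compile(re.escape(base_name) + r"(-\d+)?")
    matches = sum(1 for name in existing_names if pattern.fullmatch(name))
    if matches == 0:
        # No job with this name yet, so the name can be used unchanged.
        return base_name
    # Otherwise append a counter one higher than the number of matches.
    return f"{base_name}-{matches + 1}"
```

With action_if_job_exists set to 'increment', a second run with the same job name would then get a new name such as the one this helper returns, instead of failing.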
How to reproduce
Invoke SageMakerProcessingOperator twice with the same job name while keeping action_if_job_exists set to 'increment'.
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- Created 2 years ago
- Reactions: 2
- Comments: 8 (4 by maintainers)
A fairly common recipe is to handle the `ThrottlingException` in situations like this. So the `list_processing_jobs` call (or more specifically the `_list_request` helper) can catch that exception when it has exhausted the quota, sleep for a second or two, and then continue with another burst of requests.
This way we don’t drop existing functionality and the code remains backwards compatible. We’ve been a bit heavy-handed with deprecations and breaking changes in the Amazon Provider package as of late.
WDYT @vincbeck, @eladkal, @ferruzzi
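A rough sketch of that catch-sleep-continue recipe, for illustration only. Everything here is hypothetical: `ThrottlingError` stands in for a botocore `ClientError` whose error code is `ThrottlingException`, `fetch_page` stands in for one paginated list request, and the page shape (`items`, `next_token`) is an assumption rather than the real response format.

```python
import time
from typing import Callable, List, Optional


class ThrottlingError(Exception):
    """Stand-in for a ClientError with error code ThrottlingException."""


def list_all_pages(
    fetch_page: Callable[[Optional[str]], dict],
    pause_seconds: float = 1.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect items from every page, pausing whenever a request is throttled."""
    results: List[str] = []
    token: Optional[str] = None
    retries = 0
    while True:
        try:
            page = fetch_page(token)
        except ThrottlingError:
            # Quota exhausted: give up after max_retries consecutive failures,
            # otherwise wait and retry the same page.
            if retries >= max_retries:
                raise
            retries += 1
            sleep(pause_seconds)
            continue
        retries = 0  # a successful request resets the retry budget
        results.extend(page["items"])
        token = page.get("next_token")
        if token is None:
            return results
```

The `sleep` argument is injectable only so the pause can be replaced in tests; a fixed pause is the simplest choice, and an exponential backoff would be a reasonable alternative.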
I’m all for it. From my perspective it goes beyond what Airflow can/should do. I think this is one of the cases where a blog post showing how to customize the strategy is more suitable.