SageMakerProcessingOperator does not honor action_if_job_exists

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon | 2.4.0

Apache Airflow version

2.2.3 (latest released)

Operating System

Amazon Linux 2

Deployment

MWAA

Deployment details

No response

What happened

SageMakerProcessingOperator no longer honors the action_if_job_exists param and always fails to create a new processing job if a job with the same name already exists.

This happens because, after a recent change, the function responsible for executing the job no longer honors the increment setting:

Change that breaks the increment: https://github.com/apache/airflow/commit/96dd70348ad7e31cfeae6d21af70671b41551fe9

New code: https://github.com/apache/airflow/blob/6734eb1d09a99dc519e89a59e2086cef09a87098/airflow/providers/amazon/aws/operators/sagemaker.py#L167

What you expected to happen

When Sagemaker Processing operator is called with a job-name that already exists, the job creation should succeed with a name that is incremented by 1.
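
The expected behavior above can be sketched in plain Python. This is only an illustration of one possible reading of "incremented by 1": the helper name `next_job_name` and the `-<n>` suffix format are assumptions made for this example, not the provider's actual naming scheme.

```python
from typing import Iterable


def next_job_name(base_name: str, existing_names: Iterable[str]) -> str:
    """Return base_name if it is free, otherwise base_name plus the lowest free numeric suffix.

    Hypothetical helper: the "-<n>" suffix format is an assumption for illustration.
    """
    taken = set(existing_names)
    if base_name not in taken:
        return base_name
    suffix = 2
    # Walk upward until a name is found that no existing job uses.
    while f"{base_name}-{suffix}" in taken:
        suffix += 1
    return f"{base_name}-{suffix}"
```

With this sketch, a second run that reuses the same job name would get a fresh name instead of failing on the name collision.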

How to reproduce

Invoke SageMakerProcessingOperator twice with the same job name while keeping action_if_job_exists set to 'increment'.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

3 reactions
o-nikolas commented, Oct 12, 2022

A fairly common recipe is to handle the ThrottlingException in situations like this. So list_processing_jobs (or, more specifically, the _list_request helper) can catch that exception when it has exhausted the quota, sleep for a second or two, and then continue with another burst of requests.

This way we don’t drop existing functionality and the code remains backwards compatible. We’ve been a bit heavy-handed with deprecations and breaking changes in the Amazon Provider package as of late.

WDYT @vincbeck, @eladkal, @ferruzzi
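
The catch, sleep, and continue recipe described in this comment could look roughly like the sketch below. It is not the provider's `_list_request` helper: `ThrottledError`, `list_all_pages`, `fetch_page`, and the `items` / `next_token` page shape are all hypothetical names chosen so the example runs without AWS. With boto3, throttling would typically surface as a `botocore.exceptions.ClientError` whose `response["Error"]["Code"]` is `"ThrottlingException"`, and the `except` clause would need to check for that instead.

```python
import time
from typing import Callable, List, Optional


class ThrottledError(Exception):
    """Hypothetical stand-in for the client's throttling error."""


def list_all_pages(
    fetch_page: Callable[[Optional[str]], dict],
    max_retries: int = 5,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Collect every page from a paginated call, pausing whenever a request is throttled.

    fetch_page takes a pagination token (None for the first page) and returns a dict
    with "items" and an optional "next_token" (an assumed shape for this sketch).
    """
    results: List = []
    token: Optional[str] = None
    retries = 0
    while True:
        try:
            page = fetch_page(token)
        except ThrottledError:
            retries += 1
            if retries > max_retries:
                raise  # give up after too many consecutive throttles
            sleep(pause_seconds)  # back off, then retry the same page
            continue
        retries = 0  # a successful call resets the consecutive-throttle counter
        results.extend(page.get("items", []))
        token = page.get("next_token")
        if not token:
            return results
```

Retrying the same page after the pause keeps the listing complete, which is what allows the existing increment behavior to stay in place rather than being dropped.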

2 reactions
eladkal commented, Oct 3, 2022

My honest opinion here is we should deprecate the parameter action_if_job_exists and users should handle it themselves and apply whatever strategy they want

I’m all for it. From my perspective, it goes beyond what Airflow can/should do. I think this is one of the cases where a blog post showing how to customize the strategy is more suitable.
