AWS Batch - cancel job has no opportunity to catch jobs for cancellation
Based on the implementation, it should be possible to cancel an AWS Batch job before it reaches STARTING; see
https://github.com/spulec/moto/blob/master/moto/batch/models.py#L1378-L1382
which contains:
```python
def cancel_job(self, job_id, reason):
    job = self.get_job_by_id(job_id)
    if job.job_state in ["SUBMITTED", "PENDING", "RUNNABLE"]:
        job.terminate(reason)
    # No-Op for jobs that have already started - user has to explicitly terminate those
```
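For context, the client-side calls that hit this code path are boto3's cancel_job and describe_jobs. A minimal sketch of the expected behaviour (the job ID is a placeholder taken from the logs below):

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")
job_id = "0f6f1963-e5e0-428c-8b4f-303cf6b968a8"  # placeholder job ID

# cancel_job only takes effect while the job is SUBMITTED, PENDING, or RUNNABLE;
# once the job is STARTING or RUNNING it has to be stopped with terminate_job instead.
batch.cancel_job(jobId=job_id, reason="test-job-cancel")

job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
print(job["status"], job.get("statusReason"))
```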
In short, moto 1.x reported jobs with a status in that list, while moto 2.x does not. The effective behaviour of batch jobs appears to have changed between moto 1.x and 2.x: the latter creates jobs that enter the STARTING state immediately, or nearly immediately. This observation is based on experience with moto 1.x and 2.x in the tests at https://github.com/dazza-codes/aio-aws
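The test pattern this breaks is, roughly, a race between polling the job status and issuing the cancel while the job is still in a cancellable state. A hedged sketch (the helper below is illustrative, not the actual aio-aws code; `batch` is a boto3 Batch client):

```python
CANCELLABLE = {"SUBMITTED", "PENDING", "RUNNABLE"}

def cancel_if_possible(batch, job_id, reason="test-job-cancel"):
    """Issue a cancel and report whether the job was still in a cancellable state."""
    status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
    batch.cancel_job(jobId=job_id, reason=reason)  # no-op once the job is past RUNNABLE
    return status in CANCELLABLE

# Under moto 1.x the job lingers in PENDING/RUNNABLE long enough for this to
# return True; under moto 2.x it is usually already STARTING or RUNNING.
```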
moto 1.x
https://github.com/dazza-codes/aio-aws/blob/main/tests/test_aio_aws_batch.py
It might be too much to ask to clone that repo/branch and run it, but here is a summary and some log snippets:
$ pytest -s tests/test_aio_aws_batch.py -k cancel
2021-10-16T21:23:33.290Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_submit:264 | AWS batch-submit-job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) try: 1 of 4
2021-10-16T21:23:33.296Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_cancel:487 | AWS Batch job to cancel: 0f6f1963-e5e0-428c-8b4f-303cf6b968a8, test-job-cancel
2021-10-16T21:23:33.318Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: PENDING
2021-10-16T21:23:33.690Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: PENDING
2021-10-16T21:23:34.010Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: PENDING
2021-10-16T21:23:34.344Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: RUNNABLE
2021-10-16T21:23:34.702Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: RUNNABLE
2021-10-16T21:23:35.012Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: RUNNABLE
2021-10-16T21:23:35.402Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: STARTING
2021-10-16T21:23:36.004Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:0f6f1963-e5e0-428c-8b4f-303cf6b968a8) status: FAILED
moto 2.x
- https://github.com/dazza-codes/aio-aws/blob/update-moto/tests/test_aio_aws_batch.py
- moto 2.x does not report any PENDING or RUNNABLE status prior to STARTING
versions:
$ poetry show | grep oto
aio-botocore 1.3.3 Async client for aws services using botocore and aiohttp
boto3 1.18.52 The AWS SDK for Python
botocore 1.21.52 Low-level, data-driven core of boto 3.
moto 2.2.8 A library that allows your python tests to easily mock out the boto library
test failures:
$ pytest -s tests/test_aio_aws_batch.py -k cancel
FAILED tests/test_aio_aws_batch.py::test_async_batch_job_cancel - KeyError: 'statusReason'
FAILED tests/test_aio_aws_batch.py::test_batch_jobs_cancel - assert <AWSBatchJobStates.SUCCEEDED: 6> == <AWSBatchJobStates.FAILED: 7>
FAILED tests/test_aio_aws_batch.py::test_async_batch_cancel_jobs - assert <AWSBatchJobStates.SUCCEEDED: 6> == <AWSBatchJobStates.FAILED: 7>
log snippets:
2021-10-16T21:12:00.218Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_cancel:487 | AWS Batch job to cancel: eca15aa4-d49a-4ff2-9369-4b5239511f66, test-job-cancel
2021-10-16T21:12:00.258Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: STARTING
2021-10-16T21:12:00.527Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: STARTING
2021-10-16T21:12:01.015Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:01.472Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:01.749Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:02.364Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:02.640Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:03.052Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:03.416Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:03.920Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:04.466Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:04.990Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:05.455Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:05.971Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: RUNNING
2021-10-16T21:12:06.306Z | INFO | aio_aws.aio_aws_batch:aio_batch_job_status:594 | AWS Batch job (sleep-5-job:eca15aa4-d49a-4ff2-9369-4b5239511f66) status: SUCCEEDED
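Presumably the KeyError above follows from this: under moto 2.x the job is never caught by cancel_job and simply runs to SUCCEEDED, so describe_jobs returns no statusReason field for it. A defensive access pattern (illustrative, not the actual test code):

```python
def job_status_reason(batch, job_id):
    """Return the job's statusReason, or None if it was never set."""
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    # statusReason is only present once a reason has been set (e.g. after a
    # cancel or a failure), so prefer .get() over direct indexing.
    return job.get("statusReason")
```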
Using an initial job that the other jobs depend on helped to work around the problem.
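Presumably this refers to boto3's dependsOn parameter on submit_job: dependent jobs cannot leave PENDING until the blocker job finishes, which keeps them cancellable for longer. A rough sketch (queue and job definition names are placeholders):

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# A long-running "blocker" job that the real jobs depend on.
blocker = batch.submit_job(
    jobName="blocker",
    jobQueue="test-queue",       # placeholder queue name
    jobDefinition="sleep-job",   # placeholder job definition
)

# The dependent job stays PENDING until the blocker finishes, so cancel_job
# still has a chance to catch it.
job = batch.submit_job(
    jobName="sleep-5-job",
    jobQueue="test-queue",
    jobDefinition="sleep-job",
    dependsOn=[{"jobId": blocker["jobId"]}],
)
batch.cancel_job(jobId=job["jobId"], reason="test-job-cancel")
```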
Moto 3.1.8 now contains a state manager that would allow you to artificially slow down how fast Moto moves through the individual states. This makes it possible to get back to the Moto 1.x behaviour, where we had sleep-statements in between submitted/pending/runnable, except that the delay is now configurable. Note that the default behaviour is still to cycle through the states as quickly as possible.
A test for this exact scenario, where we want to cancel a Batch-job before it starts, can be found here: https://github.com/spulec/moto/blob/master/tests/test_moto_api/state_manager/test_batch_integration.py
The general documentation can be found here: http://docs.getmoto.org/en/latest/docs/configuration/state_transition/index.html
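For completeness, a sketch of how the state manager might be configured in-process (decorator mode). The model name and transition options shown here are my assumption based on the linked documentation, so check the linked test for the canonical usage:

```python
from moto.moto_api import state_manager

# Assumed model name and options (see the linked docs/test for the exact values):
# keep each Batch job state around for a few status requests before advancing, so
# SUBMITTED/PENDING/RUNNABLE remain observable and cancel_job can catch the job.
state_manager.set_transition(
    model_name="batch::job",
    transition={"progression": "manual", "times": 3},
)
```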
I believe that this solves the problem outlined, so I’ll close this. Let us know if you have any questions around this though.