stop not working when start was cancelled by concurrency cancelation rule

To ensure that no more than one instance is running across several consecutive pushes, the idea is to cancel any previous, still-running workflow on a new push, so we added a global setting:

concurrency: # cancel previous build on a new push
  group: ${{ github.ref }} # https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
  cancel-in-progress: true

Now when I’m testing this, stop quite often fails, leaving the instance running, which is very expensive!

Run machulav/ec2-github-runner@v2
Error: Error: Not all the required inputs are provided for the 'stop' mode
Error: Not all the required inputs are provided for the 'stop' mode
Error: TypeError: Cannot read property 'mode' of undefined
Error: Cannot read property 'mode' of undefined

Here is the log from the start job:

 with:
    mode: start
    github-token: ***
    ec2-image-id: ami-03540b272db1624b7
    ec2-instance-type: p3.8xlarge
    security-group-id: sg-f2a4e2fc
    subnet-id: subnet-b7533b96
    aws-resource-tags: [
    {"Key": "Name", "Value": "ec2-github-runner"},
    {"Key": "GitHubRepository", "Value": "bigscience-workshop/Megatron-DeepSpeed"}
  ]
  
  env:
    AWS_DEFAULT_REGION: us-east-1
    AWS_REGION: us-east-1
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
GitHub Registration Token is received
AWS EC2 instance i-038eeed014c994b48 is started
Error: The operation was canceled.

So it looks like start didn’t set the variables it was supposed to set, because it was cancelled.
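For readers unfamiliar with the setup, here is a rough sketch of how the start and stop jobs are typically wired, loosely following the pattern in the action's README. The job names, step id and secret name are assumptions for illustration, not copied from the linked workflow; `label` and `ec2-instance-id` are the values the stop mode expects.

```yaml
jobs:
  start-runner:              # hypothetical job name
    runs-on: ubuntu-latest
    outputs:
      # Job outputs that the stop job reads later. In the run reported
      # above they arrived empty after the start job was cancelled.
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    steps:
      - name: Start EC2 runner
        id: start-ec2-runner   # hypothetical step id
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}  # hypothetical secret name
          # ec2-image-id, ec2-instance-type, subnet-id, security-group-id
          # and the AWS credentials are omitted here; see the log above.

  stop-runner:               # hypothetical job name
    needs: start-runner
    runs-on: ubuntu-latest
    if: ${{ always() }}      # still runs when earlier jobs fail or are cancelled
    steps:
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          # With empty values here, the action reports
          # "Not all the required inputs are provided for the 'stop' mode".
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
```

In this layout the stop job has no other way to learn which instance to terminate, so a start job that is cancelled after launching the instance leaves it orphaned.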

Here is the full workflow for context: https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/7c636d7555e915f1f426984172f73840b2168313/.github/workflows/main.yml

If there are other solutions I’m all ears.

Thank you!

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 12

Top GitHub Comments

jpalomaki commented, Sep 4, 2021 (1 reaction)

Would it help if you used a combination of workflow-level concurrency control and job-level concurrency control? Then you could use cancel-in-progress for the test job, which I presume would be safe (since it would only cut the test job short), while still being able to limit concurrent workflow runs to just one as well?
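A minimal sketch of the job-level half of this suggestion, assuming a three-job layout (start, test, stop) in which only the test job is cancellable. The job names `start-runner` and `run-tests`, the `tests-` group prefix and the placeholder step are hypothetical. `jobs.<job_id>.concurrency` is standard workflow syntax, but whether this behaves well together with a workflow-level group has not been verified here.

```yaml
jobs:
  run-tests:                 # hypothetical job name
    needs: start-runner      # hypothetical name of the job that starts the EC2 runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    # Only this job is cancelled by a newer push to the same ref.
    # The start and stop jobs carry no cancel-in-progress setting,
    # so the start job can finish and publish its outputs.
    concurrency:
      group: tests-${{ github.ref }}
      cancel-in-progress: true
    steps:
      - run: echo "run the test suite here"   # placeholder step
```

One trade-off to be aware of: since the start job of the newer run is not blocked, two instances may briefly exist side by side until the older run's stop job completes.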

jpalomaki commented, Sep 4, 2021 (1 reaction)

> Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man’s switch.

Nice idea 👍 Not sure how complex/brittle the implementation would get, though.

> Does github action have a sort of {{ always }} step that gets to run even if the workflow is cancelled? some sort of finalize.

Yep, the stop job is in fact annotated with that (see the if condition), but the problem is that the input to that job is now lacking (because the start job was cut short).

> All these suggestions are great for a handful of core devs, but we have contributors with a hugely varying level of expertise and it’s already at times too much to even ask for a PR. So these won’t work well.

Understood. One option could then be to leverage PR comments, e.g. verbatim /test to trigger tests at a suitable spot. You can use the issue_comment event (comparing the comment text) to trigger a workflow when a PR comment is added. Or perhaps there are some ready-made github apps/integrations that could help with this?

> where have you found lock?

That was just an example, static concurrency group name.

> Ideally we want to prevent concurrent runs within the same PR. But other PRs should be totally independent and allowed to run their CI when needed.

All right, looks like you are in fact using ${{ github.ref }} for the concurrency group already 👍
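The /test comment trigger mentioned above could look roughly like the following. This is only a sketch: the workflow name, job name and placeholder step are hypothetical, and the real test steps (including starting and stopping the EC2 runner) would need to be filled in.

```yaml
name: test-on-comment        # hypothetical workflow name

on:
  issue_comment:
    types: [created]

jobs:
  run-tests:                 # hypothetical job name
    # issue_comment also fires for comments on plain issues, so require
    # that the comment belongs to a pull request and is exactly "/test".
    if: ${{ github.event.issue.pull_request && github.event.comment.body == '/test' }}
    runs-on: ubuntu-latest
    steps:
      # Workflows triggered by issue_comment run against the default branch,
      # so the PR head would have to be checked out explicitly before testing.
      - run: echo "run the test suite for PR ${{ github.event.issue.number }} here"
```

Because tests then run only on request, a new push no longer needs to cancel anything, at the cost of contributors (or maintainers) having to remember to post the comment.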
