Renew job request fails with Forbidden
Describe the bug
Our runner daemons for pytorch/pytorch started failing to renew job requests, with the following error message:
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR MessageListener] GitHub.DistributedTask.WebApi.AccessDeniedException: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
These errors eventually cause the job to fail. Our auto-scaler then scales down the node, presumably because the failed job renewal requests make it look as though the node is not actually running a job.
To Reproduce
We’re not entirely sure how to reproduce this, but it has been happening consistently on our self-hosted runners since 12/09 at 10am PDT (we’re tracking our experience here: https://github.com/pytorch/pytorch/issues/69722).
Expected behavior
The runner is able to renew job requests successfully. This worked pretty flawlessly for us until this morning.
Runner Version and Platform
Version of your runner?
2.285.1
OS of the machine running the runner? OSX/Windows/Linux/…
Linux
What’s not working?
Job Log Output
Runner and Worker’s Diagnostic Logs
If applicable, add relevant diagnostic log information. Logs are located in the runner’s _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.
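To pull just the relevant errors out of the _diag folder instead of pasting whole files, a small script along these lines may help. It is a sketch, not part of the runner: it assumes runner log files sit directly in _diag with names starting with Runner_ (as described above), and that the failure shows up as the AccessDeniedException text seen in the logs in this report. The function name find_access_denied is hypothetical.

```python
import re
from pathlib import Path

# Matches the exception text seen in the MessageListener and JobDispatcher errors in this report.
ACCESS_DENIED = re.compile(r"AccessDeniedException: Access denied\.")


def find_access_denied(diag_dir):
    """Return (file name, line) pairs for every access-denied error in the runner logs.

    Assumes runner logs are files directly inside diag_dir whose names start
    with "Runner_"; worker logs ("Worker_*") are deliberately skipped.
    """
    hits = []
    for path in sorted(Path(diag_dir).glob("Runner_*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains invalid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if ACCESS_DENIED.search(line):
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

For example, calling find_access_denied("_diag") from the runner’s install directory lists each matching line with the log file it came from, which is usually enough context for a bug report without exposing the rest of the log.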
Full Logs
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR GitHubActionsService] GET request to https://pipelines.actions.githubusercontent.com/mBh68xKhi8LyM7tp3vECvYXNFvuV4gyVGgmYCteuEZP9JH92QN/_apis/distributedtask/pools/1/messages?sessionId=cb105875-b2f6-4d26-afe3-451f3b5536eb&lastMessageId=1 failed. HTTP Status: Forbidden, AFD Ref: Ref A: 2B6CBF4D515344E4B2ED24E7AE2F916A Ref B: ASHEDGE1219 Ref C: 2021-12-09T23:05:13Z 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR MessageListener] Catch exception during get next message. 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR MessageListener] GitHub.DistributedTask.WebApi.AccessDeniedException: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
at GitHub.Runner.Listener.MessageListener.GetNextMessageAsync(CancellationToken token) 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO MessageListener] Non-retriable exception: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action. 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO JobDispatcher] Shutting down JobDispatcher. Make sure all WorkerDispatcher has finished. 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO JobDispatcher] Ensure WorkerDispather for job be3eeb22-584a-5ee4-236b-1a042e19f0eb run to finish, cancel any running job. 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO JobDispatcher] Send job cancellation message to worker for job be3eeb22-584a-5ee4-236b-1a042e19f0eb. 1639091113587
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO ProcessChannel] Sending message of length 0, with hash 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' 1639091114087
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR GitHubActionsService] PATCH request to https://pipelines.actions.githubusercontent.com/mBh68xKhi8LyM7tp3vECvYXNFvuV4gyVGgmYCteuEZP9JH92QN/_apis/distributedtask/pools/1/jobrequests/1191300?lockToken=00000000-0000-0000-0000-000000000000 failed. HTTP Status: Forbidden, AFD Ref: Ref A: 217FF6F951F749368831E29FE9D36628 Ref B: ASHEDGE1518 Ref C: 2021-12-09T23:05:13Z 1639091114087
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR JobDispatcher] Catch exception during renew runner jobrequest 1191300. 1639091114087
EVENTS 1639091118591 [2021-12-09 23:05:13Z ERR JobDispatcher] GitHub.DistributedTask.WebApi.AccessDeniedException: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.Runner.Listener.JobDispatcher.RenewJobRequestAsync(Int32 poolId, Int64 requestId, Guid lockToken, String orchestrationId, TaskCompletionSource`1 firstJobRequestRenewed, CancellationToken token) 1639091114088
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO JobDispatcher] Retrying lock renewal for jobrequest 1191300. Job is valid until 12/09/2021 23:14:13. 1639091114088
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO RunnerServer] Refresh JobRequest VssConnection to get on a different AFD node. 1639091114088
EVENTS 1639091118591 [2021-12-09 23:05:13Z INFO RunnerServer] Establish connection with 30 seconds timeout. 1639091114088
EVENTS 1639091118591 [2021-12-09 23:05:14Z INFO GitHubActionsService] Starting operation Location.GetConnectionData 1639091114088
EVENTS 1639091118591 [2021-12-09 23:05:14Z INFO GitHubActionsService] Finished operation Location.GetConnectionData 1639091118163
EVENTS 1639091126870 [2021-12-09 23:05:21Z ERR GitHubActionsService] PATCH request to https://pipelines.actions.githubusercontent.com/mBh68xKhi8LyM7tp3vECvYXNFvuV4gyVGgmYCteuEZP9JH92QN/_apis/distributedtask/pools/1/jobrequests/1191300?lockToken=00000000-0000-0000-0000-000000000000 failed. HTTP Status: Forbidden, AFD Ref: Ref A: 38BD77BD9CF14D0787059796D06BC031 Ref B: ASHEDGE1206 Ref C: 2021-12-09T23:05:21Z 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z ERR JobDispatcher] Catch exception during renew runner jobrequest 1191300. 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z ERR JobDispatcher] GitHub.DistributedTask.WebApi.AccessDeniedException: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.Runner.Listener.JobDispatcher.RenewJobRequestAsync(Int32 poolId, Int64 requestId, Guid lockToken, String orchestrationId, TaskCompletionSource`1 firstJobRequestRenewed, CancellationToken token) 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z INFO JobDispatcher] Retrying lock renewal for jobrequest 1191300. Job is valid until 12/09/2021 23:14:13. 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z INFO RunnerServer] Refresh JobRequest VssConnection to get on a different AFD node. 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z INFO RunnerServer] Establish connection with 30 seconds timeout. 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z INFO GitHubActionsService] Starting operation Location.GetConnectionData 1639091121842
EVENTS 1639091126870 [2021-12-09 23:05:21Z INFO GitHubActionsService] Finished operation Location.GetConnectionData 1639091126163
EVENTS 1639091137351 [2021-12-09 23:05:32Z ERR GitHubActionsService] PATCH request to https://pipelines.actions.githubusercontent.com/mBh68xKhi8LyM7tp3vECvYXNFvuV4gyVGgmYCteuEZP9JH92QN/_apis/distributedtask/pools/1/jobrequests/1191300?lockToken=00000000-0000-0000-0000-000000000000 failed. HTTP Status: Forbidden, AFD Ref: Ref A: 71753697DAF54A4DBCCBDD8C92EC2816 Ref B: ASHEDGE1515 Ref C: 2021-12-09T23:05:32Z 1639091132347
EVENTS 1639091137351 [2021-12-09 23:05:32Z ERR JobDispatcher] Catch exception during renew runner jobrequest 1191300. 1639091132347
EVENTS 1639091137351 [2021-12-09 23:05:32Z ERR JobDispatcher] GitHub.DistributedTask.WebApi.AccessDeniedException: Access denied. System:ServiceIdentity;DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD needs View permissions to perform the action.
at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken)
at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
at GitHub.Runner.Listener.JobDispatcher.RenewJobRequestAsync(Int32 poolId, Int64 requestId, Guid lockToken, String orchestrationId, TaskCompletionSource`1 firstJobRequestRenewed, CancellationToken token) 1639091132347
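The logs above show the dispatcher’s renewal loop: each Forbidden PATCH to the jobrequests endpoint is followed by a "Retrying lock renewal" line that states the lease deadline (23:14:13), so at the first failure (23:05:13) the job still had roughly nine minutes of validity left. The sketch below extracts those retries and the remaining lease time from log text. It is an illustrative parser written against the lines pasted here, not a runner API; it keys on the bracketed [timestamp LEVEL Component] header and ignores the leading EVENTS field and trailing number. The function name renewal_retries is hypothetical, and treating the "valid until" value as UTC is an assumption, since that timestamp carries no zone marker.

```python
import re
from datetime import datetime

# Bracketed header on each runner log line, e.g. "[2021-12-09 23:05:13Z INFO JobDispatcher] ...".
HEADER = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z (\w+) (\w+)\] (.*)")
# Lock-renewal retry message emitted by JobDispatcher in the logs above.
RETRY = re.compile(
    r"Retrying lock renewal for jobrequest (\d+)\. "
    r"Job is valid until (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
)


def renewal_retries(lines):
    """Yield (request_id, logged_at, remaining) for each lock-renewal retry.

    remaining is the time left on the job lease when the retry was logged.
    Assumption: the "valid until" value is treated as UTC, like the
    "Z"-suffixed header timestamp.
    """
    for line in lines:
        header = HEADER.search(line)
        if header is None:
            continue  # continuation lines such as stack frames have no header
        logged_at = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S")
        retry = RETRY.search(header.group(4))
        if retry is not None:
            valid_until = datetime.strptime(retry.group(2), "%m/%d/%Y %H:%M:%S")
            yield int(retry.group(1)), logged_at, valid_until - logged_at
```

Fed the Full Logs above line by line, this yields two retries for job request 1191300: one logged at 23:05:13 with 9 minutes of lease remaining and one at 23:05:21 with 8 minutes 52 seconds remaining. Because it matches only the bracketed header, it should also work on raw Runner_ log lines, assuming they use the same header format.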
Top GitHub Comments
We are aware of this issue and working on a fix.
Thank you for your patience.
We will follow up with more information in this thread once it is available.
This issue has been fully mitigated. We apologize for the disruption this has caused. A code change impacted self-hosted jobs that had been awaiting assignment for longer than 15 minutes. Note that you can also follow the GitHub Status Twitter account for more real-time status updates than this repo provides: https://twitter.com/githubstatus
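Since the regression only affected self-hosted jobs that waited more than 15 minutes for assignment, one way to check which past jobs might have been hit is to compare each job’s queue time with its assignment time. The sketch below assumes you already record those two timestamps yourself; the JobRecord type, its fields, and possibly_affected are hypothetical and not part of any GitHub API. The exact way GitHub measured "awaiting assignment" is not stated in the note, so treat the result as a heuristic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold quoted in the mitigation note above.
ASSIGNMENT_WAIT_THRESHOLD = timedelta(minutes=15)


@dataclass
class JobRecord:
    """Hypothetical record of one self-hosted job; field names are illustrative only."""
    job_id: str
    queued_at: datetime
    assigned_at: datetime


def possibly_affected(jobs):
    """Return IDs of jobs that waited longer than the threshold to be assigned to a runner."""
    return [
        job.job_id
        for job in jobs
        if job.assigned_at - job.queued_at > ASSIGNMENT_WAIT_THRESHOLD
    ]
```

The comparison is strict because the note says "longer than 15 minutes", so a job assigned after exactly 15 minutes is not flagged.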