Self hosted runner in Linux container exits job prematurely
Describe the bug
We are attempting to test our Node.js application on a self-hosted runner in a Linux container. The job exits prematurely (the GitHub runner cancels it) during a memory-intensive portion. The step in which it consistently exits is the setup bash script that installs Node, node modules, and MongoDB.
To Reproduce
Steps to reproduce the behavior:
- Set up the container using the example Dockerfile
- Start the workflow using the example workflow (memory-intensive work)
- The workflow exits
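For anyone reproducing this, the first step can be sketched roughly as follows. The script only prints the docker commands rather than running them. The image tag `actions-runner-repro` and the owner/repository values are hypothetical placeholders; the environment variable names are the ones declared in the example Dockerfile in this report.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) the docker commands for the first repro step.
# IMAGE_TAG and the owner/repo arguments are hypothetical placeholders.
IMAGE_TAG="${IMAGE_TAG:-actions-runner-repro}"

print_repro_commands() {
  local owner="$1"
  local repo="$2"
  # Build the image from the directory holding the Dockerfile and entrypoint.sh.
  printf '%s\n' "docker build -t ${IMAGE_TAG} ."
  # Start the container; GITHUB_PAT is expected in the caller's environment.
  printf '%s\n' "docker run -d -e GITHUB_OWNER=${owner} -e GITHUB_REPOSITORY=${repo} -e GITHUB_PAT=\"\$GITHUB_PAT\" ${IMAGE_TAG}"
}

print_repro_commands "owner" "project"
```

Printing the commands first makes it easy to review them (and keeps the PAT out of the script) before pasting them into a shell.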
Expected behavior
The job completes without exiting early.
Runner Version and Platform
2.275.1 - Ubuntu Container
What’s not working?
Processes are killed because the GitHub runner is not able to write to oom_score_adj.
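To check this from inside the container, something like the following could be run as the same user the runner uses. It is only a sketch: it reads a process's oom_score_adj and writes the same value back, which exercises write permission without changing anything. The function name `probe_oom_score_adj` is made up for illustration, and the optional second argument exists only so the function can be pointed at an ordinary file. Note that writing the current value back may succeed even where lowering the value would be refused.

```shell
#!/usr/bin/env bash
# Sketch: report a process's oom_score_adj and whether this user can write it.
# probe_oom_score_adj is a hypothetical helper name, not part of the runner.
probe_oom_score_adj() {
  local pid="$1"
  # Defaults to the real procfs path; an explicit path can be passed instead.
  local oom_file="${2:-/proc/${pid}/oom_score_adj}"
  local current

  if ! current=$(cat "$oom_file" 2>/dev/null); then
    echo "pid=${pid} oom_score_adj=unreadable"
    return 2
  fi

  # Write the current value back: a no-op that still needs write permission.
  if { printf '%s\n' "$current" > "$oom_file"; } 2>/dev/null; then
    echo "pid=${pid} oom_score_adj=${current} writable=yes"
  else
    echo "pid=${pid} oom_score_adj=${current} writable=no"
    return 1
  fi
}

# Probe the current shell; pass another PID to probe a running step process.
probe_oom_score_adj "$$" || true
```

Running it against the PID of a step process (for example the bash process from the worker log) would show whether the denial is specific to that process or applies to every process the user owns.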
Job Log Output
Setup Step
Error: The operation was canceled.
[debug]System.OperationCanceledException: The operation was canceled.
Runner and Worker’s Diagnostic Logs
Worker Logs
Exiting Exception
Caught cancellation exception from step: System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
   at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
   at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
   at GitHub.Runner.Worker.Handlers.ScriptHandler.RunAsync(ActionRunStage stage)
   at GitHub.Runner.Worker.ActionRunner.RunAsync()
   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
There are a bunch of these errors, but this is the last one (during the bash setup.sh script):
Which: 'bash'
[2021-01-19 00:21:54Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Starting process:
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] File name: '/usr/bin/bash'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Arguments: '-e /app/project/_work/_temp/4a28742a-a48e-4b22-9af0-403af0e37076.sh'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Working directory: '/app/project/_work/project/project'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Require exit code zero: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: ''
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Persist current code page: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] High priority process: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 2506.
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/2506/oom_score_adj' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()
   at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
   at System.IO.StreamWriter.Dispose(Boolean disposing)
   at System.IO.TextWriter.Dispose()
   at System.IO.File.WriteAllText(String path, String contents)
   at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
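The log above shows that the write to oom_score_adj was denied, but on its own it does not show whether the kernel OOM killer actually terminated anything. One way to check that, assuming the container uses cgroup v2 (where the `memory.events` file carries an `oom_kill` counter), is to compare the counter before and after the failing step. This is a minimal sketch; `oom_kill_count` is a hypothetical helper name, and on cgroup v1 the path and file differ.

```shell
#!/usr/bin/env bash
# Sketch: print the oom_kill counter from a cgroup v2 memory.events file.
# Assumes cgroup v2; oom_kill_count is a hypothetical helper name.
oom_kill_count() {
  local events_file="${1:-/sys/fs/cgroup/memory.events}"
  if [ ! -r "$events_file" ]; then
    echo "unknown"
    return 1
  fi
  # memory.events holds "key value" pairs, one per line.
  awk '$1 == "oom_kill" { print $2; found = 1 }
       END { if (!found) print "unknown" }' "$events_file"
}

# Print the current counter (prints "unknown" if the file is not available).
oom_kill_count || true
```

If the value printed before `./scripts/setup.sh` is lower than the value printed after the job is canceled, a process in the container's cgroup was killed for exceeding its memory limit during that step.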
Example Container Dockerfile
FROM ubuntu:latest

ARG DEBIAN_FRONTEND=noninteractive

ENV GITHUB_RUNNER_VERSION="2.275.1"
ENV RUNNER_NAME "runner"
ENV GITHUB_OWNER "owner"
ENV RUNNER_WORKDIR "_work"
ENV GITHUB_REPOSITORY ""
ENV GITHUB_PAT ""

RUN apt-get update -y \
    && apt-get upgrade -y \
    && apt-get install -y \
        curl \
        sudo \
        git \
        jq \
        systemctl \
        tzdata \
        mysql-client \
        python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && useradd -m github \
    && usermod -aG sudo github \
    && echo "%sudo ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers \
    && touch /etc/sudoers.d/github \
    && echo "github ALL = (ALL) NOPASSWD: ALL" >> /etc/sudoers.d/github

USER github

WORKDIR /app
COPY --chown=github:github entrypoint.sh ./entrypoint.sh
RUN sudo chown -R github /app
RUN sudo chmod u+x ./entrypoint.sh

ENTRYPOINT [ "/app/entrypoint.sh" ]
Example entrypoint.sh
#!/bin/sh

mkdir -p /app/${GITHUB_REPOSITORY}
cd /app/${GITHUB_REPOSITORY}

curl -Ls https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz \
  && sudo ./bin/installdependencies.sh

token_url="https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPOSITORY}/actions/runners/registration-token"
registration_url="https://github.com/${GITHUB_OWNER}/${GITHUB_REPOSITORY}"
echo "Requesting token at '${token_url}'"

payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" ${token_url})
export RUNNER_TOKEN=$(echo "$payload" | jq .token --raw-output)

./config.sh \
  --name "${GITHUB_REPOSITORY}" \
  --token "${RUNNER_TOKEN}" \
  --url "${registration_url}" \
  --works "${RUNNER_WORKDIR}" \
  --labels "${RUNNER_LABELS}" \
  --unattended \
  --replace

echo "adding actions service"
sudo ./svc.sh install

echo "starting all services with actions."
sudo systemctl start 'actions.'

tail -f /dev/null
Example workflow.yml
name: nodejs test

on:
  push:
    branches:
      - dev

jobs:
  Analysis:
    runs-on: self-hosted
    env:
      CI: true
      USER: github
    strategy:
      matrix:
        node-version: [ 10.19.0 ]
    steps:
      - name: Ensure Dependencies
        run: |
          sudo apt update
          sudo apt install -y gcc
          sudo apt install -y make
          sudo apt install -y lsof
          sudo apt install -y curl
          sudo apt install -y psmisc
          sudo apt install -y node-gyp
      - name: Ensure Known Hosts
        run: |
          mkdir -p ~/.ssh/
          touch ~/.ssh/known_hosts
      - name: Pull Repo
        uses: actions/checkout@v2
      - name: Use Node
        uses: actions/setup-node@v1
        with:
          node-version: ${{ matrix.node-version }}
      - name: Run Setup.sh
        run: |
          export CI=true
          ./scripts/setup.sh
      - name: Run Tests
        run: sudo ./scripts/run_all_tests.sh
      - name: Extra Logs if Failed
        if: ${{ failure() }}
        run: |
          npm -v
          node -v
          npm config ls -l
          cat ~/.npm/_logs/*
Notes:
- I have found a similar-sounding issue here: https://github.com/microsoft/azure-pipelines-agent/issues/3093
- We have tried multiple container images and various permission changes for the user ‘github’, including adding it to the ‘root’ group. We also tried setting ‘RUNNER_ALLOW_RUNASROOT’.
Issue Analytics
- Created 3 years ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
@kabamawutschnik Can you try to figure out which process sends the SIGINT?
https://unix.stackexchange.com/a/372581
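As a first step before kernel-level tracing, a few lines like these at the top of the setup script would at least confirm which signal arrives and when. This is only a sketch: a shell trap cannot see the sender's PID, so it does not answer "which process" by itself. The log path and the `log_signal` name are hypothetical, and note that once these traps are installed the script keeps running after the signal instead of terminating.

```shell
#!/usr/bin/env bash
# Sketch: record which signals reach this shell and when.
# SIGNAL_LOG and log_signal are hypothetical names chosen for illustration.
SIGNAL_LOG="${SIGNAL_LOG:-/tmp/signal-trace.log}"

log_signal() {
  # $1 is the signal name without the SIG prefix, e.g. INT or TERM.
  printf '%s pid=%s received SIG%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$$" "$1" >> "$SIGNAL_LOG"
}

# Install a logging-only trap for each signal of interest.
for sig in INT TERM HUP; do
  trap "log_signal $sig" "$sig"
done
```

Comparing the timestamp in the log with the runner's worker log would show whether the signal arrives before or after the runner reports the cancellation. If a foreground child process is running, the shell runs the trap only after that child returns.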
What’s definitely confusing is that the job ends up marked as cancelled in GitHub Actions, even though runsvc is notified.
It should have been reported as a failure when a job is not canceled from GitHub Actions but just killed/terminated on the server, wouldn’t you agree? At least to me, job status is reported from the perspective of GitHub Actions, not from the underlying runners.