question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Self hosted runner in Linux container exits job prematurely

See original GitHub issue

Describe the bug We are attempting to test our nodejs application on a self hosted runner in a linux container. Job is exiting (github runner is cancelling the job) prematurely, during a memory intensive portion. The step that we are exiting consistently in, is the setup bash script to install node, node modules, and mongodb.

To Reproduce Steps to reproduce the behavior:

  1. Setup up container using example Dockerfile
  2. Start workflow using example workflow (memory intensive work)
  3. Workflow Exits

Expected behavior To complete without exiting early.

Runner Version and Platform

2.275.1 - Ubuntu Container

What’s not working?

Processes are killed, because oom_score_adj is not able to be edited by github runner.

Job Log Output

Setup Step Error: The operation was canceled. [debug]System.OperationCanceledException: The operation was canceled.

Runner and Worker’s Diagnostic Logs

Worker Logs

Exiting Exception Caught cancellation exception from step: System.OperationCanceledException: The operation was canceled. at System.Threading.CancellationToken.ThrowOperationCanceledException() at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken) at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken) at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken) at GitHub.Runner.Worker.Handlers.ScriptHandler.RunAsync(ActionRunStage stage) at GitHub.Runner.Worker.ActionRunner.RunAsync() at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)`

A bunch of these errors, but this is the last one (during the bash setup.sh script) Which: 'bash' [2021-01-19 00:21:54Z INFO ScriptHandler] Location: '/usr/bin/bash' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Starting process: [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] File name: '/usr/bin/bash' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Arguments: '-e /app/project/_work/_temp/4a28742a-a48e-4b22-9af0-403af0e37076.sh' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Working directory: '/app/project/_work/project/project' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Require exit code zero: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: '' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Persist current code page: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] High priority process: 'False' [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 2506. [2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/2506/oom_score_adj' is denied. ---> System.IO.IOException: Permission denied --- End of inner exception stack trace --- at System.IO.FileStream.WriteNative(ReadOnlySpan1 source) at System.IO.FileStream.FlushWriteBuffer() at System.IO.FileStream.Dispose(Boolean disposing) at System.IO.Stream.Close() at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing) at System.IO.StreamWriter.Dispose(Boolean disposing) at System.IO.TextWriter.Dispose() at System.IO.File.WriteAllText(String path, String contents) at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)`

Example Container Dockerfile `FROM ubuntu:latest

ARG DEBIAN_FRONTEND=noninteractive

ENV GITHUB_RUNNER_VERSION=“2.275.1” ENV RUNNER_NAME “runner” ENV GITHUB_OWNER “owner” ENV RUNNER_WORKDIR “_work” ENV GITHUB_REPOSITORY “” ENV GITHUB_PAT “”

RUN apt-get update -y
&& apt-get upgrade -y
&& apt-get install -y
curl
sudo
git
jq
systemctl
tzdata
mysql-client
python3-pip
&& apt-get clean
&& rm -rf /var/lib/apt/lists/*
&& useradd -m github
&& usermod -aG sudo github
&& echo “%sudo ALL=(ALL) NOPASSWD:ALL” >> /etc/sudoers
&& touch /etc/sudoers.d/github
&& echo “github ALL = (ALL) NOPASSWD: ALL” >> /etc/sudoers.d/github

USER github

WORKDIR /app COPY --chown=github:github entrypoint.sh ./entrypoint.sh RUN sudo chown -R github /app RUN sudo chmod u+x ./entrypoint.sh

ENTRYPOINT [ “/app/entrypoint.sh” ]`

Example entrypoint.sh `#!/bin/sh

mkdir -p /app/${GITHUB_REPOSITORY} cd /app/${GITHUB_REPOSITORY} curl -Ls https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz
&& sudo ./bin/installdependencies.sh

token_url=“https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPOSITORY}/actions/runners/registration-token” registration_url=“https://github.com/${GITHUB_OWNER}/${GITHUB_REPOSITORY}” echo “Requesting token at ‘${token_url}’”

payload=$(curl -sX POST -H “Authorization: token ${GITHUB_PAT}” ${token_url}) export RUNNER_TOKEN=$(echo “$payload” | jq .token --raw-output)

./config.sh
–name “${GITHUB_REPOSITORY}”
–token “${RUNNER_TOKEN}”
–url “${registration_url}”
–works “${RUNNER_WORKDIR}”
–labels “${RUNNER_LABELS}”
–unattended
–replace

echo “adding actions service” sudo ./svc.sh install

echo "starting all services with actions." sudo systemctl start 'actions.

tail -f /dev/null `

Example workflow.yml `name: nodejs test

on: push: branches: - dev

jobs: Analysis: runs-on: self-hosted

env:
  CI: true
  USER: github

strategy:
  matrix:
    node-version: [ 10.19.0 ]
steps:
  - name: Ensure Dependencies
    run: |
      sudo apt update
      sudo apt install -y gcc
      sudo apt install -y make
      sudo apt install -y lsof
      sudo apt install -y curl
      sudo apt install -y psmisc
      sudo apt install -y node-gyp
  - name: Ensure Known Hosts
    run: |
      mkdir -p ~/.ssh/
      touch ~/.ssh/known_hosts
  - name: Pull Repo
    uses: actions/checkout@v2
  - name: Use Node
    uses: actions/setup-node@v1
    with:
      node-version: ${{ matrix.node-version }}
  - name: Run Setup.sh
    run: |
      export CI=true
      ./scripts/setup.sh
  - name: Run Tests
    run: sudo ./scripts/run_all_tests.sh
  - name: Extra Logs if Failed
    if: ${{ failure() }}
    run: |
      npm -v
      node -v
      npm config ls -l
      cat ~/.npm/_logs/*

`

Notes:

Issue Analytics

  • State:open
  • Created 3 years ago
  • Comments:11 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
TingluoHuangcommented, Jan 20, 2021

@kabamawutschnik try to figure out what process sends the SIGINT?

https://unix.stackexchange.com/a/372581

0reactions
MartinNowakcommented, Apr 16, 2021

What’s definitely confusing is that the job ends up marked as cancelled in GitHub Actions, even though runsvc is notified.

Apr 16 13:04:04 gh-runner-03 systemd[1]: gh-actions-runner.service: A process of this unit has been killed by the OOM killer.
Apr 16 13:04:04 gh-runner-03 runsvc.sh[305080]: Shutting down runner listener
Apr 16 13:04:04 gh-runner-03 runsvc.sh[305080]: Sending SIGINT to runner listener to stop
Apr 16 13:04:04 gh-runner-03 runsvc.sh[305080]: Exiting...
Apr 16 13:04:08 gh-runner-03 runsvc.sh[305080]: 2021-04-16 11:04:08Z: Job run completed with result: Canceled
Apr 16 13:04:08 gh-runner-03 runsvc.sh[305080]: Runner listener exited with error code 0
Apr 16 13:04:08 gh-runner-03 runsvc.sh[305080]: Runner listener exit with 0 return code, stop the service, no retry needed.
Apr 16 13:04:08 gh-runner-03 systemd[1]: gh-actions-runner.service: Failed with result 'oom-kill'.
Apr 16 13:04:08 gh-runner-03 systemd[1]: gh-actions-runner.service: Consumed 4h 54min 39.841s CPU time.
Apr 16 13:04:13 gh-runner-03 systemd[1]: gh-actions-runner.service: Scheduled restart job, restart counter is at 5.
Apr 16 13:04:13 gh-runner-03 systemd[1]: Stopped GitHub Actions Runner.

Job run completed with result: Canceled

Should’ve been failure when a job is not canceled from GitHub Actions, but just killed/terminated on the server, wouldn’t you agree? At lest to me job status is reported from the perspective of GH Actions, not from the underlying runners.

Read more comments on GitHub >

github_iconTop Results From Across the Web

docker - Github action killed (137), container exits with 0
I've setup a github workflow to run my docker compose headless (node & postgres container) then run my jest tests.
Read more >
Job marked as success when job terminates midway in ...
It is expected that the job is marked as failed if the builder exits ... self-hosted or on GKE, though it has been...
Read more >
Deploying Self-Hosted GitHub Actions Runners with Docker
This tutorial looks at how to deploy self-hosted GitHub Actions runners with Docker and Docker Swarm to DigitalOcean.
Read more >
How to Fix and Debug Docker Containers Like a Superhero
Your container exits when the commands contained within are invalid — just like we saw earlier. You'll be able to see if you've...
Read more >
GitHub Actions limitations and gotchas
Using autoscaling self hosted runners, it is not currently possible to instruct the agent to finalise the current job but to accept none ......
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found