Unexpected stop of container build [CML, AWS, github]
My scenario: I'm trying to create an appropriate pipeline for my ML project. I'm using the following CML YAML file:
on:
  # Trigger the workflow on push or pull request
  push:
    branches:
      - mybranch
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: Deploy runner on EC2
        env:
          PERSONAL_ACCESS_TOKEN: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-1
        run: |
          cml-runner \
            --repo https://github.com/MyCompany/myrepo \
            --token=$PERSONAL_ACCESS_TOKEN \
            --cloud aws \
            --cloud-region us-west-1 \
            --cloud-type=g3.4xlarge \
            --labels=cml-runner \
            --idle-timeout 30
  model-training:
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner]
    container:
      image: docker://dvcorg/cml:0-dvc1-base1-gpu
      options: --gpus all
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.6'
      - name: Train model
        env:
          repo_token: ${{ secrets.REPO_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.DVC_ACCESS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.DVC_SECRET_KEY }}
          REQUIREMENTS_FILE: 'training/training_req.txt'
        run: |
          export AWS_DEFAULT_REGION=us-east-1
          echo "Install reqs"
          sudo apt update
          sudo apt-get install default-jre scala
          pip install py4j
          pip install --no-cache-dir -e .
          export PYSPARK_PYTHON=python3
          echo "Start CML"
          python3 -m spacy download en_core_web_sm
          echo "Pull data"
          dvc repro
          echo "## Model metrics" > report.md
          cat prepare_data/metrics.txt >> report.md
          cml-send-comment report.md
As you can see, I used the image docker://dvcorg/cml:0-dvc1-base1-gpu, but I started to receive the following error message:
I can see that the container started to build but stopped unexpectedly, and I do not see the reason for this behavior. I did not change anything in my script; it ran successfully earlier and then simply stopped working.
Thanks!
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:9 (6 by maintainers)
Top GitHub Comments
Fixed! It seems that the error that appeared later on was a hiccup. I have tried multiple times successfully.
Thanks! I was able to run it as well.
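Since the failure turned out to be transient, here is a minimal sketch of a retry helper that could wrap the network-dependent commands inside a `run:` step (for example `sudo apt update` or `pip install`). The `retry` function and its arguments are hypothetical and not part of CML or GitHub Actions. Note the limitation: for a job that uses `container:`, the runner pulls the image itself before any step executes, so a helper like this cannot retry that pull; re-running the job is still needed in that case.

```shell
# retry: hypothetical helper, not part of CML or GitHub Actions.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Assumed usage inside the "Train model" step (commented out here):
# retry 3 10 sudo apt update
# retry 3 10 pip install py4j
```

The helper returns the success status of the first successful attempt and a non-zero status once all attempts are exhausted, so the step still fails visibly if the problem is not transient.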