Intermittent failure on enable docker task

I’ve had several instances where the setup playbook errors out on the first run and then, when I immediately run it again, completes fine or at least makes more progress. The step that most often fails on the first run is this one:

TASK: [docker | enable docker] ************************************************ 
failed: [test-control-03] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [test-worker-001] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [test-control-02] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [test-edge-01] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [test-edge-02] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [test-control-01] => {"failed": true}
msg: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.


FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
docker | enable docker ------------------------------------------------- 86.38s
docker | install docker packages --------------------------------------- 46.92s
common | install system utilities -------------------------------------- 21.94s
common | update setuptools and pip ------------------------------------- 19.65s
common | install distributive ------------------------------------------ 15.45s
consul-template | install consul-template ------------------------------- 9.30s
collectd | install collectd packages ------------------------------------ 7.43s
docker | install latest device-mapper-libs ------------------------------ 4.33s
common | enable EPEL repo ----------------------------------------------- 3.77s
common | install pip ---------------------------------------------------- 3.76s

After it errors out, if I check the nodes, Docker is enabled and running. If I then re-run the playbook to install Mantl, it moves along, most often to completion on the second run.
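
Since Docker turns out to be running on the nodes shortly after the task reports failure, one way to tell a slow start from a real failure is to poll the unit state a few times before giving up. The following is a minimal sketch of that idea, not the fix adopted in Mantl: the helper names `docker_is_active` and `wait_until`, the attempt count, and the delay are all assumptions chosen for illustration. It relies only on `systemctl is-active`, which exits with status 0 when the unit is active, and it must run on a host that has systemd.

```python
import subprocess
import time
from typing import Callable


def docker_is_active() -> bool:
    """Return True if systemd reports docker.service as active."""
    # `systemctl is-active --quiet UNIT` exits 0 only when the unit is active.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", "docker.service"],
        check=False,
    )
    return result.returncode == 0


def wait_until(
    check: Callable[[], bool],
    attempts: int = 5,          # hypothetical default, tune per environment
    delay: float = 3.0,         # seconds between checks, also hypothetical
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` between tries."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_until(docker_is_active)
    print("docker is active" if ok else "docker is still not active")
```

If the check only succeeds after several attempts on the small nodes, that would suggest the service is slow to come up rather than broken, which matches the second run succeeding.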

Relatedly, I’ve noticed that how well the setup runs depends on the virtual resources the nodes have. In my case, a 1 CPU / 4 GB RAM setup often errors out multiple times, and in some cases won’t work at all. I’m currently testing with 4 CPU / 8 GB boxes and having no trouble.

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 22 (21 by maintainers)

Top GitHub Comments

langston-barrett commented, Mar 2, 2016

This continues to affect the master branch of Mantl.

Cryptophobia commented, May 8, 2017

@thomasvincent yes, deploying with the hostNetwork=true fix on a8cdf47 works for me with the latest on master.
