dumb-init child sometimes receives SIGHUP and/or SIGCONT right after start in setsid mode due to race

The OpenStack Kolla[0] project previously used dumb-init v1.1.3, which worked very well. We recently upgraded to v1.2.0, and now containers fail intermittently. After some debugging, I found the difference.

Here is the debug log from dumb-init v1.1.3[2]:

2017-02-07 01:09:52.334863 | + docker logs --tail all nova_compute
2017-02-07 01:09:52.347658 | [dumb-init] Running in debug mode.
2017-02-07 01:09:52.347712 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:09:52.347734 | [dumb-init] setsid complete.
2017-02-07 01:09:52.347790 | INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2017-02-07 01:09:52.347834 | INFO:__main__:Validating config file
2017-02-07 01:09:52.347884 | INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2017-02-07 01:09:52.347930 | INFO:__main__:Copying service configuration files
...

Here is the debug log from dumb-init v1.2.0[3], for a container that started successfully:

2017-02-07 01:06:13.845614 | + docker logs --tail all nova_conductor
2017-02-07 01:06:13.854610 | [dumb-init] Running in debug mode.
2017-02-07 01:06:13.854648 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:06:13.854667 | [dumb-init] Received signal 1.
2017-02-07 01:06:13.854687 | [dumb-init] Forwarded signal 1 to children.
2017-02-07 01:06:13.854703 | [dumb-init] Received signal 18.
2017-02-07 01:06:13.854722 | [dumb-init] Forwarded signal 18 to children.
2017-02-07 01:06:13.854737 | [dumb-init] setsid complete.
2017-02-07 01:06:13.854765 | INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
...

I have no idea why it received signal 18 (SIGCONT).

Here is the debug log from dumb-init v1.2.0[4], for a container that failed:

2017-02-07 01:06:13.814198 | + docker logs --tail all nova_compute
2017-02-07 01:06:13.823693 | [dumb-init] Running in debug mode.
2017-02-07 01:06:13.823732 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:06:13.823751 | [dumb-init] Received signal 1.
2017-02-07 01:06:13.823766 | [dumb-init] setsid complete.
2017-02-07 01:06:13.823786 | [dumb-init] Forwarded signal 1 to children.
2017-02-07 01:06:13.823802 | [dumb-init] Received signal 18.
2017-02-07 01:06:13.823821 | [dumb-init] Forwarded signal 18 to children.
2017-02-07 01:06:13.823837 | [dumb-init] Received signal 17.
2017-02-07 01:06:13.823860 | [dumb-init] A child with PID 7 was terminated by signal 1.
2017-02-07 01:06:13.823879 | [dumb-init] Forwarded signal 15 to children.
2017-02-07 01:06:13.823900 | [dumb-init] Child exited with status 129. Goodbye.

It received signal 18 (SIGCONT), then signal 17 (SIGCHLD), and the child was killed; the log shows PID 7 was terminated by signal 1 (SIGHUP).
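For readers less familiar with raw signal numbers, here is a minimal sketch that decodes the numbers appearing in the logs above. It assumes the Linux x86-64 numbering (numbers differ on some other platforms), and the helper names `signal_name` and `describe_exit_status` are hypothetical, not part of dumb-init. The exit status 129 in the failing log is consistent with the common "128 + signal number" convention, which the log itself supports by pairing "terminated by signal 1" with "status 129".

```python
# Signal numbers as seen in the logs, using Linux x86-64 numbering.
# This table is an assumption for illustration; other platforms may differ.
LINUX_SIGNALS = {1: "SIGHUP", 15: "SIGTERM", 17: "SIGCHLD", 18: "SIGCONT"}


def signal_name(number):
    """Return the symbolic name for a signal number, if it is in the table."""
    return LINUX_SIGNALS.get(number, f"signal {number}")


def describe_exit_status(status):
    """Interpret an exit status using the 128 + N convention for signal N."""
    if status > 128:
        return f"killed by {signal_name(status - 128)}"
    return f"exited with code {status}"


# Decode the sequence from the failing nova_compute log.
for number in (1, 18, 17, 15):
    print(number, signal_name(number))
print(129, describe_exit_status(129))
```

Read this way, the failing run shows SIGHUP and SIGCONT arriving right after the child is spawned, SIGHUP terminating the child, and dumb-init then reporting the child's death via SIGCHLD.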

I compared the source code of dumb-init v1.1.3 and v1.2.0 but cannot explain it. Any ideas?

All of these tests are based on the same Kolla code; the only variable is the dumb-init version.

[0] https://github.com/openstack/kolla
[1] https://review.openstack.org/#/c/424832/
[2] http://logs.openstack.org/08/429908/3/check/gate-kolla-ansible-dsvm-deploy-ubuntu-source-ubuntu-xenial-nv/260697e/console.html#_2017-02-07_01_09_52_334863
[3] http://logs.openstack.org/81/429681/3/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/d2b0b1b/console.html#_2017-02-07_01_06_13_845614
[4] http://logs.openstack.org/81/429681/3/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/d2b0b1b/console.html#_2017-02-07_01_06_13_814198

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 19 (15 by maintainers)

Top GitHub Comments

chriskuehl commented, Jun 9, 2018

iirc we do it this way to avoid a race (we want the child to be foregrounded from the start, not some time after the fork).

If we did the above, I think we’d either have a slight race (child could exec the new process before it gets the controlling TTY), or we’d have to coordinate between the child and parent dumb-init processes to delay the exec until the TIOCSCTTY has run in the parent.
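The coordination described above can be pictured with a small sketch. This is not dumb-init's code (dumb-init is written in C, and its real setup involves setsid and TIOCSCTTY); it only illustrates the general pattern of making a forked child wait on a pipe until the parent has finished some setup step. The function name `run_with_handshake` and both callbacks are hypothetical, and `child_action` merely stands in for the point where a real child would exec.

```python
import os


def run_with_handshake(parent_setup, child_action):
    """Fork a child that runs child_action() only after parent_setup() is done.

    child_action must return a string; it is sent back to the parent via a pipe.
    """
    ready_r, ready_w = os.pipe()    # parent -> child: "setup finished"
    result_r, result_w = os.pipe()  # child -> parent: result of child_action
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent signals readiness, then act.
        try:
            os.close(ready_w)
            os.close(result_r)
            os.read(ready_r, 1)  # returns once the parent writes (or closes)
            os.write(result_w, child_action().encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: do the setup first, then release the child.
    os.close(ready_r)
    os.close(result_w)
    parent_setup()
    os.write(ready_w, b"x")
    os.close(ready_w)
    output = os.read(result_r, 1024).decode()
    os.close(result_r)
    os.waitpid(pid, 0)
    return output
```

The trade-off mentioned in the comment is visible here: the handshake removes the race, but at the cost of extra pipes and a blocking step between fork and the child's real work.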

