dumb-init child sometimes receives SIGHUP and/or SIGCONT right after start in setsid mode due to race
The OpenStack Kolla[0] project previously used dumb-init v1.1.3, which worked very well. We recently upgraded to v1.2.0, and since then the container fails intermittently. After some debugging, I found the difference.
Here is the debug log from dumb-init v1.1.3[2]:
2017-02-07 01:09:52.334863 | + docker logs --tail all nova_compute
2017-02-07 01:09:52.347658 | [dumb-init] Running in debug mode.
2017-02-07 01:09:52.347712 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:09:52.347734 | [dumb-init] setsid complete.
2017-02-07 01:09:52.347790 | INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
2017-02-07 01:09:52.347834 | INFO:__main__:Validating config file
2017-02-07 01:09:52.347884 | INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
2017-02-07 01:09:52.347930 | INFO:__main__:Copying service configuration files
...
Here is a debug log from dumb-init v1.2.0[3], from a container that started successfully:
2017-02-07 01:06:13.845614 | + docker logs --tail all nova_conductor
2017-02-07 01:06:13.854610 | [dumb-init] Running in debug mode.
2017-02-07 01:06:13.854648 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:06:13.854667 | [dumb-init] Received signal 1.
2017-02-07 01:06:13.854687 | [dumb-init] Forwarded signal 1 to children.
2017-02-07 01:06:13.854703 | [dumb-init] Received signal 18.
2017-02-07 01:06:13.854722 | [dumb-init] Forwarded signal 18 to children.
2017-02-07 01:06:13.854737 | [dumb-init] setsid complete.
2017-02-07 01:06:13.854765 | INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
...
I have no idea why it received signal 18.
Here is a debug log from dumb-init v1.2.0[4], from a container that failed:
2017-02-07 01:06:13.814198 | + docker logs --tail all nova_compute
2017-02-07 01:06:13.823693 | [dumb-init] Running in debug mode.
2017-02-07 01:06:13.823732 | [dumb-init] Child spawned with PID 7.
2017-02-07 01:06:13.823751 | [dumb-init] Received signal 1.
2017-02-07 01:06:13.823766 | [dumb-init] setsid complete.
2017-02-07 01:06:13.823786 | [dumb-init] Forwarded signal 1 to children.
2017-02-07 01:06:13.823802 | [dumb-init] Received signal 18.
2017-02-07 01:06:13.823821 | [dumb-init] Forwarded signal 18 to children.
2017-02-07 01:06:13.823837 | [dumb-init] Received signal 17.
2017-02-07 01:06:13.823860 | [dumb-init] A child with PID 7 was terminated by signal 1.
2017-02-07 01:06:13.823879 | [dumb-init] Forwarded signal 15 to children.
2017-02-07 01:06:13.823900 | [dumb-init] Child exited with status 129. Goodbye.
It received signal 18, then signal 17, and the child was killed.
I compared the source code of dumb-init v1.1.3 and v1.2.0 but cannot explain it. Any ideas?
All of these tests are based on the same Kolla code; the only variable is the dumb-init version.
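For anyone reading the logs above, here is a small Python sketch that maps the signal numbers to names and decodes the final exit status. The table holds the Linux x86-64 numbering (other platforms may differ), and the helper names `describe_signal` and `describe_exit_status` are made up for illustration; they are not part of dumb-init. The 128 + N reading of the exit status matches the failed log, where a child terminated by signal 1 is reported as status 129.

```python
# Signal numbers as used on Linux x86-64 (assumption: other platforms may differ).
LINUX_SIGNALS = {1: "SIGHUP", 15: "SIGTERM", 17: "SIGCHLD", 18: "SIGCONT"}


def describe_signal(number):
    """Return the signal name for a number seen in the debug log."""
    return LINUX_SIGNALS.get(number, f"signal {number}")


def describe_exit_status(status):
    """Interpret an exit status using the 128 + N convention for death by signal N."""
    if status > 128:
        return f"terminated by {describe_signal(status - 128)}"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    for number in (1, 18, 17, 15):
        print(number, describe_signal(number))
    print(129, describe_exit_status(129))
```

Read this way, the failed run shows SIGHUP and SIGCONT being forwarded to the child, the child dying from the SIGHUP, and dumb-init then seeing SIGCHLD and exiting with 128 + 1.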
[0] https://github.com/openstack/kolla
[1] https://review.openstack.org/#/c/424832/
[2] http://logs.openstack.org/08/429908/3/check/gate-kolla-ansible-dsvm-deploy-ubuntu-source-ubuntu-xenial-nv/260697e/console.html#_2017-02-07_01_09_52_334863
[3] http://logs.openstack.org/81/429681/3/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/d2b0b1b/console.html#_2017-02-07_01_06_13_845614
[4] http://logs.openstack.org/81/429681/3/check/gate-kolla-ansible-dsvm-deploy-centos-source-centos-7-nv/d2b0b1b/console.html#_2017-02-07_01_06_13_814198
iirc we do it this way to avoid a race (we want the child to be foregrounded from the start, not some time after the fork).
If we did the above, I think we’d either have a slight race (child could exec the new process before it gets the controlling TTY), or we’d have to coordinate between the child and parent dumb-init processes to delay the exec until the TIOCSCTTY has run in the parent.
Here’s the musl implementation of setpgrp, which is more helpful: https://github.com/davidlazar/musl/blob/master/src/unistd/tcsetpgrp.c
The implementation is here: https://github.com/torvalds/linux/blob/894025f24bd028942da3e602b87d9f7223109b14/drivers/tty/tty_jobctrl.c#L459
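To make the "coordinate between the child and parent" option a bit more concrete, here is a rough Python sketch of the general pattern: the child blocks on a pipe after fork and only calls exec once the parent has finished its own setup step. This is not how dumb-init is written (dumb-init is C), and `spawn_after_parent_ready` and `parent_step` are hypothetical names. `parent_step` is only a placeholder for whatever the parent would need to do first, such as the TIOCSCTTY call mentioned above.

```python
import os
import sys


def spawn_after_parent_ready(argv, parent_step):
    """Fork, then hold the child before exec until parent_step(pid) has finished."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent writes a byte (or closes its end).
        os.close(write_fd)
        os.read(read_fd, 1)
        os.close(read_fd)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: run the setup that must happen before the child's exec.
    os.close(read_fd)
    try:
        parent_step(pid)
    finally:
        os.write(write_fd, b"x")  # release the child
        os.close(write_fd)
    return pid


if __name__ == "__main__":
    child = spawn_after_parent_ready(
        [sys.executable, "-c", "print('child ran after parent step')"],
        lambda child_pid: print("parent step done for", child_pid),
    )
    os.waitpid(child, 0)
```

The cost is the extra pipe and handshake on every start, which is presumably the kind of coordination the comment above wants to avoid.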