[Xenial] Execution stuck in 'requested' state on a fresh RabbitMQ
See original GitHub issueWhen executing any st2 action for the first time on a fresh & clean RabbitMQ
immediately after st2 startup, - it runs forewer and stuck in requested
state:
root@ubuntu16:~# st2 run core.local echo 123
....................................................................................................................................................
root@ubuntu16:~# st2 execution list
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+
| id | action.ref | context.user | status | start_timestamp | end_timestamp |
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+
| 58c96cadc8980518d7cf8ada | core.local | st2admin | requested | Wed, 15 Mar 2017 | |
| | | | | 16:32:45 UTC | |
+--------------------------+------------+--------------+------------------------+------------------------+----------------------+
root@ubuntu16:~# st2 execution get 58c96cadc8980518d7cf8ada
id: 58c96cadc8980518d7cf8ada
status: requested
parameters:
cmd: echo 123
result: None
I can only guess that at early point st2 is busy with RabbitMQ bootstrapping and for some reason can’t trigger an action (since topic/queue is not yet created/message is lost or something like that ?) when running things for the first time.
Reproduce
Requirements to reproduce:
- OS is Ubuntu Xenial
- RabbitMQ is clean
- st2 was just started
- action runs immediately after st2 start
Script to reproduce
I could reproduce it every time with this script:
#!/bin/bash
# output executed commands
set -o xtrace
sudo st2ctl stop
# emulate fresh & clean RabbitMQ
rabbitmqctl stop_app
sudo rabbitmqctl reset
rabbitmqctl start_app
# isolate & make sure the problem is not with RabbitMQ startup
sleep 30
# but with StackStorm startup itself
sudo st2ctl start
# The command is stuck and runs forever
# See: st2 execution list
st2 run core.local echo 123
This is similar to https://github.com/StackStorm/st2-packages/issues/445#issuecomment-286778253 The problem is more serious than it looks like, being blocker for Automation, when deploying StackStorm in prod. Stuck execution immediately after startup is pretty much a bad thing.
I originally thought this could be solved with packaging, but after repro it sounds like more about StackStorm core. cc @m4dcoder @Kami @lakshmi-kannan.
Issue Analytics
- State:
- Created 7 years ago
- Comments:10 (9 by maintainers)
Top GitHub Comments
Interesting thing about this issue is that I can’t replicate it if I set number of action runner workers to 1.
This makes it look like some kind of weird race inside action runners processes.
I’ve checked again the startup with the previous instructions and it’s indeed fixed.