Hub reporting no available nodes after a few hours
🐛 Bug Report
I have a Selenium Grid 4 (docker-selenium) deployment on Docker Swarm, consisting of one hub and five Chrome nodes. The grid starts correctly and runs tests as expected, but after a few hours the hub reports that there are no available nodes. A status check on the hub reports that it is not ready, yet status checks on the nodes (“/readyz” and “/status”) report that they are healthy and ready.
Restarting the hub does not help: when it comes back up, it still does not recognize the nodes. If I scale the nodes to zero and back up to five, however, the hub recognizes them again and all is well.
There are no errors in the hub or node logs.
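The hub-versus-node comparison described above can be scripted. The following is a minimal sketch, not part of the original report: the function names status_is_ready and check_endpoint are hypothetical, and it assumes the /status response contains a "ready" field, as the workaround later in this thread also relies on.

```shell
#!/bin/bash
# Sketch: report whether a Selenium /status JSON document says "ready": true.
# Reads the JSON from stdin so it can be fed from curl or from a saved file.
status_is_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: print READY / NOT READY for one status URL.
# A failed or timed-out request yields empty input and counts as NOT READY.
check_endpoint() {
  local url="$1"
  if curl -fs --max-time 5 "$url" | status_is_ready; then
    echo "READY     $url"
  else
    echo "NOT READY $url"
  fi
}
```

Calling check_endpoint with http://127.0.0.1:4444/status for the hub and with a node's /status URL on port 5555 would show the mismatch described here: the hub not ready while the nodes are.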
To Reproduce
My docker-compose file is below. I deploy the grid using “docker stack deploy -c docker-compose.yml myapp”.
Expected behavior
The hub should not lose the nodes unless the nodes’ status reports that they are not healthy.
Test script reproducing this issue (when applicable)
version: '3.9'
services:
  chrome:
    image: selenium/node-chrome:4.0.0
    volumes:
      - /dev/shm:/dev/shm
    environment:
      - SE_EVENT_BUS_HOST=myapp_hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - JAVA_OPTS=-Dwebdriver.chrome.whitelistedIps=
      - SE_NODE_SESSION_TIMEOUT=60
    ports:
      - "5555:5555"
    networks:
      - myapp-net
    links:
      - hub
    deploy:
      replicas: 5
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
  hub:
    image: selenium/hub:4.0.0
    networks:
      - myapp-net
    environment:
      - SE_OPTS="--log-level FINE"
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
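With five replicas declared, one way to see when the hub loses its nodes is to count the nodes it lists as UP. This is a rough sketch under assumptions: it assumes each node entry in the hub's /status JSON carries an "availability": "UP" field, and the names count_up_nodes, nodes_match_expected and EXPECTED_NODES are hypothetical.

```shell
#!/bin/bash
# Sketch: count nodes the hub reports as UP in its /status JSON (read from stdin).
# Assumption: each registered node entry contains "availability": "UP".
count_up_nodes() {
  grep -Eo '"availability"[[:space:]]*:[[:space:]]*"UP"' | wc -l | tr -d ' '
}

# Hypothetical check: compare the count with the replica count from the compose file.
EXPECTED_NODES=5
nodes_match_expected() {
  local up
  up="$(count_up_nodes)"
  [ "$up" -eq "$EXPECTED_NODES" ]
}
```

For example, piping curl -fs http://127.0.0.1:4444/status into count_up_nodes prints the number of nodes the hub currently considers available.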
Environment
OS: CentOS 7
Docker-Selenium image version: selenium/hub:4.0.0
Docker version: 20.10.5
Docker-Compose version (if applicable): 1.26.2, build eefe0d31
Exact Docker command to start the containers (if using docker-compose, provide the docker-compose file as well): docker stack deploy -c docker-compose.yml myapp
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:34 (8 by maintainers)
Top GitHub Comments
I doubt the logging will give us any more information than we already have. I checked the code and most of the logging is at INFO level.
Anyway, I have implemented a workaround: a script monitors the status of the hub and, if the status is “not ready”, scales the nodes to 0 and then back up to the previous count. This seems to work.
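Here are the scripts:
monitor_hub.sh
#!/bin/bash
# If the hub's /status reports "ready": false, restart the nodes.
cd /root/apps/myapp || exit 1
grep -q "false" <<< "$(curl -fs 'http://127.0.0.1:4444/status' | grep \"ready\" | sed 's/\"//g')" && sh restart_nodes.sh
restart_nodes.sh
#!/bin/sh
# Scale the Chrome nodes to zero and back up so they re-register with the hub.
docker service scale myapp_chrome=0
sleep 30s
docker service scale myapp_chrome=5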
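The restart script hardcodes five replicas, while the description says the nodes are scaled back to whatever the count was. A possible variant that reads the current replica count first is sketched below. The function names current_replicas and restart_service are hypothetical, and the sketch assumes the service runs in replicated mode.

```shell
#!/bin/bash
# Sketch: restart a swarm service's tasks while preserving its current replica count.
SERVICE="${SERVICE:-myapp_chrome}"   # service name taken from the scripts above

current_replicas() {
  # Reads the desired replica count from the service spec (replicated mode only).
  docker service inspect --format '{{.Spec.Mode.Replicated.Replicas}}' "$1"
}

restart_service() {
  local service="$1" pause="${2:-30}" replicas
  replicas="$(current_replicas "$service")" || return 1
  # Refuse to continue unless the count is a positive integer
  # (for example, when the service is already scaled to 0).
  case "$replicas" in
    ''|*[!0-9]*|0) echo "unexpected replica count: '$replicas'" >&2; return 1 ;;
  esac
  docker service scale "$service=0"
  sleep "$pause"
  docker service scale "$service=$replicas"
}
```

monitor_hub.sh could then call restart_service "$SERVICE" in place of restart_nodes.sh. The guard against a count of 0 keeps a second run, started while the first is still sleeping, from leaving the service scaled to zero.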
We’ve made some improvements related to this issue over the last couple of weeks. Could you please check again using the most recent pre-release? https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.0.0-rc-1-prerelease-20210713