Hub reporting no available nodes after a few hours

🐛 Bug Report

I have a Selenium Docker grid (version 4) deployed on a Docker swarm, consisting of one hub and 5 Chrome nodes. The grid starts correctly and runs tests as expected, but after a few hours the hub reports that there are no available nodes. A status check on the hub reports that it is not ready; however, status checks on the nodes (“/readyz” and “/status”) report that they are healthy and ready.

If I restart the hub, it still does not recognize the nodes when it comes back up. But if I scale the nodes to zero and back up to 5, the hub recognizes them again and all is well.

By the way, there are no errors on the hub and node logs.
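
The hub and node status checks described above can be sketched as a small shell helper. This is a minimal sketch, not the reporter's actual commands: the function names grid_is_ready and check_endpoint are hypothetical, and it assumes the /status response is JSON with a "ready" flag near the top, which is the field the workaround script later in this thread greps for.

```shell
#!/usr/bin/env bash
# Sketch only: grid_is_ready and check_endpoint are hypothetical helper names.

# Read a /status JSON document on stdin.
# Exit 0 if the first "ready" flag is true, 1 otherwise (including empty input).
grid_is_ready() {
  local flag
  # Strip whitespace so pretty-printed and compact JSON look the same,
  # then keep only the first "ready":<value> pair.
  flag="$(tr -d '[:space:]' | grep -o '"ready":[a-z]*' | head -n 1)"
  [ "$flag" = '"ready":true' ]
}

# Fetch <base_url>/status and print a one-line verdict.
# A failed request yields empty input, which counts as "NOT ready".
check_endpoint() {
  local base_url="$1"
  if curl -fs "${base_url}/status" | grid_is_ready; then
    echo "${base_url}: ready"
  else
    echo "${base_url}: NOT ready"
  fi
}
```

With the ports published in the compose file, this could be called as check_endpoint http://127.0.0.1:4444 for the hub and check_endpoint http://127.0.0.1:5555 for a node. In the situation described here, the first call would be expected to print "NOT ready" while the second prints "ready".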

To Reproduce

My docker-compose file is below. I deploy the grid using “docker stack deploy -c docker-compose.yml myapp”.

Expected behavior

The hub should not lose the nodes unless the nodes’ status reports that they are not healthy.

Test script reproducing this issue (when applicable)

version: '3.9'

services:
  chrome:
    image: selenium/node-chrome:4.0.0
    volumes:
      - /dev/shm:/dev/shm
    environment:
      - SE_EVENT_BUS_HOST=myapp_hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - JAVA_OPTS=-Dwebdriver.chrome.whitelistedIps=
      - SE_NODE_SESSION_TIMEOUT=60
    ports:
      - "5555:5555"
    networks:
      - myapp-net
    links:
      - hub
    deploy:
      replicas: 5
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'

  hub:
    image: selenium/hub:4.0.0
    networks:
      - myapp-net
    environment:
      - SE_OPTS="--log-level FINE"
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"

Environment

OS: CentOS 7
Docker-Selenium image version: selenium/hub:4.0.0
Docker version: 20.10.5
Docker-Compose version (if applicable): 1.26.2, build eefe0d31
Exact Docker command to start the containers (if using docker-compose, provide the docker-compose file as well): docker stack deploy -c docker-compose.yml myapp

Top GitHub Comments

sabelosimelane commented, Apr 9, 2021

I doubt the logging will give us any more information than we already have. I checked the code and most of the logging is at INFO level.

Anyway, as a workaround I have a script that monitors the status of the hub. If the status is “not ready”, it scales the nodes to 0 and then back up to whatever the count was. This seems to work.

Here are the scripts:

monitor_hub.sh

cd /root/apps/myapp
grep -q "false" <<< "$(curl -fs 'http://127.0.0.1:4444/status' | grep \"ready\" | sed 's/\"//g')" && sh restart_nodes.sh

restart_nodes.sh

docker service scale myapp_chrome=0
sleep 30s
docker service scale myapp_chrome=5
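
The same workaround can be sketched as a single script that restores the previous replica count rather than a hard-coded 5. This is a variation under stated assumptions, not the commenter's script: the function names and the SERVICE and HUB_STATUS_URL variables are hypothetical, and the defaults simply mirror the service name and URL used above.

```shell
#!/usr/bin/env bash
# Sketch only: variable and function names are hypothetical; the defaults
# mirror the service name and hub URL from the scripts in this thread.
SERVICE="${SERVICE:-myapp_chrome}"
HUB_STATUS_URL="${HUB_STATUS_URL:-http://127.0.0.1:4444/status}"

# Exit 0 only when the hub answers and its payload contains "ready": false.
# An unreachable hub yields no output, so it does not trigger a restart.
hub_not_ready() {
  curl -fs "$HUB_STATUS_URL" | tr -d '[:space:]' | grep -q '"ready":false'
}

# Scale the node service to zero, wait, then restore the earlier replica count.
restart_nodes() {
  local replicas
  replicas="$(docker service inspect \
    --format '{{.Spec.Mode.Replicated.Replicas}}' "$SERVICE")"
  # Refuse to scale down if the current count could not be read.
  [ -n "$replicas" ] || return 1
  docker service scale "${SERVICE}=0"
  sleep 30
  docker service scale "${SERVICE}=${replicas}"
}

# One monitoring pass: restart the nodes only if the hub reports not ready.
monitor_once() {
  if hub_not_ready; then
    restart_nodes
  fi
}
```

To use it, add a call to monitor_once at the end of the script and run it on the same schedule as monitor_hub.sh. Reading the replica count before scaling down keeps the script correct if the service is later resized.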

diemol commented, Jul 13, 2021

We’ve made some improvements related to this issue during the last couple of weeks. Could you please check again using the most recent pre-release? https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.0.0-rc-1-prerelease-20210713