Hub reporting no available nodes after a few hours


🐛 Bug Report

I have a Selenium Grid 4 (Docker images) deployed on a Docker swarm, consisting of one hub and five Chrome nodes. The grid starts correctly and runs tests as expected, but after a few hours the hub reports that no nodes are available. A status check on the hub reports that it is not ready; however, status checks on the nodes (“/readyz” and “/status”) report that they are healthy and ready.

If I restart the hub, it still does not recognize the nodes when it comes back up. But if I scale the nodes to zero and back up to 5, the hub recognizes them again and all is well.

There are no errors in the hub or node logs.

To Reproduce

My docker-compose file is shown below. I deploy the grid using “docker stack deploy -c docker-compose.yml myapp”.

Expected behavior

The hub should not lose the nodes unless the nodes’ status reports that they are unhealthy.

Test script reproducing this issue (when applicable)

version: '3.9'

services:
  chrome:
    image: selenium/node-chrome:4.0.0
    volumes:
      - /dev/shm:/dev/shm
    environment:
      - SE_EVENT_BUS_HOST=myapp_hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - JAVA_OPTS=-Dwebdriver.chrome.whitelistedIps=
      - SE_NODE_SESSION_TIMEOUT=60
    ports:
      - "5555:5555"
    networks:
      - myapp-net
    links:
      - hub
    deploy:
      replicas: 5
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'

  hub:
    image: selenium/hub:4.0.0
    networks:
      - myapp-net
    environment:
      - SE_OPTS="--log-level FINE"
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"

Environment

OS: CentOS 7
Docker-Selenium image version: selenium/hub:4.0.0
Docker version: 20.10.5
Docker-Compose version (if applicable): 1.26.2, build eefe0d31
Exact Docker command to start the containers (if using docker-compose, provide the docker-compose file as well): docker stack deploy -c docker-compose.yml myapp

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 34 (8 by maintainers)

Top GitHub Comments

3 reactions
sabelosimelane commented, Apr 9, 2021

I doubt the logging will give us any more information than we already have. I checked the code and most of the logging is at INFO level.

Anyway, I have set up a workaround: a script monitors the status of the hub and, if the status is “not ready”, scales the nodes to 0 and then back up to the original count. This seems to work.

Here are the scripts:

monitor_hub.sh

cd /root/apps/myapp
grep -q "false" <<< "$(curl -fs 'http://127.0.0.1:4444/status' | grep \"ready\" | sed 's/\"//g')" && sh restart_nodes.sh

restart_nodes.sh

docker service scale myapp_chrome=0
sleep 30s
docker service scale myapp_chrome=5
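
For readers who would rather parse the status response than grep it, here is a minimal Python sketch of the same workaround. It is not part of the original comment. It assumes the hub's /status endpoint returns JSON with a boolean at value.ready, and it reuses the URL, service name, and replica count from the scripts above. The function names are hypothetical.

```python
import json
import subprocess
import time
import urllib.request

# Values copied from the shell scripts above; adjust for your own stack.
HUB_STATUS_URL = "http://127.0.0.1:4444/status"
SERVICE = "myapp_chrome"
REPLICAS = 5


def hub_reports_not_ready(status_body: str) -> bool:
    """Return True only when the status JSON explicitly has value.ready == false.

    Assumption: the response looks like {"value": {"ready": <bool>, ...}}.
    Anything unparseable or unexpected returns False, so no restart is triggered.
    """
    try:
        payload = json.loads(status_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is False


def scale_commands(service: str, replicas: int) -> list:
    """Build the two `docker service scale` invocations (down to 0, then back up)."""
    return [
        ["docker", "service", "scale", f"{service}=0"],
        ["docker", "service", "scale", f"{service}={replicas}"],
    ]


def check_and_restart(pause_seconds: int = 30) -> None:
    """Poll the hub once and bounce the node service if the hub is not ready."""
    try:
        with urllib.request.urlopen(HUB_STATUS_URL, timeout=10) as resp:
            body = resp.read().decode("utf-8")
    except OSError:
        # Hub unreachable or HTTP error: do nothing, like `curl -fs` failing.
        return
    if hub_reports_not_ready(body):
        down, up = scale_commands(SERVICE, REPLICAS)
        subprocess.run(down, check=True)
        time.sleep(pause_seconds)
        subprocess.run(up, check=True)


if __name__ == "__main__":
    check_and_restart()
```

Like monitor_hub.sh, this performs a single check per run, so it would still need to be scheduled (for example from cron) to act as a monitor.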

1 reaction
diemol commented, Jul 13, 2021

We’ve made some improvements related to this issue over the last couple of weeks. Could you please check again using the most recent pre-release? https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.0.0-rc-1-prerelease-20210713
