
Wrong message on rebalance

See original GitHub issue

Describe the bug

In distributed mode with several workers, if a worker crashes we get the message "Spawning is complete and report waittime is expired, but not all reports received from workers". It is not clear whether the users were spawned by the remaining workers or not.

Expected behavior

Normal message "All users spawned".

Actual behavior

The message "Spawning is complete and report waittime is expired, but not all reports received from workers" is shown instead, and it is ambiguous whether the remaining workers finished spawning.

Steps to reproduce

  1. Start master runner with 2 expected workers in config.
  2. Start 2 workers.
  3. Wait until all users are spawned.
  4. Kill one worker.

Environment

  • OS:
  • Python version: 3.8 / 3.9
  • Locust version: 2.8.6
  • Locust command line that you ran:
locust --config=master.conf
locust --worker
  • Locust file contents (anonymized if necessary):
from locust import HttpUser, constant, task


class User1(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")


class User2(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")

master.conf

master = true
host = http://127.0.0.1:8000/
expect-workers = 2
headless = true
users = 20
spawn-rate = 20

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
cyberw commented, Dec 10, 2022

@bhanuprakash-1 You can experiment with the heartbeat timeout by adding this to your locustfile:

from locust import runners
runners.HEARTBEAT_DEAD_INTERNAL = -600 # 10 minutes instead of 1 minute, which is the default
runners.HEARTBEAT_LIVENESS = 30 # default is to make three attempts

Even with the default settings, you shouldn't have issues unless the I/O takes more than 60 s, though, so it's a bit weird. Possibly your locustfile is blocking the worker forever, and then it won't matter what the timeout is.
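As a rough illustration of how these constants interact, here is a simplified sketch of the master's heartbeat countdown, not Locust's actual implementation. The constant names mirror Locust's (including the `HEARTBEAT_DEAD_INTERNAL` spelling), and the helper function is purely illustrative:

```python
# Simplified sketch of the master-side heartbeat countdown (illustrative only).
HEARTBEAT_INTERVAL = 1         # seconds between checks (Locust's default)
HEARTBEAT_LIVENESS = 3         # countdown value, reset on each heartbeat received
HEARTBEAT_DEAD_INTERNAL = -60  # below this, a missing worker is forgotten

def worker_state_after(seconds_since_last_heartbeat: int) -> str:
    """Return the state a silent worker would reach after N seconds."""
    countdown = HEARTBEAT_LIVENESS - seconds_since_last_heartbeat // HEARTBEAT_INTERVAL
    if countdown <= HEARTBEAT_DEAD_INTERNAL:
        return "removed"   # worker dropped entirely; later reports are discarded
    if countdown < 0:
        return "missing"   # "failed to send heartbeat, setting state to missing"
    return "ready"

print(worker_state_after(2))    # a few seconds of silence is tolerated
print(worker_state_after(10))   # past the liveness window: missing
print(worker_state_after(120))  # past the dead threshold: removed
```

With the suggested `-600`, the "removed" threshold moves out to roughly ten minutes of silence instead of one.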

0 reactions
bhanuprakash-1 commented, Dec 9, 2022

I'm facing a similar kind of issue. I'm trying to run a load test with 3 workers. I have a piece of code that does file I/O, and it runs as soon as we start the test (on each worker). I guess this is blocking that worker.

I'm getting the logs below on the master. Is there any way to increase the wait time? What should I be doing in this case? The load test runs perfectly fine when run on only the master (although the file I/O takes some time, about 10 seconds).

[2022-12-09 19:55:09,841] bhanu-job-loadt/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-12-09 19:55:09,855] bhanu-job-loadt/INFO/locust.main: Starting Locust 2.13.0
[2022-12-09 19:55:26,919] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 (index 0) reported as ready. 1 workers connected.
[2022-12-09 19:55:41,296] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 (index 1) reported as ready. 2 workers connected.
[2022-12-09 19:55:51,586] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 (index 2) reported as ready. 3 workers connected.
[2022-12-09 19:56:16,910] bhanu-job-loadt/INFO/locust.runners: Sending spawn jobs of 50 users at 10.00 spawn rate to 3 ready workers
[2022-12-09 19:56:20,499] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: The last worker went missing, stopping test.
[2022-12-09 19:56:21,938] bhanu-job-loadt/INFO/locust.runners: Spawning is complete and report waittime is expired, but not all reports received from workers: {} (0 total users)
[2022-12-09 19:57:20,935] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:21,947] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:22,557] bhanu-job-loadt/INFO/locust.runners: Discarded report from unrecognized worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55
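Locust workers run everything on a single cooperative (gevent) loop, so a long blocking file read at test start can starve the greenlet that answers heartbeats. One way around it is to offload the blocking call to a thread pool. The sketch below uses the stdlib `ThreadPoolExecutor` as a stand-in for gevent's threadpool; `slow_file_setup` is a hypothetical placeholder for the 10-second read, and the `while` loop stands in for the worker's heartbeat greenlet:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_file_setup() -> str:
    """Hypothetical stand-in for the ~10 s blocking file read (shortened here)."""
    time.sleep(0.5)
    return "config data"

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(slow_file_setup)  # offloaded: the main loop is not blocked

heartbeats = 0
while not future.done():
    heartbeats += 1      # the "heartbeat" keeps firing while the file is read
    time.sleep(0.1)      # heartbeat interval

data = future.result()   # setup result is available once the read completes
pool.shutdown()
```

In an actual locustfile the gevent-friendly equivalent would be spawning the read from a `test_start` event listener (e.g. via `gevent.spawn` or gevent's threadpool), so the worker keeps answering heartbeats while the file loads.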


