
Wrong message on rebalance

See original GitHub issue

Describe the bug

In distributed mode with several workers, if a worker crashes we get the message "Spawning is complete and report waittime is expired, but not all reports received from workers". It is not clear whether the users were spawned by the remaining workers or not.

Expected behavior

Normal message "All users spawned".

Actual behavior

The message "Spawning is complete and report waittime is expired, but not all reports received from workers" is shown instead, and it is ambiguous whether the remaining workers finished spawning.

Steps to reproduce

  1. Start master runner with 2 expected workers in config.
  2. Start 2 workers.
  3. Wait until all users are spawned.
  4. Kill one worker.

Environment

  • OS:
  • Python version: 3.8 / 3.9
  • Locust version: 2.8.6
  • Locust command line that you ran:
locust --config=master.conf
locust --worker
  • Locust file contents (anonymized if necessary):
from locust import HttpUser, constant, task


class User1(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")


class User2(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")

master.conf

master = true
host = http://127.0.0.1:8000/
expect-workers = 2
headless = true
users = 20
spawn-rate = 20

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
cyberw commented, Dec 10, 2022

@bhanuprakash-1 You can experiment with the heartbeat timeout by adding this to your locustfile:

from locust import runners
runners.HEARTBEAT_DEAD_INTERNAL = -600 # 10 minutes instead of 1 minute, which is the default
runners.HEARTBEAT_LIVENESS = 30 # default is to make three attempts

Even with the default settings, you shouldn't have issues unless the I/O takes more than 60 s, though, so it's a bit weird. Possibly your locustfile is blocking the worker forever, and then it won't matter what the timeout is.
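As a rough illustration of how these constants interact, here is a simplified sketch of the master's heartbeat countdown, not Locust's actual implementation. The constant names mirror Locust's (including the `HEARTBEAT_DEAD_INTERNAL` spelling), and the helper function is purely illustrative:

```python
# Simplified sketch of the master-side heartbeat countdown (illustrative only).
HEARTBEAT_INTERVAL = 1         # seconds between checks (Locust's default)
HEARTBEAT_LIVENESS = 3         # countdown value, reset on each heartbeat received
HEARTBEAT_DEAD_INTERNAL = -60  # below this, a missing worker is forgotten

def worker_state_after(seconds_since_last_heartbeat: int) -> str:
    """Return the state a silent worker would reach after N seconds."""
    countdown = HEARTBEAT_LIVENESS - seconds_since_last_heartbeat // HEARTBEAT_INTERVAL
    if countdown <= HEARTBEAT_DEAD_INTERNAL:
        return "removed"   # worker dropped entirely; later reports are discarded
    if countdown < 0:
        return "missing"   # "failed to send heartbeat, setting state to missing"
    return "ready"

print(worker_state_after(2))    # a few seconds of silence is tolerated
print(worker_state_after(10))   # past the liveness window: missing
print(worker_state_after(120))  # past the dead threshold: removed
```

With the suggested `-600`, the "removed" threshold moves out to roughly ten minutes of silence instead of one.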

0 reactions
bhanuprakash-1 commented, Dec 9, 2022

I'm facing a similar kind of issue. I'm trying to run a load test with 3 workers. I have a piece of code that does file I/O, and it runs as soon as we start the test (on each worker). I guess this is blocking that worker.

I'm getting the logs below on the master. Is there any way to increase the wait time? What should I be doing in this case? The load test runs perfectly fine when run on only the master (although the file I/O takes some time, about 10 seconds).

[2022-12-09 19:55:09,841] bhanu-job-loadt/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-12-09 19:55:09,855] bhanu-job-loadt/INFO/locust.main: Starting Locust 2.13.0
[2022-12-09 19:55:26,919] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 (index 0) reported as ready. 1 workers connected.
[2022-12-09 19:55:41,296] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 (index 1) reported as ready. 2 workers connected.
[2022-12-09 19:55:51,586] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 (index 2) reported as ready. 3 workers connected.
[2022-12-09 19:56:16,910] bhanu-job-loadt/INFO/locust.runners: Sending spawn jobs of 50 users at 10.00 spawn rate to 3 ready workers
[2022-12-09 19:56:20,499] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: The last worker went missing, stopping test.
[2022-12-09 19:56:21,938] bhanu-job-loadt/INFO/locust.runners: Spawning is complete and report waittime is expired, but not all reports received from workers: {} (0 total users)
[2022-12-09 19:57:20,935] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:21,947] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:22,557] bhanu-job-loadt/INFO/locust.runners: Discarded report from unrecognized worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55
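Locust workers run everything on a single cooperative (gevent) loop, so a long blocking file read at test start can starve the greenlet that answers heartbeats. One way around it is to offload the blocking call to a thread pool. The sketch below uses the stdlib `ThreadPoolExecutor` as a stand-in for gevent's threadpool; `slow_file_setup` is a hypothetical placeholder for the 10-second read, and the `while` loop stands in for the worker's heartbeat greenlet:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_file_setup() -> str:
    """Hypothetical stand-in for the ~10 s blocking file read (shortened here)."""
    time.sleep(0.5)
    return "config data"

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(slow_file_setup)  # offloaded: the main loop is not blocked

heartbeats = 0
while not future.done():
    heartbeats += 1      # the "heartbeat" keeps firing while the file is read
    time.sleep(0.1)      # heartbeat interval

data = future.result()   # setup result is available once the read completes
pool.shutdown()
```

In an actual locustfile the gevent-friendly equivalent would be spawning the read from a `test_start` event listener (e.g. via `gevent.spawn` or gevent's threadpool), so the worker keeps answering heartbeats while the file loads.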


