Wrong message on rebalance
Describe the bug
In distributed mode with several workers, if a worker crashes we get the message "Spawning is complete and report waittime is expired, but not all reports received from workers". It is not clear whether the users were spawned by the remaining workers or not.
Expected behavior
Normal message "All users spawned".
Actual behavior
Steps to reproduce
- Start master runner with 2 expected workers in config.
- Start 2 workers.
- Wait until all users are spawned.
- Kill one worker.
Environment
- OS:
- Python version: 3.8 / 3.9
- Locust version: 2.8.6
- Locust command line that you ran:
locust --config=master.conf
locust --worker
- Locust file contents (anonymized if necessary):
from locust import HttpUser, task, constant

class User1(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")

class User2(HttpUser):
    wait_time = constant(0.1)

    @task
    def hello_world(self):
        self.client.get("/")
master.conf
master = true
host = http://127.0.0.1:8000/
expect-workers = 2
headless = true
users = 20
spawn-rate = 20
Top GitHub Comments
@bhanuprakash-1 You can experiment with the heartbeat timeout by adding this to your locustfile:
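The snippet referred to is not preserved in this mirror. As a rough illustration only (not the maintainer's original code), overriding the module-level heartbeat settings in locust.runners from the locustfile might look like the sketch below; the values are placeholders, not recommendations.

# Sketch: raise the master's tolerance for missed worker heartbeats.
# HEARTBEAT_INTERVAL / HEARTBEAT_LIVENESS are module-level settings in
# locust.runners; the numbers below are illustrative only.
import locust.runners

locust.runners.HEARTBEAT_INTERVAL = 1    # seconds between worker heartbeats
locust.runners.HEARTBEAT_LIVENESS = 10   # missed heartbeats tolerated before a worker is marked missing

This only changes how long a worker may stay silent before the master marks it missing; whether it helps depends on why the worker's heartbeat greenlet is blocked in the first place.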
Even with the default settings you shouldn't have issues unless the I/O takes more than 60s though, so it's a bit weird. Possibly your locustfile is blocking the worker forever, and then it won't matter what the timeout is.
Facing a similar kind of issue. I'm trying to run load with 3 workers. I have a piece of code which does file I/O and it runs as soon as we start the test (on each worker). I guess this is blocking that worker.
I'm getting the logs below on the master. Is there any way to increase the wait time? What should I be doing in this case? The load test runs perfectly fine when run on only the master (although the file I/O takes some time, about 10 seconds).
[2022-12-09 19:55:09,841] bhanu-job-loadt/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-12-09 19:55:09,855] bhanu-job-loadt/INFO/locust.main: Starting Locust 2.13.0
[2022-12-09 19:55:26,919] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 (index 0) reported as ready. 1 workers connected.
[2022-12-09 19:55:41,296] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 (index 1) reported as ready. 2 workers connected.
[2022-12-09 19:55:51,586] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 (index 2) reported as ready. 3 workers connected.
[2022-12-09 19:56:16,910] bhanu-job-loadt/INFO/locust.runners: Sending spawn jobs of 50 users at 10.00 spawn rate to 3 ready workers
[2022-12-09 19:56:20,499] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_7d70f84313fc456f98328d33ab159ce6 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_ac6d0fdf7ec84b8983e5c09c8a923827 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: Worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55 failed to send heartbeat, setting state to missing.
[2022-12-09 19:56:21,513] bhanu-job-loadt/INFO/locust.runners: The last worker went missing, stopping test.
[2022-12-09 19:56:21,938] bhanu-job-loadt/INFO/locust.runners: Spawning is complete and report waittime is expired, but not all reports received from workers: {} (0 total users)
[2022-12-09 19:57:20,935] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:21,947] bhanu-job-loadt/WARNING/locust.runners: You can't start a distributed test before at least one worker processes has connected
[2022-12-09 19:57:22,557] bhanu-job-loadt/INFO/locust.runners: Discarded report from unrecognized worker bhanu-job-loadt_4a5196bf34014d21b5cb5f2d7d786e55
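If the blocking step really is a large file read at test start, one possible workaround (a sketch only, not from the original thread; the file name, chunk size, and environment attribute are placeholders) is to read the file in chunks inside a test_start listener and yield to gevent between chunks, so the worker's heartbeat greenlet keeps running:

# Sketch: read a large file cooperatively so the worker's heartbeat greenlet
# is not starved during test start. "testdata.csv", the chunk size, and the
# environment attribute below are placeholder values.
import gevent
from locust import events

@events.test_start.add_listener
def load_test_data(environment, **kwargs):
    chunks = []
    with open("testdata.csv", "rb") as f:
        while True:
            chunk = f.read(1024 * 1024)  # read 1 MiB at a time
            if not chunk:
                break
            chunks.append(chunk)
            gevent.sleep(0)  # yield so heartbeats can still be sent
    environment.test_data = b"".join(chunks)  # stash for later use (placeholder attribute)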