"Stop" button doesn't always stop workers
Describe the bug
In my test environment I am running a distributed load test with 3 workers in GUI mode. Intermittently (it seems to happen mostly on tests run after the first test following spin-up of the hosts), when I press the “STOP” button in the master's web interface, the feedback says “Stopping”, and the console output on the workers also reports that they are “Stopping”, but the number of users never decreases and the load test continues. We have to “CTRL-C” each worker separately to get the test to stop.
Expected behavior
When the “STOP” button is pressed the load test should stop immediately on all workers (or at least within a few seconds).
Actual behavior
The workers and the GUI report the status as “STOPPING” and the test continues – the “STOP” button is now gone so the only recourse is to log on to each worker and stop the test manually.
Output from one of the workers on the command line is: “[2020-10-06 23:12:49,585] locust02.YYYY.XXXX.local/INFO/locust.runners: Stopping 666 users”
Steps to reproduce
On the 3 workers we run this command:
locust -f locustfile.py --worker --master-host=172.20.2.254
On the master host we run this command:
locust -f locustfile.py --master
On the GUI we set the number of users to 2000 and the spawn rate of 100. (We have tried 6000 and 300, and 1000 and 100.) The problem is intermittent.
locustfile.py is shown below
Environment
- OS: CentOS Linux release 7.6.1810
- Python version: Python 3.6.8
- Locust version: locust 1.2.2
- Locust command line that you ran:
  worker: locust -f locustfile.py --worker --master-host=172.20.2.254
  master: locust -f locustfile.py --master
- Locust file contents (anonymized if necessary):
# coding=utf-8
import json
import os
import random
import requests
import time
import locust_plugins
from locust import HttpUser, task, between, TaskSet, User

deviceIdStart = 1000000000000000000
deviceIdRange = 10000
client_id = os.getenv("CLIENTID")
client_secret = os.getenv("CLIENTSECRET")
ballots = ["test"]
candidates = {
    "test": {
        "T1": "test1",
        "T2": "test2",
        "T3": "test3",
        "T4": "test4",
        "T5": "test5",
        "T6": "test6",
        "T7": "test7",
        "T8": "test8",
    }
}


def candidate():
    ballot_id = random.choice(ballots)
    candidate_list = list(candidates[ballot_id].keys())
    candidate_id = random.choice(candidate_list)
    candidate_name = candidates[ballot_id][candidate_id]
    return ballot_id, candidate_id, candidate_name


def device_id():
    dIdS = int(os.getenv("DEVIDSTART", deviceIdStart))
    dIdR = int(os.getenv("DEVIDRANGE", deviceIdRange))
    return random.randrange(dIdS, dIdS + dIdR)


class UserBehavior(HttpUser):
    min_wait = 2000
    max_wait = 9000
    host = os.getenv("TARGET_URL")

    def __init__(self, parent):
        super(UserBehavior, self).__init__(parent)
        self.token = ""
        self.headers = {}
        self.tokenExpires = 0

    def on_start(self):
        self.token = self.login()
        self.headers = {
            "Authorization": "%s %s"
            % (self.token["token_type"], self.token["access_token"])
        }
        self.tokenExpires = time.time() + self.token["expires_in"] - 120

    def login(self):
        """
        Gets the token for the user
        :rtype: dict
        """
        global client_id
        global client_secret
        url = os.getenv("AUTH_URL")
        print("Get token with %s" % url)
        response = requests.post(
            url,
            headers={
                "X-Client-Id": client_id,
                "X-Client-Secret": client_secret,
                "cache-control": "no-cache",
            },
        )
        try:
            content = json.loads(response.content)
            print("Access token: %s" % content.get("access_token"))
            return content
        except:
            print("Error in getToken(): %s" % content.get("error_msg"))
            return None

    @task
    def vote(self):
        if self.tokenExpires < time.time():
            self.token = self.login()
            if self.token:
                self.tokenExpires = time.time() + self.token["expires_in"] - 120
            else:
                print("Unable to get SAT Token")
                return None
        selection = candidate()
        message = {
            "Id": "TEST-P",
            "dId": device_id(),
            "bId": selection[0],
            "sIds": selection[1],
            "sTexts": selection[2],
        }
        response = self.client.post(
            "/api/v1/test?partner=test", message, headers=self.headers
        )


# vim: set fileencoding=utf-8 :
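A side observation on the locustfile, not necessarily related to the stop problem: in login(), the except branch reads `content`, which is unbound if `json.loads` is what raised, and on_start() indexes the token even when login() returned None. The sketch below shows one way the token handling could be made defensive. The helper names `parse_token_response` and `build_auth_headers` are hypothetical, and the required keys are an assumption taken from the fields the locustfile reads.

```python
import json

# Hypothetical helpers; key names are assumed from the fields the locustfile uses.
REQUIRED_KEYS = ("token_type", "access_token")


def parse_token_response(raw_body):
    """Return the decoded token dict, or None if the body is unusable."""
    try:
        content = json.loads(raw_body)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError; TypeError covers a None body.
        return None
    if not isinstance(content, dict):
        return None
    if any(key not in content for key in REQUIRED_KEYS):
        # For example an error payload such as {"error_msg": "..."}.
        return None
    return content


def build_auth_headers(token):
    """Build the Authorization header, or no headers when there is no token."""
    if not token:
        return {}
    return {
        "Authorization": "%s %s" % (token["token_type"], token["access_token"])
    }
```

With helpers like these, login() could return `parse_token_response(response.content)` and the caller could skip the request when the result is None.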
Issue Analytics
- Created 3 years ago
- Reactions: 2
- Comments: 11 (3 by maintainers)
Top GitHub Comments
I will try it on the latest master and will let you know if I see any change. Thanks for the link to the other report. I don’t think it’s exactly the same, but the mention of the “gevent.sleep(0)” that they added gives me an idea of something to try that might help with my problem. I can also temporarily remove the “randrange” call and see if that has any impact. I’ll report back my findings.
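For context on why a call like gevent.sleep(0) could matter: Locust runs users as gevent greenlets, which are scheduled cooperatively, so a request to stop a greenlet only takes effect when that greenlet yields control. The sketch below illustrates the same principle using the standard library's asyncio instead of gevent (a deliberate substitution so that it runs without extra packages; `asyncio.sleep(0)` stands in for `gevent.sleep(0)`). The function names are hypothetical, and this is not a confirmed diagnosis of the problem reported here. It needs Python 3.7 or newer for `asyncio.run`.

```python
import asyncio


async def busy_worker(iterations, yield_control):
    """Loop without I/O; optionally yield to the scheduler each iteration."""
    done = 0
    for _ in range(iterations):
        done += 1
        if yield_control:
            # Cooperative yield point: the only place a cancel can land.
            await asyncio.sleep(0)
    return done


async def run_and_cancel(yield_control):
    """Start a worker, try to stop it right away, and report what happened."""
    task = asyncio.create_task(busy_worker(10000, yield_control))
    await asyncio.sleep(0)  # let the worker start running
    task.cancel()           # analogous to asking a user to stop
    try:
        await task
        return "finished"   # the worker never yielded, so it ran to the end
    except asyncio.CancelledError:
        return "cancelled"  # the worker yielded, so the cancel took effect


if __name__ == "__main__":
    print(asyncio.run(run_and_cancel(yield_control=True)))
    print(asyncio.run(run_and_cancel(yield_control=False)))
```

The worker that never yields cannot be interrupted until it completes on its own, which is the kind of behaviour a stuck “Stopping” state would be consistent with if some code path on the workers does not yield.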
It seems that this problem still exists in the current version. At present, it often occurs when I use Locust. After clicking the STOP button, the state always shows STOPPING. The following is my master’s log information: