Tasking Manager Load Testing
I load tested the Tasking Manager last week with locustio, and the findings have been pretty interesting.
Locust?
Locust works by simulating requests to specific endpoints and weighting the requests per endpoint. The endpoints are defined in a file, locustfile.py, and requests are sent to each of these endpoints.
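For illustration, here is a minimal locustfile sketch using the pre-1.0 locustio API that was current at the time; the endpoint paths, task weights, and host name are hypothetical placeholders, not the ones used in these tests.

```python
# locustfile.py - a minimal sketch using the pre-1.0 locustio API
# (newer Locust releases use HttpUser / wait_time instead of HttpLocust / min_wait).
# The endpoint paths and task weights below are hypothetical placeholders.
from locust import HttpLocust, TaskSet, task


class TaskingManagerTasks(TaskSet):
    @task(10)                  # weight: hit this endpoint 10x as often as home()
    def search_projects(self):
        self.client.get("/api/v1/project/search")

    @task(1)
    def home(self):
        self.client.get("/")


class TaskingManagerUser(HttpLocust):
    task_set = TaskingManagerTasks
    min_wait = 1000            # wait 1-5 s between tasks (milliseconds)
    max_wait = 5000

# Example run with the settings below (old locustio CLI flags; check
# `locust --help` for the flags of your installed version):
#   locust -f locustfile.py --host=https://tasks.example.org --no-web -c 500 -r 50
```

The weight argument to @task is what "weighting requests by endpoint" refers to above.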
Load Test Simulation Settings
- Number of users: 500
- Hatch rate (Number of users added/s): 50/s
Load Tests
Load Test 1
Initially, I tested on a single EC2 instance, a c3.2xlarge, which has 8 CPU cores and 15 GB of RAM. Snippets from the results are as follows:
Load Test 2
Next, I increased the number of instances to 4, so requests are distributed equally among all instances. I also got rid of the requests to the root endpoint (/). Snippets from the results are as follows:
Next actions
- The CPU utilization across instances is really low - it averages around 2-10%, even though I am running gunicorn with (cores * 2) + 1 workers. I also ran htop on these instances and saw that not all of the CPU cores are being used. This needs to be investigated - I am not sure if there is an option in gunicorn to spread workers across multiple processes. Testing async workers and optimising gunicorn may help with some of these issues (see the sketch after this list).
- When the request count dips and then increases sharply, it is accompanied by a massive spike in latency (as you can see in load test 1, where the latency increases to 720s at that yellow peak) - why does this happen? I think fixing 1 can help fix this issue as well.
- All the failures are a result of gateway timeouts - these occur when requests take longer than the ELB timeout period.
- Running tests on eu-west-1 vs us-east-1.
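As a starting point for the gunicorn investigation, below is a sketch of a gunicorn.conf.py with the (cores * 2) + 1 worker count and an async worker class; the bind address, timeout, and worker class are assumptions, not the project's actual configuration.

```python
# gunicorn.conf.py - a sketch only; the bind address, timeout and worker
# class are assumptions, not the Tasking Manager's actual settings.
import multiprocessing

# (cores * 2) + 1 workers, as used in the load tests above. Each gunicorn
# worker is a separate OS process, so with enough concurrent requests the
# load should spread across the available CPU cores.
workers = multiprocessing.cpu_count() * 2 + 1

# One option to test: gevent async workers (requires the gevent package),
# so a worker is not blocked while waiting on the database or network.
worker_class = "gevent"

bind = "0.0.0.0:8000"
timeout = 60   # worker timeout in seconds (gunicorn's default is 30)
```

This would be launched with something like gunicorn -c gunicorn.conf.py app:application, where the module:callable name is a placeholder for the actual WSGI entry point.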
Locust
- I need to figure out how to collect Locust stats across a time range and write them to a file (a sketch follows this list).
- A single Locust instance running on localhost is sufficient to generate load at our scale, but the results are also likely influenced by local network conditions and bandwidth. I am going to set this up on an EC2 instance once I figure out a way to get the output stats.
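One possible approach for the stats file, sketched under the assumption that the pre-1.0 locustio event hooks (events.request_success) are available; newer Locust versions expose a different events API and also have a --csv command-line option that writes stats files directly.

```python
# stats_logger.py - a sketch, assuming the pre-1.0 locustio event-hook API.
# Import this from (or paste it into) the locustfile so the hook is registered.
import csv
import time

from locust import events

_stats_file = open("request_stats.csv", "w", newline="")
_writer = csv.writer(_stats_file)
_writer.writerow(["timestamp", "method", "name", "response_time_ms", "response_length"])


def _on_request_success(request_type, name, response_time, response_length, **kwargs):
    # Record one row per successful request, with a wall-clock timestamp so the
    # stats can later be sliced by time range.
    _writer.writerow([time.time(), request_type, name, response_time, response_length])
    _stats_file.flush()


events.request_success += _on_request_success
```

Flushing on every request keeps the sketch simple but adds overhead; buffering rows and writing them when the test quits would be lighter at high request rates.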
cc/ @hotosm/tech
Top GitHub Comments
Overall, I feel really good about our new stack, and I think we are ready to move ahead based on the load tests. The latency looks sharp, and we are able to serve about 20-25 users per EC2 instance we are using, which should be a good indicator of how to scale up our stacks during peak traffic.
@willemarcel the latency we saw during your load tests was because we were running fewer instances on the CloudFormation stack vs the production stack, which we have fixed now. Thanks for identifying that issue. 🙇‍♀️
No next actions. Closing here!