Tasking Manager Load Testing


I load tested the Tasking Manager last week with locustio, and the findings have been pretty interesting.

Locust?

Locust works by simulating requests to specific endpoints and weighting the requests by endpoint. The endpoints are defined in a file, locustfile.py, and requests are sent to each of them.
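To make the weighting idea concrete, here is a minimal standard-library sketch of picking endpoints in proportion to their weights. It is not Locust's API and not the locustfile.py used in these tests; the paths, the weights, and the `pick_endpoints` helper are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical endpoints and weights, chosen only for illustration.
# A weight of 3 means "requested about three times as often" as a weight of 1.
ENDPOINT_WEIGHTS = {
    "/hypothetical/projects": 3,
    "/hypothetical/stats": 1,
}


def pick_endpoints(weights, n, seed=None):
    """Draw n endpoints at random, in proportion to their weights."""
    rng = random.Random(seed)
    paths = list(weights)
    return rng.choices(paths, weights=[weights[p] for p in paths], k=n)


if __name__ == "__main__":
    # Count how often each endpoint is drawn over many simulated requests.
    counts = Counter(pick_endpoints(ENDPOINT_WEIGHTS, 10_000, seed=0))
    for path, count in counts.most_common():
        print(f"{path}: {count}")
```

In a real run, each simulated user would send an HTTP request to the chosen path instead of just counting it.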

Load Test Simulation Settings

  • Number of users: 500
  • Hatch rate (Number of users added/s): 50/s
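As a quick sanity check on these settings, the sketch below works out how long the ramp-up takes. It assumes users are added at a constant rate until the target is reached, which is an assumption about the ramp-up and not something the issue states; the function names are hypothetical.

```python
def ramp_up_seconds(total_users, hatch_rate):
    """Seconds needed to reach total_users when adding hatch_rate users per second."""
    if hatch_rate <= 0:
        raise ValueError("hatch_rate must be positive")
    return total_users / hatch_rate


def users_at(t_seconds, total_users, hatch_rate):
    """Simulated users running t_seconds after the start (assumed linear ramp-up)."""
    return min(total_users, int(t_seconds * hatch_rate))


if __name__ == "__main__":
    # Settings from the list above: 500 users, hatched at 50 users per second.
    print(ramp_up_seconds(500, 50))
    print(users_at(3, 500, 50))
```

With 500 users and a hatch rate of 50/s, the full load is reached after about 10 seconds under this assumption, so almost all of each test runs at the full user count.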

Load Tests

Load Test 1

Initially, I tested on a single EC2 instance, a c3.2xlarge, which has 8 CPU cores and 15GB RAM. Snippets from the results are as follows:

[Four screenshots of the Load Test 1 results, taken 2019-02-25, not reproduced here]

Load Test 2

Next, I increased the number of instances to 4, so that requests are distributed equally among them. I also got rid of the requests to /. Snippets from the results are as follows:

[Two screenshots of the Load Test 2 results, taken 2019-02-25, not reproduced here]

Next actions

  • The CPU utilization across instances is really low: it averages around 2-10%, even though I am running gunicorn with (cores * 2) + 1 workers. I also ran htop on these instances and saw that not all the CPU cores are being used. This needs to be investigated. I am not sure if there is an option on gunicorn to spread workers across various processes. Testing async workers and optimising gunicorn may help with some of these issues.
  • When the request count dips and then increases sharply, it’s accompanied by a massive spike in latency (as you can see in load test 1, where the latency increases to 720s at that yellow peak). Why does this happen? I think fixing the first item above may help fix this issue as well.
  • All the failures are a result of gateway timeouts - this is when the requests take longer than the ELB timeout period.
  • Running tests on eu-west-1 vs us-east-1.
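For the gunicorn item above, a configuration file along these lines is one way to try the (cores * 2) + 1 rule together with an async worker class. This is a sketch, not the configuration used in these tests: the `worker_count` helper is hypothetical, and the choice of `gevent` assumes that package is installed.

```python
# gunicorn.conf.py (sketch)
import multiprocessing


def worker_count(cores):
    """The (cores * 2) + 1 rule of thumb mentioned in the list above."""
    return cores * 2 + 1


# Number of worker processes, derived from the cores on this machine.
workers = worker_count(multiprocessing.cpu_count())

# Assumption: try an async worker class. "sync" is gunicorn's default;
# "gevent" only works if the gevent package is installed.
worker_class = "gevent"
```

It could be loaded with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the application path is whatever the deployment already uses. On the 8-core c3.2xlarge from Load Test 1, the rule gives 17 workers.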

Locust

  • I need to figure out how to get locust stats across a time range and save them to a file.
  • A single locust instance on localhost is sufficient to run queries for our scale, but the results are also likely influenced by the local network and bandwidth. I am going to set this up on an EC2 instance once I figure out a way to get the output stats.
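One possible shape for "stats across a time range, in a file" is sketched below using only the standard library. It assumes each request has already been recorded as a (timestamp in seconds, latency in milliseconds, success flag) tuple; that input format, the column names, and both helper functions are hypothetical and are not Locust output.

```python
import csv
import io
from collections import defaultdict
from statistics import median

FIELDS = ["bucket_start_s", "requests", "failures",
          "median_latency_ms", "max_latency_ms"]


def bucket_stats(samples, bucket_seconds=10):
    """Group (timestamp_s, latency_ms, ok) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, latency_ms, ok in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start].append((latency_ms, ok))
    rows = []
    for start in sorted(buckets):
        latencies = [lat for lat, _ in buckets[start]]
        rows.append({
            "bucket_start_s": start,
            "requests": len(latencies),
            "failures": sum(1 for _, ok in buckets[start] if not ok),
            "median_latency_ms": median(latencies),
            "max_latency_ms": max(latencies),
        })
    return rows


def write_csv(rows, fileobj):
    """Write one CSV row per time bucket to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up samples, only to show the output format.
    demo = [(0.5, 100, True), (3.0, 300, True), (12.0, 61000, False)]
    out = io.StringIO()
    write_csv(bucket_stats(demo), out)
    print(out.getvalue())
```

Per-bucket rows like these make it easier to line up dips in request count with latency spikes than a single end-of-run summary does.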

cc/ @hotosm/tech

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

arunasank commented, Apr 17, 2019 (2 reactions)

Overall, I feel really good about our new stack, and I think we are ready to move ahead based on the load tests. The latency looks sharp, and we are able to serve about 20-25 users per EC2 instance we are using, which should be a good indicator of how to scale up our stacks during peak traffic.
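The per-instance figure can be turned into a rough sizing estimate. The sketch below is only arithmetic on the 20-25 users per instance quoted above; the `instances_needed` helper is hypothetical, and using the 500-user load test setting as the peak is an assumption.

```python
import math


def instances_needed(peak_users, users_per_instance):
    """Smallest number of instances that covers peak_users at the given capacity."""
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    return math.ceil(peak_users / users_per_instance)


if __name__ == "__main__":
    peak_users = 500  # assumption: the user count from the load test settings
    for capacity in (20, 25):
        print(capacity, "users/instance:", instances_needed(peak_users, capacity))
```

Sizing from the lower bound (20 users per instance) gives the more conservative instance count.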

@willemarcel the latency we saw during your load tests was because we were running fewer instances on the CloudFormation stack vs the production stack, which we have fixed now. Thanks for identifying that issue. 🙇‍♀️

arunasank commented, May 13, 2019 (0 reactions)

No next actions. Closing here!
