Consumption Plan Scaling Issues
I am attempting to measure some performance metrics of Azure Functions and other serverless platforms, and am having trouble getting a test function to scale on a Consumption plan. I am creating a Node.js function that busy-waits in a loop for 1 second and then exits. I then trigger that function once per second on a new Consumption plan, and keep up this request rate for five minutes. Here is a graph of one of these tests, where the vertical axis is the response time minus the function duration (1 second), and the horizontal axis is the number of seconds into the test:
The idea is that a few of these requests come in while the first instance is being assigned to the function, and the instance then tries to play catch-up with the backlog (to no avail, because it is receiving 1 second of work per second). It seems like either the queued requests or the CPU load from the busy-waiting should trigger scale-out, but I'm not seeing any additional instances added to the function. Is there something I'm doing wrong here?
Repro steps
- Create a function app inside an App Service with a Consumption plan.
- Deploy the following function with an HTTP trigger to the function app:
'use strict';

// Cached per-instance id; it persists across invocations on the same
// worker, so a constant id in the responses means no scale-out happened.
var id;

module.exports.test = function (context, req) {
    var start = Date.now();

    // Generate the id once per instance.
    if (typeof id === 'undefined') {
        id = uuid();
    }

    // Busy-wait for the requested duration to simulate CPU-bound work.
    while (Date.now() < start + req.body.duration) {}

    context.done(null, { body: {
        duration: Date.now() - start,
        id: id
    }});
};

// Compact RFC 4122 v4 UUID generator.
function uuid(a) {
    return a ? (a ^ Math.random() * 16 >> a / 4).toString(16)
             : ([1e7] + -1e3 + -4e3 + -8e3 + -1e11).replace(/[018]/g, uuid);
}
- Trigger this function once per second with the following request body:
{
    "duration": 1000
}
- Collect the response times of the function under the 1 request per second load for a few minutes (a minimal load-test sketch follows these steps).
- Observe that the returned id does not change, and that the response times are consistently longer than 1 second.
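For reference, here is a minimal sketch of the kind of load generator described above; it is not the exact script from the original test, and FUNCTION_URL is a placeholder for the function's invoke URL. It fires one request per second and prints the response time minus the 1-second busy-wait, along with the instance id from the response:

// load-test.js - fire one request per second for five minutes and log
// (response time - busy-wait duration) plus the serving instance's id.
'use strict';
const https = require('https');

const FUNCTION_URL = process.env.FUNCTION_URL; // e.g. https://<app>.azurewebsites.net/api/test?code=<key>
const BODY = JSON.stringify({ duration: 1000 });
const TEST_SECONDS = 300;

function fireRequest(secondsIntoTest) {
    const started = Date.now();
    const req = https.request(FUNCTION_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }
    }, function (res) {
        let data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () {
            // Overhead is everything beyond the 1 s of busy-waiting.
            const overhead = Date.now() - started - 1000;
            const instanceId = JSON.parse(data).id;
            console.log(secondsIntoTest + 's: +' + overhead + 'ms, instance ' + instanceId);
        });
    });
    req.on('error', function (err) { console.error(err.message); });
    req.end(BODY);
}

// Schedule one request per second, independent of response times.
for (let s = 0; s < TEST_SECONDS; s++) {
    setTimeout(fireRequest, s * 1000, s);
}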
Expected behavior
The first request begins execution somewhere between 4 and 30 seconds after the start of the test (I've been seeing a large range in cold-start response times; is this typical?), presumably because no instances were assigned to the function and it takes some time to add the first one. It seems like a second instance could be assigned to the function in a similar amount of time (at which point the function could complete 2 seconds of work per second), and the response times would then decrease linearly to near 0. Am I wrong about the behavior I expect, and if so, what should I be seeing scalability-wise?
Actual behavior
Described above. This behavior contrasts with running the same experiment again, except without any busy-waiting, by sending this as the request body:
{
    "duration": 0
}
This results in a graph with a similar cold-start latency, but the single instance easily catches up with the requests because they carry no work:
Known workarounds
Hoping to find one.
Related information
Let me know if you want me to provide any additional information.
Top GitHub Comments
Yep, we are adding this very option! See https://github.com/Azure/azure-webjobs-sdk-script/wiki/Http-Functions#throttling for details. This should be available by the middle of next week.
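For context, the throttling settings referenced there live under the http section of host.json (v1.x); the exact property names are an assumption from the linked wiki page, so verify against it. A sketch of the kind of configuration the next comment describes, capping each instance at one concurrent request:

{
    "http": {
        "maxConcurrentRequests": 1,
        "maxOutstandingRequests": 200,
        "dynamicThrottlesEnabled": true
    }
}

As I understand these settings, maxConcurrentRequests caps parallel executions per instance, maxOutstandingRequests bounds the queued backlog before further requests are rejected with 429, and dynamicThrottlesEnabled adds throttling based on host health metrics.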
I created an Azure Function that busy-waits for 30 ms, and a simple load test that issues an HTTP request every 20 ms, for a total of 1000 requests. The results are simply horrible: I see an average of almost 30 seconds per request. After limiting the concurrent HTTP requests in host.json to 1 (!) I managed to get down to 7-10 seconds per request. As far as I can tell, scaling does not work at all in my case; I see a single instance id. When running the same test with a 200 ms delay between requests, I get what I expect: an average of ~190 ms per call.
Is there a workaround to make scaling work with an HTTP trigger on Consumption plans?
Here is my Azure Function: