Consumption Plan Scaling Issues
I am attempting to measure some performance metrics of Azure Functions and other serverless platforms, and am having trouble getting a test function to scale on a Consumption plan. I am creating a Node.js function that busy-waits in a loop for 1 second and then exits. I then trigger that function once per second on a new Consumption plan, and keep up this request rate for five minutes. Here is a graph of one of these tests, where the vertical axis is the response time minus the function duration (1 second), and the horizontal axis is the number of seconds into the test:
The idea is that a few of these requests come in while the first instance is being assigned to the function, and the instance then tries to play catch-up with the backlog (to no avail, because it is receiving 1 second of work per second). It seems like either the queued requests or the CPU load from the busy-waiting should trigger scale-out, but I'm not seeing any additional instances added to the function. Is there something I'm doing wrong here?
Repro steps
- Create a function app inside an App Service with a Consumption plan.
- Deploy the following function with an HTTP trigger to the function app:
'use strict';

// Cached per-instance id; it persists across invocations on the same
// worker, so a constant id in the responses means no scale-out happened.
var id;

module.exports.test = function (context, req) {
    var start = Date.now();

    // Generate the id once per instance.
    if (typeof id === 'undefined') {
        id = uuid();
    }

    // Busy-wait for the requested duration to simulate CPU-bound work.
    while (Date.now() < start + req.body.duration) {}

    context.done(null, { body: {
        duration: Date.now() - start,
        id: id
    }});
};

// Compact RFC 4122 v4 UUID generator.
function uuid(a) {
    return a ? (a ^ Math.random() * 16 >> a / 4).toString(16)
             : ([1e7] + -1e3 + -4e3 + -8e3 + -1e11).replace(/[018]/g, uuid);
}
- Trigger this function once per second with the following request body:
{
    "duration": 1000
}
- Collect the response times of the function under the 1 request per second load for a few minutes (a minimal load-test sketch follows these steps).
- Observe that the returned id does not change, and that the response times are consistently longer than 1 second.
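For reference, here is a minimal sketch of the kind of load generator described above; it is not the exact script from the original test, and FUNCTION_URL is a placeholder for the function's invoke URL. It fires one request per second and prints the response time minus the 1-second busy-wait, along with the instance id from the response:

// load-test.js - fire one request per second for five minutes and log
// (response time - busy-wait duration) plus the serving instance's id.
'use strict';
const https = require('https');

const FUNCTION_URL = process.env.FUNCTION_URL; // e.g. https://<app>.azurewebsites.net/api/test?code=<key>
const BODY = JSON.stringify({ duration: 1000 });
const TEST_SECONDS = 300;

function fireRequest(secondsIntoTest) {
    const started = Date.now();
    const req = https.request(FUNCTION_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }
    }, function (res) {
        let data = '';
        res.on('data', function (chunk) { data += chunk; });
        res.on('end', function () {
            // Overhead is everything beyond the 1 s of busy-waiting.
            const overhead = Date.now() - started - 1000;
            const instanceId = JSON.parse(data).id;
            console.log(secondsIntoTest + 's: +' + overhead + 'ms, instance ' + instanceId);
        });
    });
    req.on('error', function (err) { console.error(err.message); });
    req.end(BODY);
}

// Schedule one request per second, independent of response times.
for (let s = 0; s < TEST_SECONDS; s++) {
    setTimeout(fireRequest, s * 1000, s);
}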
Expected behavior
The first request begins execution somewhere between 4 and 30 seconds after the start of the test (I've been seeing a large range in cold-start response times; is this typical?), presumably because no instances were assigned to the function and it takes some time to add the first one. It seems like a second instance could be assigned to the function in a similar amount of time (at which point the function could complete 2 seconds of work per second), and the response times would then decrease linearly to near 0. Am I wrong about the behavior I expect, and if so, what should I be seeing scalability-wise?
Actual behavior
Described above. This behavior contrasts with running the same experiment again, except without any busy-waiting, by sending this as the request body:
{
    "duration": 0
}
This results in a graph with a similar cold-start latency, but the single instance easily catches up with the requests because they carry no work:
Known workarounds
Hoping to find one.
Related information
Let me know if you want me to provide any additional information.
Top GitHub Comments
Yep, we are adding this very option! See https://github.com/Azure/azure-webjobs-sdk-script/wiki/Http-Functions#throttling for details. This should be available by the middle of next week.
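For context, the throttling settings referenced there live under the http section of host.json (v1.x); the exact property names are an assumption from the linked wiki page, so verify against it. A sketch of the kind of configuration the next comment describes, capping each instance at one concurrent request:

{
    "http": {
        "maxConcurrentRequests": 1,
        "maxOutstandingRequests": 200,
        "dynamicThrottlesEnabled": true
    }
}

As I understand these settings, maxConcurrentRequests caps parallel executions per instance, maxOutstandingRequests bounds the queued backlog before further requests are rejected with 429, and dynamicThrottlesEnabled adds throttling based on host health metrics.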
I created an Azure Function that busy-waits for 30 ms, and a simple load test that issues an HTTP request every 20 ms, for a total of 1000 requests. The results are simply horrible: I see an average of almost 30 seconds per request. After limiting the concurrent HTTP requests in host.json to 1 (!) I managed to get down to 7-10 seconds per request. As far as I can tell, scaling does not work at all in my case; I see a single instance id. When running the same test with a 200 ms delay between requests, I get what I expect: an average of ~190 ms per call.
Is there a workaround to make scaling work with an HTTP trigger on Consumption plans?
Here is my Azure Function: