[PREMIUM] Functions Host causes Container Exit/Restart

We have a Premium Python Durable Functions app that occasionally “hangs” for a very long time in between scheduled tasks.

We’ve investigated the issue quite a bit and discovered that the hanging is due to our container exiting.

We regularly test the resource usage of the Function App in KUDU by using the top command. While in KUDU we notice this message in the bottom tray when we experience the long hang:

[Screenshot: hang-message]

So the container has evidently exited or restarted. When we run the “Container Crash” report in the “Diagnose and Solve problems” tab of our function app, we can see the following error message about the container exit:

Container exited unexpectedly: last 10 seconds logs [2020-12-15T17:29:01.710719291Z /azure-functions-host/start.sh: line 28: 18 Killed 

The most notable portion is “18 Killed”:

azure-functions-host/start.sh: line 28: 18 Killed 

After more investigation, it seems that “18” refers to process ID 18, which is the Microsoft+ process on our container. I’m guessing this is the functions host:

[Screenshot: KUDU Process 18]

For some reason the functions host is killing this process, and then the container restarts immediately after - so we believe these are related.
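The “line 28: 18 Killed” text has the shape of a shell job-status message, where the number is a process ID and “Killed” is how shells usually describe a process ended by SIGKILL. Below is a minimal sketch, assuming that message shape, for pulling the script name, line number, PID, and status out of such a log line. `parse_killed_line` and the regular expression are hypothetical helpers for illustration, not part of the Functions host.

```python
import re

# Assumed shape of a shell job-status message:
#   "<script>: line <n>: <pid> <status>"
KILLED_RE = re.compile(
    r"(?P<script>\S+): line (?P<line>\d+):\s+(?P<pid>\d+)\s+(?P<status>\w+)"
)


def parse_killed_line(log_line):
    """Return script, line, pid, and status from a job-status line, or None."""
    match = KILLED_RE.search(log_line)
    if match is None:
        return None
    return {
        "script": match["script"],
        "line": int(match["line"]),
        "pid": int(match["pid"]),
        "status": match["status"],
    }


# The log line quoted in the "Container Crash" report above.
sample = (
    "[2020-12-15T17:29:01.710719291Z "
    "/azure-functions-host/start.sh: line 28: 18 Killed "
)
print(parse_killed_line(sample))
```

Note that this message only says process 18 was killed. It does not say which process sent the signal.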

Why does the functions host kill this process, and is this the cause of the container restarting? Is there anything we can do to fix this issue?

Investigative information

Please provide the following:

  • Timestamp: 2020-12-15T17:29:01.710719291Z
  • Function App version: ~3
  • Functions Host Version: 3.0.15149.0
  • Function App name: labelright-test-v2
  • Function name(s): can happen during any function invocation
  • Invocation ID: not related to any particular function invocation
  • Region: East US

Repro steps

It seems to occur randomly, under either heavy or light load.

The only way to know it’s happening is to watch KUDU during regular load, or to run the “Container Crash” report in the function app panel to see when it last occurred.

Expected behavior

The functions host does not kill the Microsoft+ process and does not cause a container restart.

Actual behavior

The functions host kills the Microsoft+ process and causes a container restart.

Related information

  • Premium Function App Plan EP2 (2 vCPU, 7GB RAM)
  • Only scaling out to one instance on this plan
  • Using Python 3.6
  • Using a custom docker image, inherits from the Azure python-3.6-appservice image
  • Only additions to our container are installing Java and some libraries for working with PDFs
  • This is a Durable Functions app that has a mix of I/O and CPU bound tasks
  • The application converts PDFs to high quality images, uses tensorflow models for object detection, and uses OpenCV for computer vision and image analysis tasks
  • FUNCTIONS_WORKER_PROCESS_COUNT = 8 (issue occurs with this setting at 2 as well)

Tagging @davidmrdavid from a previous conversation we had on the Durable Functions repo about the same topic:

https://github.com/Azure/azure-functions-durable-extension/issues/1573#issuecomment-730568055

Top GitHub Comments

marcd123 commented, Dec 23, 2020

Hey @davidmrdavid

Thanks for the follow up.

This past week I did a lot of experimenting and found the best settings for our app. I think we could close the issue, though I would like to write up the experience here so others with similar issues can find it.

There were two specific bugs I was trying to fix:

  1. Language Worker Process is killed via SIGKILL (Exit code 137) due to out-of-memory
  2. Container is restarted causing a long pause in execution
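For readers wondering how exit code 137 relates to SIGKILL: under the common shell convention, an exit code above 128 means the process was ended by signal number (code - 128), and 137 - 128 = 9, which is SIGKILL on Linux. Below is a small sketch of that arithmetic. `signal_from_exit_code` is a hypothetical helper, and the mapping assumes a POSIX system.

```python
import signal


def signal_from_exit_code(code):
    """Map a shell-style exit code to a signal name, or None.

    Assumes the common shell convention that a code above 128
    means "terminated by signal (code - 128)".
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return None


print(signal_from_exit_code(137))  # SIGKILL on Linux
print(signal_from_exit_code(0))    # None: a normal exit
```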

I resolved the first issue in two steps. First, I found which parts of our app consumed the most memory and reduced their memory consumption as much as possible. Even then, language workers were still being killed at 8 workers, so I started reducing the count. At 2 language workers, we were unable to reproduce the first issue above. I found it really helpful to force our function app to scale to a single VM, then use KUDU to SSH in and run the “top” command to see the language workers and how much memory each was using. Being on a single VM also ensured all traffic went to the same pool of resources and simulated the “worst-case scenario” of heavy work landing on one VM, even if you can scale out to many.
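One way to find memory-hungry steps from inside the worker, instead of watching “top”, is to record how much the process’s peak resident memory grows around each step. This is only a sketch using Python’s standard `resource` module (Unix only), not the method used in this thread. `peak_rss_mib` and `measure_peak_growth` are hypothetical helper names, and the 50 MiB allocation is a stand-in for a real step such as PDF rendering.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_growth(step, *args):
    """Run step(*args) and return (result, growth of peak RSS in MiB)."""
    before = peak_rss_mib()
    result = step(*args)
    return result, peak_rss_mib() - before


# Stand-in workload: allocate a 50 MiB buffer.
buffer, growth = measure_peak_growth(bytearray, 50 * 1024 * 1024)
print(f"peak RSS grew by about {growth:.1f} MiB")
```

Because the peak value never decreases, a step that stays below an earlier peak reports zero growth, so measuring each step in a fresh process gives the clearest picture.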

As for the container restarting, I think out-of-memory could be part of the issue, but it seemed to happen even under very light load. I was able to reproduce the container restart using a low-memory workflow with 8, 6, and even 4 language workers. The issue wasn’t fully resolved until we dropped to 2 language workers, which matches the number of vCPUs on our EP2 service plan. So while memory could be part of the issue, I think it’s possible that the system experiences CPU process locks if you use more language workers than you have vCPUs on your service plan. If you’re using Node or Python Azure Functions on a Premium plan, I wouldn’t recommend using more language workers than you have vCPUs in your service plan, even if your workload is low on resource usage.
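The rule of thumb above (no more language workers than vCPUs) can be written as a small check. This is a sketch of the commenter’s heuristic, not official Azure guidance. `capped_worker_count` is a hypothetical helper, and `os.cpu_count()` reports the CPUs visible to the process, which may not match the plan’s vCPU count inside a container.

```python
import os


def capped_worker_count(requested, vcpus=None):
    """Clamp a requested worker count to the number of vCPUs (minimum 1)."""
    if vcpus is None:
        # Assumption: fall back to the CPUs visible to this process.
        vcpus = os.cpu_count() or 1
    return max(1, min(requested, vcpus))


# EP2 in this thread has 2 vCPUs; the app originally requested 8 workers.
print(capped_worker_count(8, vcpus=2))  # 2
```

The result is the kind of value one would then put in the FUNCTIONS_WORKER_PROCESS_COUNT app setting mentioned earlier in the issue.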

I’m glad to hear about the PYTHON_THREADPOOL_THREAD_COUNT, and can’t wait to see the docs on this. Does this setting help a single language worker execute multiple function invocations in parallel, or does it just give a single function invocation access to more threads to work on?

PS: Even though we’ve had to work out a lot of issues to get things running just right, the application is very performant now and we’re loving Azure Functions. When we move to production, we’re thinking of running our function app in Kubernetes to expand beyond the VM sizes offered in the Elastic Premium plan and scale to more VMs.

Great work, great product!

marcd123 commented, Mar 11, 2021

Thanks David!

Quoted reply from David Justo (sent Thursday, March 11, 2021 1:02 PM; subject: Re: [Azure/azure-functions-host] [PREMIUM] Functions Host causes Container Exit/Restart (#6985)):


Closing this issue as it was resolved!

@marcd123, the new performance docs are available here: https://docs.microsoft.com/en-us/azure/azure-functions/python-scale-performance-reference

Reach out again if you need anything! ⚡ ⚡

