[Issue/Bug?] Function Timeout

Hello! We are currently running several Azure Functions in production and have been facing daily timeouts for a few months.

Is your question related to a specific version? If so, please specify:

Function app details

  • Deployment using AzureCLI.
  • Docker runtime: mcr.microsoft.com/azure-functions/python:2.0.14494-python3.6-appservice
  • (We use this specific version, as suggested by Azure technical support, because we encountered some bugs with newer ones.)
  • Operating system: Linux
  • AppService plan: Premium, P2V2: 1

What binding does your question apply to, if any? (e.g. Blob Trigger, Event Hub Binding, etc)

The FunctionApp contains several functions:

  1. Some functions with HTTP Binding which receive data and push them to EventHub
  2. Some crons
  3. Some functions with Event Hub bindings, which receive data (from 1.) and push them to MongoDb/Slack/BlobStorage

This code base is deployed into two Function Apps, with the appropriate functions disabled in each: one (let’s call it A) for “1.” and another one (let’s call it B) for “2. and 3.”.

Question

For safety reasons, we have a timeout of 2 minutes in host.json, knowing that every function should run in less than 200 ms.

For a few months now, we have been facing timeouts on a daily basis. They mostly impact one of the functions of Function App “A”, the one which receives nearly all of the traffic, but we also have a few timeouts on Function App “B”. I did not find any obvious pattern, nor any excessive CPU or memory usage.

The weird part is that, when a timeout occurs, the logs that should be produced by the function's code are not generated, which might indicate that our function is not executed at all for some reason.

Here is a simplified version of our code with extensive logging:

import logging

import azure.functions as func

event_producer: EventProducer = EventProducer(...)

async def main(req: func.HttpRequest) -> func.HttpResponse:
    data: dict = build_event_data(req)
    await event_producer.produce(data)
    return func.HttpResponse(status_code=200)

def build_event_data(req) -> dict:
    logging.info("Entering main")
    ...
    return ...

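import asyncio
import base64
import logging

import bson
from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient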

class EventProducer:
    def __init__(self, password: str):
        self._password: str = password
        self.client = EventHubProducerClient.from_connection_string(self._password, logging_enable=True)
        self.lock = asyncio.Lock()

    async def produce(self, data: dict):
        logging.info("Start BSON encode")
        bson_encoded_data = bson.BSON.encode(data)
        logging.info("Done BSON encode")

        logging.info("Start B64 encode")
        b64_encoded_data = base64.b64encode(bson_encoded_data)
        logging.info("Done B64 encode")

        logging.info("Start async with")
        async with self.lock:
            logging.info("In async with")

            logging.info("Start send batch")
            await self.client.send_batch([EventData(b64_encoded_data)])
            logging.info("Done send batch")

        logging.info("Done async with")

We have also been having issues with EventHubProducerClient, and I am currently working on them with @yunhaoling here. We used to have a threading approach (see here) but we have moved to async/await as advised by @yunhaoling.

Here are screenshots of the logs:
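
Because a send that never returns also keeps holding the lock, every later invocation would queue up behind it until the host timeout fires. Below is a minimal sketch of bounding the send with `asyncio.wait_for`, so that a hang surfaces as a handled error well before the 2-minute limit. `SlowClient`, `send_with_deadline` and the timeout values are hypothetical stand-ins for illustration, not part of `azure-eventhub`, and whether cancelling a real `send_batch` call cleans up properly has not been verified here.

```python
import asyncio


class SlowClient:
    """Hypothetical stand-in for a producer client whose send may hang."""

    def __init__(self, delay: float):
        self.delay = delay

    async def send_batch(self, batch: list) -> None:
        await asyncio.sleep(self.delay)  # simulates a slow or stuck network call


async def send_with_deadline(client, lock: asyncio.Lock, batch: list, timeout: float) -> bool:
    """Send under the lock; return False instead of waiting past `timeout` seconds."""
    async with lock:
        try:
            await asyncio.wait_for(client.send_batch(batch), timeout=timeout)
            return True
        except asyncio.TimeoutError:
            # Leaving the `async with` block releases the lock,
            # so later invocations are not stuck behind this one.
            return False


async def demo() -> tuple:
    lock = asyncio.Lock()
    stuck = await send_with_deadline(SlowClient(delay=1.0), lock, [b"x"], timeout=0.05)
    ok = await send_with_deadline(SlowClient(delay=0.0), lock, [b"x"], timeout=0.5)
    return stuck, ok, lock.locked()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The function returns a boolean rather than raising so the caller can decide whether to answer the HTTP request with an error or retry; either choice would work with the same pattern.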

  • Overall view of the executed functions: here
  • All of the logs right before the timeout happens: here
  • Logs of the FIRST execution which times out: here

From my understanding there are only two explanations:

  • Something is broken in our code: it keeps running and somehow prevents the execution of the function (is that even possible?)
  • Something is wrong with the host system
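
On the first explanation: it would be possible if all async invocations share a single event loop, which is an assumption about the Python worker made here and not something confirmed in this thread. Under that assumption, one blocking call inside one invocation keeps every other coroutine from even starting, which would match the missing logs. A minimal sketch with plain `asyncio` (all names are illustrative):

```python
import asyncio
import time


async def blocking_invocation() -> None:
    # A synchronous call inside a coroutine never yields to the event loop.
    time.sleep(0.3)


async def quick_invocation(started_at: float) -> float:
    # Returns how long this coroutine waited before its first line ran.
    return time.monotonic() - started_at


async def demo() -> float:
    started_at = time.monotonic()
    blocker = asyncio.create_task(blocking_invocation())
    quick = asyncio.create_task(quick_invocation(started_at))
    await blocker
    return await quick


if __name__ == "__main__":
    delay = asyncio.run(demo())
    print(f"quick invocation was delayed by {delay:.2f}s")
```

The quick coroutine does no work of its own, yet it only starts once the blocking one has finished, so any log line at its top would be delayed by the same amount.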

I have been investigating this for weeks now, running dozens of experiments. I believe there were several underlying issues; those related to EventHubProducerClient should be fixed by now (using a lock workaround).

Could you please have a look at it and advise me on how to proceed? If necessary, I can give you temporary access to our code base.

Thanks!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
vrdmr commented, Mar 22, 2022

Thanks a lot @stael for the detailed issue and bug report. I’ll work with the team to go through this issue and improve our logging and documentation in the short term to get the knowledge out. I’ll also work with the EH extension team to see if this can be fixed in the future.

Thanks a lot again.

0 reactions
Stael commented, Mar 23, 2022

Hello @v-bbalaiagar,

Here are the conclusions of the investigation and tests we did with @yunhaoling:

  • Sending data to EventHub using EventHubProducer sometimes hangs indefinitely and triggers a function timeout (see logs).
  • After much investigation, @yunhaoling wasn’t able to reproduce it.
  • I ended up using the multi-output approach proposed by @yunhaoling (see implementation).
  • This solution seems to work.
  • However, I cannot do the same to send data to BlobStorage, as it isn’t possible to control the “path” dynamically in Python.

I believe that there should be a strong warning somewhere about using EventHubProducer in an AzureFunction.


Finally, I also found a bug: when you define an output, you must specify an eventHubName, which will be ignored in favor of the one provided in the connection. Example:

{
      "type": "eventHub",
      "name": "queue",
      "eventHubName": "__MUST_EXISTS_BUT_WILL_BE_OVERWRITTEN_BY_CONNECTION__",
      "connection": "<connection>",
      "direction": "out"
}

However, it seems that the eventHubName must not be shared between functions using distinct connections; otherwise, the data will be sent to the wrong EventHub. Indeed, I have several functions sending data to EventHub using the configuration defined above. For each function, I set the eventHubName to __MUST_EXISTS_BUT_WILL_BE_OVERWRITTEN_BY_CONNECTION__ for the sake of code clarity, and I ended up with data going to the wrong EventHub.

I also have several functions receiving data from EventHub, for which I used __MUST_EXISTS_BUT_WILL_BE_OVERWRITTEN_BY_CONNECTION__ as the eventHubName as well, but in that case I did not encounter any issue.

Could you please have a look at that, or dispatch it to the proper team? I believe that at least a clear warning should be added to the documentation.
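
In the meantime, a small check can flag the risky configuration described above: the same eventHubName used by output bindings that point at different connections. This is only a sketch, assuming the bindings have already been loaded from each function's function.json into a dict keyed by function name; `find_shared_hub_names`, the function names and the connection names in the sample are hypothetical.

```python
from collections import defaultdict


def find_shared_hub_names(bindings_by_function: dict) -> dict:
    """Map each eventHubName to its connections, keeping only names that are
    used by eventHub output bindings with more than one distinct connection."""
    connections = defaultdict(set)
    for bindings in bindings_by_function.values():
        for binding in bindings:
            is_output = binding.get("type") == "eventHub" and binding.get("direction") == "out"
            if is_output:
                connections[binding.get("eventHubName")].add(binding.get("connection"))
    return {name: sorted(conns) for name, conns in connections.items() if len(conns) > 1}


if __name__ == "__main__":
    placeholder = "__MUST_EXISTS_BUT_WILL_BE_OVERWRITTEN_BY_CONNECTION__"
    sample = {
        # Two outputs sharing the placeholder name but using different connections.
        "ingest": [{"type": "eventHub", "name": "queue", "eventHubName": placeholder,
                    "connection": "CONN_A", "direction": "out"}],
        "audit": [{"type": "eventHub", "name": "queue", "eventHubName": placeholder,
                   "connection": "CONN_B", "direction": "out"}],
        # Trigger (input) bindings are ignored, since no issue was seen on that side.
        "reader": [{"type": "eventHubTrigger", "name": "event", "eventHubName": placeholder,
                    "connection": "CONN_C", "direction": "in"}],
    }
    print(find_shared_hub_names(sample))
```

Giving each output binding its own eventHubName, even if the value is still overridden by the connection, makes this check return an empty dict.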
