azure-iot-device v2.1.1 module not connected ('Transport timeout on connection operation')

Hi

I was initially very pleased with the release of v2.1.1 of the library, and it seemed that all the module-to-hub connectivity issues were a thing of the past. Unfortunately, after my pipeline had been running for 2 days, I noticed that one of my modules was no longer receiving messages.

The module log shows the following:

2020-04-07T00:18:32.812603295Z INFO,No IOT message received.
2020-04-07T00:18:39.005434935Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005617945Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005778104Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005916890Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.006027620Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T00:18:39.006213514Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T00:19:02.723699356Z INFO,IOT message received.
2020-04-07T00:22:16.401710573Z INFO,No IOT message received.
2020-04-07T01:16:38.496463054Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.496880327Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497107579Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497272670Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497423074Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T01:16:38.497642965Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T02:14:39.106574400Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.106931896Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107147802Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107260415Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107381175Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T02:14:39.107549129Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T03:12:38.843798834Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844217693Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844443777Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844584734Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844707624Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T03:12:38.844885474Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T04:10:38.594952545Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595298556Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595469670Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595631349Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595754787Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T04:10:38.595917935Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T05:08:38.699366654Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699583323Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699708498Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699828326Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699926888Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T05:08:38.700069587Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T06:06:38.995493191Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.995705838Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed.  Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.995887761Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.996025639Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed.  Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.996153745Z ERROR,Exception caught in background thread.  Unable to handle.
2020-04-07T06:06:38.996327778Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]

The module never recovered from this state, and I had to restart it. After the restart the module was flooded with messages, from which I deduce that edgeHub was still trying to deliver messages to the module while it was in this state.

In builds prior to v2.1.1, I used to look for the edgeHub warning “2020-04-07 13:58:37.567 +00:00 [WRN] - Module some-device/some_module is not connected”, and if it persisted for 15 minutes, I would restart the module from a CRON job on the edge node. Unfortunately this can no longer be done, because in this state edgeHub outputs no messages that could be monitored for it.
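
The log-watching scheme described above (restart the module once the edgeHub "is not connected" warning has persisted for 15 minutes) can be sketched in Python. This is a hypothetical helper, not part of azure-iot-device or IoT Edge: it assumes the edgeHub line layout quoted above, assumes the warning repeats while the module is disconnected, and introduces a hypothetical `max_gap` parameter to decide when a run of warnings counts as broken. The log lines could come from `iotedge logs edgeHub`, and the CRON job would still perform the actual restart.

```python
from datetime import datetime, timedelta, timezone

# edgeHub stamp layout, as in "2020-04-07 13:58:37.567 +00:00 [WRN] - ..."
EDGEHUB_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def warning_times(log_lines, module_id):
    """Sorted timestamps of 'Module <module_id> is not connected' warnings."""
    needle = f"Module {module_id} is not connected"
    times = []
    for line in log_lines:
        if "[WRN]" in line and needle in line:
            stamp = line.split(" [WRN]", 1)[0]
            times.append(datetime.strptime(stamp, EDGEHUB_TIME_FORMAT))
    return sorted(times)


def should_restart(log_lines, module_id, now,
                   persist_for=timedelta(minutes=15),
                   max_gap=timedelta(minutes=1)):
    """True if the warning is still being logged at `now` and has repeated,
    with no break longer than max_gap, for at least persist_for."""
    times = warning_times(log_lines, module_id)
    if not times or now - times[-1] > max_gap:
        return False  # no warning, or warnings stopped: assume reconnected
    # Walk backwards to find where the current unbroken run started.
    streak_start = times[-1]
    for earlier in reversed(times[:-1]):
        if streak_start - earlier > max_gap:
            break
        streak_start = earlier
    return now - streak_start >= persist_for
```

`now` must be timezone-aware (for example `datetime.now(timezone.utc)`), because the parsed edgeHub stamps carry a UTC offset.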

I suppose I will now have to come up with a new scheme (for production sites)…
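
Since edgeHub no longer logs anything to watch for in this state, one possible scheme is to move the detection into the module itself: remember when the last message arrived and treat a long silence as a failure. The sketch below assumes a module with steady traffic (at tens of messages per second, a minute of silence is suspicious); `MessageWatchdog` is a hypothetical helper, not part of azure-iot-device, and the clock is injectable only to make it testable.

```python
import time


class MessageWatchdog:
    """Flags an input as stale when no message has arrived for timeout_s seconds.

    Hypothetical helper, not part of azure-iot-device. Call feed() from the
    message-receiving code and poll is_stale() from a periodic loop.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()  # start counting from construction

    def feed(self):
        """Record that a message was just received."""
        self._last_seen = self._clock()

    def seconds_since_last_message(self):
        return self._clock() - self._last_seen

    def is_stale(self):
        """True once the silence exceeds the configured timeout."""
        return self.seconds_since_last_message() > self._timeout_s
```

When `is_stale()` returns True, the module could log the condition and exit with a non-zero status so that edgeAgent restarts it, assuming the module's `restartPolicy` in the deployment manifest allows a restart after a failure. `time.monotonic` is used so that wall-clock adjustments on the edge node do not trigger false alarms.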

For more context, I send about 45 messages per second between some of the modules in the edge pipeline. The modules receiving the most messages are the most prone to this condition; modules that only send never have this issue.

@BertKleewein, thanks for the watchdog that monitors the disconnection. I can see that it works, but I think another disconnect issue is still at play as well.

Thanks for all the efforts.

AB#7366703

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
BertKleewein commented, May 26, 2020

@dschenzer, @tanieee28, @LouanDuToitS3, @dme-development I believe I have a fix for this issue. The attached wheel contains a test fix that resolves some ‘transport timeout’ issues. I am moving forward with getting a more robust fix into master. In the meantime, I’d appreciate hearing whether the attached wheel resolves your issues.

This fix was built from https://github.com/BertKleewein/azure-iot-sdk-python/tree/bertk-release-icm

azure_iot_device-2.1.2_perf_test-py2.py3-none-any.zip

0 reactions
az-iot-builder-01 commented, Jun 17, 2020

@LouanDuToitS3, @tanieee28, @BertKleewein, @dme-development, @dschenzer, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey.

Top Results From Across the Web

Troubleshoot your IoT Edge device - Azure - Microsoft Learn
If you experience issues running Azure IoT Edge in your environment, use this article as a guide for troubleshooting and diagnostics.

Troubleshooting Azure IoT Hub error codes | Microsoft Learn
This error occurs when another client creates a new connection to IoT Hub using the same identity, so IoT Hub closes the previous...

Get device twin inside a module in an offline scenario
While we encounter our desired behavior of the module client, being able to retrieve the module twin even when there is no internet...

Python SDK Unable to Connect to Azure IoT Hub
I created a python module with VSCode extension (Azure IoT Edge: New Iot Edge Solution -> Python module) and tried to deploy onto...

Connect downstream devices - Azure IoT Edge - Microsoft Learn
IoT Edge gateways support downstream module connections using symmetric key authentication but not X.509 certificate authentication. To connect ...
