azure-iot-device v2.1.1 module not connected ('Transport timeout on connection operation')
Hi
I was initially very pleased with the release of v2.1.1 of the library, and it seemed that all the module-to-hub connectivity issues were a thing of the past. Unfortunately, after my pipeline had been running for 2 days, I noticed that one of my modules was no longer receiving messages.
The module logs show the following:
2020-04-07T00:18:32.812603295Z INFO,No IOT message received.
2020-04-07T00:18:39.005434935Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005617945Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005778104Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.005916890Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T00:18:39.006027620Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T00:18:39.006213514Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T00:19:02.723699356Z INFO,IOT message received.
2020-04-07T00:22:16.401710573Z INFO,No IOT message received.
2020-04-07T01:16:38.496463054Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.496880327Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497107579Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497272670Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T01:16:38.497423074Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T01:16:38.497642965Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T02:14:39.106574400Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.106931896Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107147802Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107260415Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T02:14:39.107381175Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T02:14:39.107549129Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T03:12:38.843798834Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844217693Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844443777Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844584734Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T03:12:38.844707624Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T03:12:38.844885474Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T04:10:38.594952545Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595298556Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595469670Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595631349Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T04:10:38.595754787Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T04:10:38.595917935Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T05:08:38.699366654Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699583323Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699708498Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699828326Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T05:08:38.699926888Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T05:08:38.700069587Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
2020-04-07T06:06:38.995493191Z ERROR,ReauthorizeConnectionOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.995705838Z ERROR,ConnectionLockStage(ReauthorizeConnectionOperation): op failed. Unblocking queue with error: OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.995887761Z ERROR,UpdateSasTokenOperation: completing with error OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.996025639Z ERROR,UseAuthProviderStage(UpdateSasTokenOperation): token update operation failed. Error=OperationCancelled('Transport timeout on connection operation',)
2020-04-07T06:06:38.996153745Z ERROR,Exception caught in background thread. Unable to handle.
2020-04-07T06:06:38.996327778Z ERROR,["azure.iot.device.common.pipeline.pipeline_exceptions.OperationCancelled: OperationCancelled('Transport timeout on connection operation',)\n"]
The module never recovered from this state, and I had to restart it. After the restart the module was flooded with messages, so I deduce that edgeHub was still trying to deliver messages to the module while it was in this state.
In builds prior to v2.1.1 I used to look for the edgeHub debug message “2020-04-07 13:58:37.567 +00:00 [WRN] - Module some-device/some_module is not connected”, and if it persisted for 15 minutes I would restart the module from a cron job on the edge node. Unfortunately this can no longer be done, because in this state edgeHub did not output any such messages to monitor.
I suppose I now will have to come up with a new scheme (for production sites)…
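One possible shape for such a scheme, as a rough sketch only: instead of watching edgeHub, watch the module's own log for the 'Transport timeout on connection operation' error and check whether any message has been received since. The function names, the `now` parameter and the reuse of the 15-minute threshold are assumptions for illustration, and it relies on the module logging "IOT message received." for every delivered message, as in the output above. It has not been tried against a production module.

```python
import re
from datetime import datetime, timedelta

# Line shape taken from the module log above: "<timestamp>Z <LEVEL>,<message>".
# The fractional seconds are dropped; second resolution is enough here.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z (\w+),(.*)$"
)
TIMEOUT_TEXT = "Transport timeout on connection operation"
RECEIVED_TEXT = "IOT message received."  # exact match, so "No IOT message received." is ignored


def parse_line(line):
    """Return (timestamp, level, message), or None if the line has another shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return timestamp, match.group(2), match.group(3)


def needs_restart(lines, now, max_silence=timedelta(minutes=15)):
    """Hypothetical check: True if a transport timeout was logged and no
    message has been received for at least max_silence since then.

    `now` is a naive UTC datetime, matching the log timestamps.
    """
    first_timeout = None  # first timeout seen since the last received message
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, _level, message = parsed
        if message == RECEIVED_TEXT:
            first_timeout = None  # the module recovered, reset the clock
        elif TIMEOUT_TEXT in message and first_timeout is None:
            first_timeout = timestamp
    return first_timeout is not None and now - first_timeout >= max_silence
```

A cron job could feed this the recent module log lines and the current UTC time, and restart the module when it returns True. Measuring from the first timeout after the last received message, rather than the most recent one, matters because the error repeats roughly every hour and would otherwise keep resetting the clock.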
For some more info: I send about 45 messages every second between some of the modules on the edge pipeline. The modules receiving the most messages are the most prone to this condition. Modules that only send never have this issue.
@BertKleewein thanks for the watchdog that monitors the disconnection, I can see that it works, but I think there is still another disconnect/issue at play as well.
Thanks for all the efforts.
AB#7366703
Issue Analytics
- Created 3 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
@dschenzer, @tanieee28, @LouanDuToitS3, @dme-development I believe I have a fix for this issue. The attached wheel has a test fix which resolves some ‘transport timeout’ issues. I am moving forward with getting a more robust fix into master. In the meantime, I’d appreciate hearing if the attached wheel resolves your issues.
This fix was built from https://github.com/BertKleewein/azure-iot-sdk-python/tree/bertk-release-icm
azure_iot_device-2.1.2_perf_test-py2.py3-none-any.zip