Module Client Reconnect Doesn't Always Work / Hangs Forever when edgeHub Restarts

Context

  • OS and version used: Ubuntu 18.04
  • Python version: 3.7.9
  • pip version: 20.2.4
  • list of installed packages:

Package             Version
aiohttp             3.7.3
async-timeout       3.0.1
attrs               20.3.0
azure-core          1.9.0
azure-iot-device    2.4.0
azure-storage-blob  12.3.2
certifi             2020.11.8
cffi                1.14.3
chardet             3.0.4
cryptography        3.2.1
deprecation         2.1.0
idna                2.10
isodate             0.6.0
janus               0.4.0
msrest              0.6.19
multidict           5.0.2
oauthlib            3.1.0
packaging           20.4
paho-mqtt           1.5.1
pip                 20.2.4
pycparser           2.20
pyparsing           2.4.7
PySocks             1.7.1
python-dateutil     2.8.1
requests            2.25.0
requests-oauthlib   1.3.0
requests-unixsocket 0.2.0
scipy               1.5.2
setuptools          50.3.2
six                 1.15.0
typing-extensions   3.7.4.3
urllib3             1.25.11
wheel               0.35.1
yarl                1.6.3

Description of the issue

There appears to be a bug where paho does not always properly reconnect, nor does it appear to properly report connection status. This bug was initially exposed when upgrading the IoT Edge runtime containers, and results in IoTHubModuleClient instances staying in a disconnected state (while reporting as connected). I attempted to handle this issue myself by implementing a coroutine that simply monitors the state of the client’s connection and shuts down the module if the connection reports as disconnected.

One possible clue: the issue does not manifest at all in our staging environment and only shows up in production. Apart from slightly more relaxed firewall rules in staging (which allow more pings, ssh, etc.), there are no known differences between the two environments.

Code sample exhibiting the issue

To reproduce the issue, you need an IoT Edge device on which messages are sent consistently to the module below. Once the system is initialized and data is flowing, restart edgeHub and observe the reconnection behavior.

Please note that I have not yet had a chance to test the pared-down version of the code below.

import asyncio
import concurrent.futures
import logging
import signal
import sys
import traceback

from azure.iot.device.exceptions import (
    CredentialError,
    ConnectionDroppedError,
    ClientError,
    ConnectionFailedError,
)
from azure.iot.device.aio import IoTHubModuleClient
from azure.iot.device import Message

logging.basicConfig(level=logging.DEBUG)


async def shutdown(sig, loop):
    """Cleanup tasks tied to the service's shutdown."""
    print("Received exit signal %s..." % (sig))
    print("Clearing out messages in message buffer")
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    # Cancel every other task, then wait for the cancellations to finish
    for task in tasks:
        task.cancel()
    print("Cancelling outstanding tasks")
    await asyncio.gather(*tasks, return_exceptions=True)


async def setup_shutdown():
    """
    Adds the shutdown coroutine as a signal handler in the event loop
    for SIGTERM and SIGINT signals.
    """
    if sys.platform.startswith("linux"):
        loop = asyncio.get_running_loop()
        # May want to catch other signals too
        signals = (signal.SIGTERM, signal.SIGINT)
        for s in signals:
            loop.add_signal_handler(
                s, lambda s=s: asyncio.create_task(shutdown(s, loop))
            )


async def monitor_client_connection(client: IoTHubModuleClient):
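    """
    Polls the client's reported connection state once a minute and
    raises CancelledError as soon as it reports as disconnected.
    """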
    while True:
        if not client.connected:
            raise asyncio.CancelledError
        else:
            print("INFO: IoT Edge module client is connected.")
        await asyncio.sleep(60)


def init_module_client() -> IoTHubModuleClient:
    """Wrapper for IoTHubModuleClient.create_from_edge_environment()

    Used for testing purposes.

    Returns:
        The result of the azure-iot-device API call."""
    return IoTHubModuleClient.create_from_edge_environment()


async def connect_module_client(module_client: IoTHubModuleClient):
    """
    Connects a Module Client to the IoT Edge runtime's Edge Hub.
    This coroutine is interruptible, but will otherwise attempt
    to connect forever until successful.

    Returns:
        None.
    """
    connected = False
    while not connected:
        try:
            await module_client.connect()
            connected = True
        except CredentialError:
            traceback.print_exc()
            raise
        except (ConnectionFailedError, ConnectionDroppedError):
            traceback.print_exc()
            print("Attempting to connect again")
        except ClientError:
            traceback.print_exc()
            raise
        await asyncio.sleep(1)


async def main():
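    """
    Connects to edgeHub, registers the message handler, and then
    waits on the connection monitor until cancelled.
    """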
    connected = False
    try:
        # Add the shutdown signal handler to the event loop
        await setup_shutdown()
        module_client = init_module_client()
        await connect_module_client(module_client)
        connected = True

        async def message_handler(message: Message):
            print("Received a message from edgeHub: %s" % (message.data))
        # set the message handler on the client
        module_client.on_message_received = message_handler
        # Schedule listeners for different data streams
        listeners = asyncio.gather(
            monitor_client_connection(module_client),
        )
        print("The sample is now waiting for messages.")
        await listeners
    except (asyncio.CancelledError, concurrent.futures.CancelledError):
        if connected:
            await module_client.disconnect()
            print("Disconnected module client.")
        print("Caught cancellation in main coroutine gracefully exiting.")


if __name__ == "__main__":
    asyncio.run(main())
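
Since the problem described above is that the client can report as connected while no longer receiving anything, one possible workaround is to watch for silence on the message stream instead of trusting the connection flag. The following is only a minimal sketch under that assumption: `LivenessWatchdog`, `mark_activity`, `max_silence`, and `monitor_liveness` are hypothetical names, not part of azure-iot-device, and the approach only makes sense if messages normally arrive more often than the chosen threshold.

```python
import asyncio
import time


class LivenessWatchdog:
    """Hypothetical helper: remembers when activity was last observed."""

    def __init__(self, max_silence: float, clock=time.monotonic):
        # max_silence: seconds without any message before we call it stale
        self.max_silence = max_silence
        self._clock = clock
        self._last_activity = clock()

    def mark_activity(self) -> None:
        """Call this from the message handler whenever a message arrives."""
        self._last_activity = self._clock()

    def is_stale(self) -> bool:
        """True if nothing was seen for longer than max_silence seconds."""
        return self._clock() - self._last_activity > self.max_silence


async def monitor_liveness(watchdog: LivenessWatchdog, poll_interval: float = 1.0):
    """Return once the watchdog goes stale so the caller can shut down."""
    while not watchdog.is_stale():
        await asyncio.sleep(poll_interval)
```

In the sample above, `message_handler` would call `watchdog.mark_activity()`, and `monitor_liveness(watchdog)` could be gathered alongside `monitor_client_connection(module_client)`; what to do when it returns (for example, disconnecting and exiting) is left to the module.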

Console log of the issue

I have recorded DEBUG level logs from both our staging environment (where the issue doesn’t manifest) and a production environment (where we experience the issue). Please note that the logs are the same, line for line, until line 109 of each of the log outputs below.
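
To locate the point where two such logs stop matching, a small comparison script can help. This is a rough sketch, not the method used to produce the figure quoted above: it assumes each log is available as a list of lines and that device IDs look like `MTC` followed by digits (as they do in the logs below), so that differing device names do not count as a mismatch. `normalize` and `first_divergence` are illustrative names.

```python
import re
from itertools import zip_longest

# Assumption: device IDs look like "MTC" followed by digits, as in these logs.
DEVICE_ID = re.compile(r"MTC\d+")


def normalize(line: str) -> str:
    """Strip trailing whitespace and mask the device ID."""
    return DEVICE_ID.sub("<device>", line.rstrip())


def first_divergence(a_lines, b_lines):
    """Return (1-based line number, line_a, line_b) of the first mismatch.

    Returns None if both logs are identical after normalization. If one log
    is shorter, the missing side is reported as None.
    """
    pairs = zip_longest(a_lines, b_lines)
    for number, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None or normalize(a) != normalize(b):
            return number, a, b
    return None
```

Reading each file with `open(path).read().splitlines()` and passing the two lists to `first_divergence` gives the first line at which the two runs differ.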

Staging (non-issue)

INFO:azure.iot.device.common.mqtt_transport:message received on devices/MTC01/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC01&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_message_received in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: message received on topic devices/MTC01/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC01&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_pipeline_event in callback thread
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_message_received): Invoking handler
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER (_on_message_received): Successfully completed invocation
DEBUG:paho:Received PUBLISH (d0, q1, r0, m1), 'devices/MTC01/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC01&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson', ...  (2982 bytes)
DEBUG:paho:Sending PUBACK (Mid: 1)
INFO:azure.iot.device.common.mqtt_transport:message received on devices/MTC01/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC01&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_message_received in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: message received on topic devices/MTC01/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC01&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_pipeline_event in callback thread
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_message_received): Invoking handler
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER (_on_message_received): Successfully completed invocation
INFO:azure.iot.device.common.mqtt_transport:disconnected with result code: 1
DEBUG:azure.iot.device.common.mqtt_transport:  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1574, in loop_read
    return self._loop_rc_handle(rc)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2227, in _loop_rc_handle
    self._do_on_disconnect(rc, properties)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3360, in _do_on_disconnect
    self.on_disconnect(self, self._userdata, rc)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 210, in on_disconnect
    logger.debug("".join(traceback.format_stack()))

INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:in paho thread.  nulling _thread
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_disconnected in pipeline thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: _on_mqtt_disconnect called: ConnectionDroppedError('Paho returned rc==1')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(DisconnectedEvent): State is LOGICALLY_CONNECTED Connected is True.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=True Starting reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:PipelineRootStage: DisconnectedEvent received. Calling on_disconnected_handler
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_disconnected in callback thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: disconnection was unexpected
INFO:azure.iot.device.common.handle_exceptions:Unexpected disconnection.  Safe to ignore since other stages will reconnect.
INFO:azure.iot.device.iothub.abstract_clients:Connection State - Disconnected
INFO:azure.iot.device.iothub.abstract_clients:Cleared all pending method requests due to disconnect
INFO:azure.iot.device.common.handle_exceptions:azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Paho returned rc==1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/handle_exceptions.py", line 43, in swallow_unraised_exception
    raise e
azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError(None) caused by ConnectionDroppedError('Paho returned rc==1')

DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_reconnect_timer_expired in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: Reconnect timer expired. State is WAITING_TO_RECONNECT Connected is False.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: sending new connect op down
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): blocking
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker
INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP)
INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:transport.connect raised error
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 387, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 941, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1075, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3546, in _create_socket_connection
    return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 183, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 409, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ConnectOperation: completing with error ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): op failed.  Unblocking queue with error: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(ConnectOperation): on_connect_complete error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused') state=LOGICALLY_CONNECTED never_connected=False connected=False
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=False Starting reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_reconnect_timer_expired in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: Reconnect timer expired. State is WAITING_TO_RECONNECT Connected is False.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: sending new connect op down
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): blocking
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker
INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP)
INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:transport.connect raised error
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 387, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 941, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1075, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3546, in _create_socket_connection
    return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 183, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 409, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ConnectOperation: completing with error ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): op failed.  Unblocking queue with error: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(ConnectOperation): on_connect_complete error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused') state=LOGICALLY_CONNECTED never_connected=False connected=False
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=False Starting reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_reconnect_timer_expired in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: Reconnect timer expired. State is WAITING_TO_RECONNECT Connected is False.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: sending new connect op down
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): blocking
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker
INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP)
INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:transport.connect raised error
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 387, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 941, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1075, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3546, in _create_socket_connection
    return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 183, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 409, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ConnectOperation: completing with error ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): op failed.  Unblocking queue with error: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(ConnectOperation): on_connect_complete error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused') state=LOGICALLY_CONNECTED never_connected=False connected=False
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=False Starting reconnect timer
INFO:azure.iot.device.iothub.aio.async_clients:Disconnecting from Hub...
DEBUG:azure.iot.device.iothub.aio.async_clients:Executing initial disconnect
DEBUG:azure.iot.device.iothub.pipeline.mqtt_pipeline:Starting DisconnectOperation on the pipeline
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting run_op in pipeline thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(DisconnectOperation): State changes WAITING_TO_RECONNECT->LOGICALLY_DISCONNECTED.  Canceling waiting ops and sending disconnect down.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: clearing reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: completing waiting ops with error=OperationCancelled('Explicit disconnect invoked')
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:DisconnectOperation: completing without error
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_complete in callback thread
DEBUG:azure.iot.device.common.async_adapter:Callback completed with result None
DEBUG:azure.iot.device.iothub.aio.async_clients:Successfully executed initial disconnect
DEBUG:azure.iot.device.iothub.aio.async_clients:Stopping handlers...
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Adding HandlerRunnerKillerSentinel to inbox corresponding to _on_message_received handler runner
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Waiting for _on_message_received handler runner to exit...
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_message_received): HandlerRunnerKillerSentinel found in inbox. Exiting.
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_message_received): Task successfully completed without exception
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Handler runner for _on_message_received has been stopped
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Adding HandlerRunnerKillerSentinel to inbox corresponding to _on_method_request_received handler runner
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Waiting for _on_method_request_received handler runner to exit...
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_method_request_received): HandlerRunnerKillerSentinel found in inbox. Exiting.
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_method_request_received): Task successfully completed without exception
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:Handler runner for _on_method_request_received has been stopped
DEBUG:azure.iot.device.iothub.aio.async_clients:Successfully stopped handlers
DEBUG:azure.iot.device.iothub.aio.async_clients:Executing secondary disconnect...
DEBUG:azure.iot.device.iothub.pipeline.mqtt_pipeline:Starting DisconnectOperation on the pipeline
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting run_op in pipeline thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(DisconnectOperation): State changes LOGICALLY_DISCONNECTED->LOGICALLY_DISCONNECTED.  Sending op down.    
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(DisconnectOperation): Transport is already disconnected.  Completing.
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:DisconnectOperation: completing without error
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_complete in callback thread
DEBUG:azure.iot.device.common.async_adapter:Callback completed with result None
DEBUG:azure.iot.device.iothub.aio.async_clients:Successfully executed secondary disconnect
INFO:azure.iot.device.iothub.aio.async_clients:Successfully disconnected from Hub
Disconnected module client.
Caught cancellation in main coroutine gracefully exiting.
Cancelled. Starting graceful shutdown procedure.

Production (issue)

INFO:azure.iot.device.common.mqtt_transport:message received on devices/MTC0889/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC0889&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_message_received in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: message received on topic devices/MTC0889/modules/AnomalyChecker/inputs/telemetry/%24.cdid=MTC0889&%24.cmid=IoTEdgeASA&%24.ce=utf-8&%24.ct=application%2Fjson
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_pipeline_event in callback thread
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER RUNNER (_on_message_received): Invoking handler
DEBUG:azure.iot.device.iothub.aio.async_handler_manager:HANDLER (_on_message_received): Successfully completed invocation
INFO:azure.iot.device.common.mqtt_transport:disconnected with result code: 1
DEBUG:azure.iot.device.common.mqtt_transport:  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3452, in _thread_main
    self.loop_forever(retry_first_connection=True)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1779, in loop_forever
    rc = self.loop(timeout, max_packets)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1181, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1574, in loop_read
    return self._loop_rc_handle(rc)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2227, in _loop_rc_handle
    self._do_on_disconnect(rc, properties)
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3360, in _do_on_disconnect
    self.on_disconnect(self, self._userdata, rc)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 210, in on_disconnect
    logger.debug("".join(traceback.format_stack()))

INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:in paho thread.  nulling _thread
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_disconnected in pipeline thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: _on_mqtt_disconnect called: ConnectionDroppedError('Paho returned rc==1')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(DisconnectedEvent): State is LOGICALLY_CONNECTED Connected is True.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=True Starting reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:PipelineRootStage: DisconnectedEvent received. Calling on_disconnected_handler
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_disconnected in callback thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: disconnection was unexpected
INFO:azure.iot.device.common.handle_exceptions:Unexpected disconnection.  Safe to ignore since other stages will reconnect.
INFO:azure.iot.device.common.handle_exceptions:azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Paho returned rc==1')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/handle_exceptions.py", line 43, in swallow_unraised_exception
    raise e
azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError(None) caused by ConnectionDroppedError('Paho returned rc==1')

INFO:azure.iot.device.iothub.abstract_clients:Connection State - Disconnected
INFO:azure.iot.device.iothub.abstract_clients:Cleared all pending method requests due to disconnect
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_reconnect_timer_expired in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: Reconnect timer expired. State is WAITING_TO_RECONNECT Connected is False.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: sending new connect op down
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): blocking
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker
INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP)
INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:transport.connect raised error
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 387, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 941, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1075, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3546, in _create_socket_connection
    return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 183, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 409, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ConnectOperation: completing with error ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): op failed.  Unblocking queue with error: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(ConnectOperation): on_connect_complete error=ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused') state=LOGICALLY_CONNECTED never_connected=False connected=False
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: State is WAITING_TO_RECONNECT. Connected=False Starting reconnect timer
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting on_reconnect_timer_expired in pipeline thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: Reconnect timer expired. State is WAITING_TO_RECONNECT Connected is False.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: sending new connect op down
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): blocking
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker
INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP)
DEBUG:paho:Sending CONNECT (u1, p1, wr0, wq0, wf0, c0, k60) client_id=b'MTC0889/AnomalyChecker'
DEBUG:azure.iot.device.common.mqtt_transport:_mqtt_client.connect returned rc=0
DEBUG:paho:Received CONNACK (1, 0)
INFO:azure.iot.device.common.mqtt_transport:connected with result code: 0
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_connected in pipeline thread
INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:_on_mqtt_connected called
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:PipelineRootStage: ConnectedEvent received. Calling on_connected_handler
DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_connected in callback thread
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:completing connect op
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ConnectOperation: completing without error
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): op succeeded.  Unblocking queue
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=None
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage(ConnectOperation): on_connect_complete error=None state=LOGICALLY_CONNECTED never_connected=False connected=True
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ReconnectStage: completing waiting ops with error=None
INFO:azure.iot.device.iothub.abstract_clients:Connection State - Connected
INFO: IoT Edge module client is connected.
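
For triage, it can help to reduce a log like the one above to just the connection-state transitions. The following is a minimal sketch, not part of the SDK: `summarize` and both regular expressions are hypothetical helpers written against the line formats that appear in this log, and they may need adjusting for other SDK versions.

```python
import re

# Client-level state lines, e.g.
# "INFO:azure.iot.device.iothub.abstract_clients:Connection State - Disconnected"
STATE_RE = re.compile(r"abstract_clients:Connection State - (\w+)")

# ReconnectStage lines that report the internal state plus the connected flag.
# The log uses both "Connected is True." and "Connected=True", so accept either.
RECONNECT_RE = re.compile(
    r"ReconnectStage.*State is (\w+)\.? Connected(?: is |=)(True|False)"
)


def summarize(log_text):
    """Return the connection-state events found in log_text, in order."""
    events = []
    for line in log_text.splitlines():
        match = STATE_RE.search(line)
        if match:
            events.append(("client", match.group(1)))
            continue
        match = RECONNECT_RE.search(line)
        if match:
            events.append(
                ("reconnect_stage", match.group(1), match.group(2) == "True")
            )
    return events
```

Comparing the `client` events with the `reconnect_stage` events makes it easier to spot moments where the pipeline's internal state and the reported connection state disagree.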

AB#8847831

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 19 (8 by maintainers)

Top GitHub Comments

1 reaction
elhorton commented, Jan 15, 2021

We are closing this issue because it is outside the scope of the SDK and is being tracked appropriately in the iotedge repo. Thank you for filing and helping us improve Azure IoT as a whole!

1 reaction
BertKleewein commented, Dec 4, 2020

@jackt-moran - this is definitely a bug in edge caused by edgeHub losing the subscription when it restarts. Opening a bug in the iotedge repo is the next step at this point. Once you open it, I’ll add what I know.
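
Until the edgeHub fix lands, the connection-monitoring workaround mentioned in the issue description could look roughly like the sketch below. This is an assumption-laden illustration, not the original poster's code: `watch_connection`, `poll_seconds`, and `grace_seconds` are hypothetical names, and the only thing it relies on from the client is a boolean `connected` attribute. Note that it cannot detect the lost-subscription case described above, where the client still reports itself as connected.

```python
import asyncio


async def watch_connection(client, poll_seconds=5.0, grace_seconds=60.0):
    """Return once client.connected has stayed False for grace_seconds.

    Returns the observed downtime in seconds so the caller can log it
    before shutting the module down (for example, so that the runtime
    restarts the container).
    """
    loop = asyncio.get_running_loop()
    down_since = None  # monotonic time at which the outage was first seen
    while True:
        if client.connected:
            # Any successful reconnect resets the outage timer.
            down_since = None
        else:
            now = loop.time()
            if down_since is None:
                down_since = now
            elif now - down_since >= grace_seconds:
                return now - down_since
        await asyncio.sleep(poll_seconds)
```

The grace period is there so that a normal reconnect cycle, like the refused connection followed by a successful retry in the log above, does not trigger a shutdown.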

