question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Module fails to restart because of transient network error

See original GitHub issue

Expected Behavior

When an IoT Edge module disconnects it should be able to reconnect.

Current Behavior

I have an IoT Edge solution that runs on a Raspberry Pi and consists of 4 IoT Edge modules. Upon deployment of the solution to the Edge Device (Pi), everything was working fine. After a couple of hours, one of the modules was being restarted by the IoT Edge runtime, and failed to start again. All the other modules just keep on working.

When I look in the logs of the failing module, I can see that an exception is being thrown with the following stacktrace:

Unhandled Exception: Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException: Transient network error occurred, please retry. ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException: No such device or address
   at System.Net.Dns.InternalGetHostByName(String hostName)
   at System.Net.Dns.ResolveCallback(Object context)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
   at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
   at System.Net.Dns.<>c.<GetHostAddressesAsync>b__25_1(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.Mqtt.MqttTransportHandler.OpenAsyncInternal(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.Mqtt.MqttTransportHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ProtocolRoutingDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.<>c__DisplayClass22_0.<<ExecuteWithErrorHandlingAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.InternalClient.OpenAsync()
   at DiscoveryModule.Program.Init() in /app/Program.cs:line 210
   at DiscoveryModule.Program.Main(String[] args) in /app/Program.cs:line 25
   at DiscoveryModule.Program.<Main>(String[] args)

This exception occurs when the OpenAsync method on the ModuleClient is being called to open the connection to IoT Hub.

The module is using MQTT as a communication protocol; this is how the connection is being established:

var mqttSetting = new MqttTransportSettings(TransportType.Mqtt_Tcp_Only);
ITransportSettings[] settings = { mqttSetting };

// Open a connection to the Edge runtime
ModuleClient ioTHubModuleClient = await ModuleClient.CreateFromEnvironmentAsync(settings);
await ioTHubModuleClient.OpenAsync();

var currentTwin = await ioTHubModuleClient.GetTwinAsync();

My other modules that are running on the same device (and that keep working), are also using MQTT.

I’m using Microsoft.Azure.Devices.Client 1.20.3 in all my Edge Modules (both on the one that’s failing and in the other ones that are still working).

I also tried to restart the module manually via iotedge restart <modulename>, but this didn’t help. I assume that the problem will be solved (temporarely?) when I reboot the device, but this is not something that can be done when the device is ‘in the field’.

The module that is failing has the following ‘createOptions’:

"createOptions": {
                "HostConfig": {
                  "NetworkMode": "host"
                },
                "NetworkingConfig": {
                  "EndpointsConfig": {
                    "host": {}
                  }
                }
              }

(However, this module is also running on another device and this issue doesn’t exists on the other device).

Context (Environment)

Output of iotedge check

Click here
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: certificates - Warning
    Device is using self-signed, automatically generated certs.
    Please see https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: certificates expiry - OK
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK
√ Edge Hub can bind to ports on host - OK

20 check(s) succeeded.
3 check(s) raised warnings. Re-run with --verbose for more details.

Device Information

  • Host OS : Raspbian gnu/linux 9 (stretch)
  • Architecture: ARM32
  • Container OS: Linux containers

Runtime Versions

  • iotedged [run iotedge version]: iotedge 1.0.8 (208b2204fd30e856d00b280112422130c104b9f0)
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.0.8.3
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.0.8.3
  • Docker/Moby [run docker version]: 3.0.7

Logs

edge-agent logs
<6> 2019-11-12 16:12:42.456 +00:00 [INF] - Executing command: "Command Group: (\n  [Stop module DiscoveryModule]\n  [Start module DiscoveryModule]\n  [Saving DiscoveryModule to store]\n)"
<6> 2019-11-12 16:12:42.456 +00:00 [INF] - Executing command: "Stop module DiscoveryModule"
<6> 2019-11-12 16:12:42.466 +00:00 [INF] - Executing command: "Start module DiscoveryModule"
<6> 2019-11-12 16:12:43.026 +00:00 [INF] - Executing command: "Saving DiscoveryModule to store"
<6> 2019-11-12 16:12:43.027 +00:00 [INF] - Plan execution ended for deployment 9
<6> 2019-11-12 16:12:48.412 +00:00 [INF] - Updated reported properties
<6> 2019-11-12 16:16:58.822 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:55s left).
<6> 2019-11-12 16:16:58.995 +00:00 [INF] - Updated reported properties
<6> 2019-11-12 16:17:04.077 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:49s left).
<6> 2019-11-12 16:17:09.186 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:44s left).
<6> 2019-11-12 16:17:14.296 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:39s left).
<6> 2019-11-12 16:17:19.408 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:34s left).
<6> 2019-11-12 16:17:24.515 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:29s left).
<6> 2019-11-12 16:17:29.633 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:24s left).
<6> 2019-11-12 16:17:34.741 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:19s left).
<6> 2019-11-12 16:17:39.848 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:13s left).
<6> 2019-11-12 16:17:44.959 +00:00 [INF] - Module 'DiscoveryModule' scheduled to restart after 05m:00s (04m:08s left).
edge-hub logs
<6> 2019-11-12 15:23:53.408 +00:00 [INF] - Credentials for client DG-0000000-191025-001/ActionExecutor are not valid.
<4> 2019-11-12 15:23:53.409 +00:00 [WRN] - Reauthenticating client DG-0000000-191025-001/ActionExecutor failed, removing client connection
<6> 2019-11-12 15:23:53.409 +00:00 [INF] - Unable to re-authenticate DG-0000000-191025-001/ActionExecutor, dropping client connection.
<4> 2019-11-12 15:23:53.409 +00:00 [WRN] - Closing connection for device: DG-0000000-191025-001/ActionExecutor, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device DG-0000000-191025-001/ActionExecutor.,
<6> 2019-11-12 15:23:53.409 +00:00 [INF] - Disposing MessagingServiceClient for device Id DG-0000000-191025-001/ActionExecutor because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device DG-0000000-191025-001/ActionExecutor.
<6> 2019-11-12 15:23:53.410 +00:00 [INF] - Closing device proxy for device Id DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.410 +00:00 [INF] - Device connection removed for device DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.559 +00:00 [INF] - Client DG-0000000-191025-001/ActionExecutor in device scope authenticated locally.
<6> 2019-11-12 15:23:53.560 +00:00 [INF] - Successfully generated identity for clientId DG-0000000-191025-001/ActionExecutor and username dg-0000000-191025-001/DG-0000000-191025-001/ActionExecutor/?api-version=2018-06-30&DeviceClientType=.NET%2F1.20.3%20%28.NET%20Core%204.6.28008.02%3B%20Linux%204.19.66-v7%2B%20%231253%20SMP%20Thu%20Aug%2015%2011%3A49%3A46%20BST%202019%3B%20Arm%29
<6> 2019-11-12 15:23:53.560 +00:00 [INF] - ClientAuthenticated, DG-0000000-191025-001/ActionExecutor, 3bee4180
<6> 2019-11-12 15:23:53.563 +00:00 [INF] - New device connection for device DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.568 +00:00 [INF] - Bind device proxy for device DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.568 +00:00 [INF] - Binding message channel for device Id DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.569 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, Methods for client DG-0000000-191025-001/ActionExecutor.
<6> 2019-11-12 15:23:53.586 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.586 +00:00 [INF] - Updated reported properties for DG-0000000-191025-001/$edgeHub
<6> 2019-11-12 15:23:53.589 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, Methods for client DG-0000000-191025-001/ActionExecutor.
<6> 2019-11-12 15:23:53.591 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:53.593 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, Methods for client DG-0000000-191025-001/ActionExecutor.
<6> 2019-11-12 15:23:53.596 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/ActionExecutor
<6> 2019-11-12 15:23:58.725 +00:00 [INF] - Updated reported properties for DG-0000000-191025-001/$edgeHub
<6> 2019-11-12 15:28:53.416 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:28:53.422 +00:00 [INF] - Error authenticating token for DG-0000000-191025-001/TelemetryRetriever because the token is expired or could not be parsed
<6> 2019-11-12 15:28:53.423 +00:00 [INF] - Credentials for client DG-0000000-191025-001/TelemetryRetriever are not valid.
<4> 2019-11-12 15:28:53.423 +00:00 [WRN] - Reauthenticating client DG-0000000-191025-001/TelemetryRetriever failed, removing client connection
<6> 2019-11-12 15:28:53.424 +00:00 [INF] - Unable to re-authenticate DG-0000000-191025-001/TelemetryRetriever, dropping client connection.
<4> 2019-11-12 15:28:53.424 +00:00 [WRN] - Closing connection for device: DG-0000000-191025-001/TelemetryRetriever, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device DG-0000000-191025-001/TelemetryRetriever.,
<6> 2019-11-12 15:28:53.424 +00:00 [INF] - Disposing MessagingServiceClient for device Id DG-0000000-191025-001/TelemetryRetriever because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device DG-0000000-191025-001/TelemetryRetriever.
<6> 2019-11-12 15:28:53.431 +00:00 [INF] - Closing device proxy for device Id DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.432 +00:00 [INF] - Device connection removed for device DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.587 +00:00 [INF] - Client DG-0000000-191025-001/TelemetryRetriever in device scope authenticated locally.
<6> 2019-11-12 15:28:53.588 +00:00 [INF] - Successfully generated identity for clientId DG-0000000-191025-001/TelemetryRetriever and username dg-0000000-191025-001/DG-0000000-191025-001/TelemetryRetriever/?api-version=2018-06-30&DeviceClientType=.NET%2F1.20.3%20%28.NET%20Core%204.6.28008.02%3B%20Linux%204.19.66-v7%2B%20%231253%20SMP%20Thu%20Aug%2015%2011%3A49%3A46%20BST%202019%3B%20Arm%29
<6> 2019-11-12 15:28:53.588 +00:00 [INF] - ClientAuthenticated, DG-0000000-191025-001/TelemetryRetriever, 190cc900
<6> 2019-11-12 15:28:53.591 +00:00 [INF] - New device connection for device DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.595 +00:00 [INF] - Bind device proxy for device DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.595 +00:00 [INF] - Binding message channel for device Id DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.597 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, DesiredPropertyUpdates for client DG-0000000-191025-001/TelemetryRetriever.
<6> 2019-11-12 15:28:53.599 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.602 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, DesiredPropertyUpdates for client DG-0000000-191025-001/TelemetryRetriever.
<6> 2019-11-12 15:28:53.604 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:53.606 +00:00 [INF] - Updated reported properties for DG-0000000-191025-001/$edgeHub
<6> 2019-11-12 15:28:53.606 +00:00 [INF] - Processing subscriptions TwinResponse, ModuleMessages, DesiredPropertyUpdates for client DG-0000000-191025-001/TelemetryRetriever.
<6> 2019-11-12 15:28:53.608 +00:00 [INF] - Set subscriptions from session state for DG-0000000-191025-001/TelemetryRetriever
<6> 2019-11-12 15:28:58.808 +00:00 [INF] - Updated reported properties for DG-0000000-191025-001/$edgeHub
<6> 2019-11-12 15:33:53.410 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:38:53.407 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:43:53.408 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:44:11.827 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2019-11-12 15:44:11.894 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2019-11-12 15:44:11.895 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2019-11-12 15:44:12.038 +00:00 [INF] - Updated reported properties for DG-0000000-191025-001/$edgeHub
<6> 2019-11-12 15:48:53.407 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:49:54.537 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2019-11-12 15:49:54.539 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2019-11-12 15:50:24.540 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint DG-0000000-191025-001/ActionExecutor/input1
<6> 2019-11-12 15:50:24.542 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint DG-0000000-191025-001/ActionExecutor/input1 and 0 messages from message store.
<6> 2019-11-12 15:50:54.539 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint DG-0000000-191025-001/TelemetryRetriever/input1
<6> 2019-11-12 15:50:54.541 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint DG-0000000-191025-001/TelemetryRetriever/input1 and 0 messages from message store.
<6> 2019-11-12 15:52:43.928 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2019-11-12 15:53:53.407 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 15:58:53.406 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 16:03:53.407 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 16:08:53.406 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 16:13:53.407 +00:00 [INF] - Reauthenticating connected clients
<6> 2019-11-12 16:18:53.407 +00:00 [INF] - Reauthenticating connected clients

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:6 (4 by maintainers)

github_iconTop GitHub Comments

2reactions
ancaantochicommented, Nov 21, 2019

I think the change to the /etc/hosts file fixed your issue because the connection failed from the module to edgeHub. The change to daemon.json to add the DNS is to fix issues from edgeHub to the cloud. Because your module is running on the host network it needs to be able to resolve the GatewayHostname, you can check which value it has in the Env for IOTEDGE_GATEWAYHOSTNAME using:

docker inspect <moduleName>

To get daemon logs:

sudo journalctl -u iotedge -f

We will evaluate if iotedge check should validate /etc/hosts and /etc/hostname

0reactions
ancaantochicommented, Nov 23, 2019

I will close this issue, please reopen if needed.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Troubleshoot Azure IoT Edge common errors
The edgeAgent isn't able to connect to the IoT Hub and reports empty edge agent config and transient network error occurred.
Read more >
Azure IoT Edge ClientModule can't connect in host mode
After some more digging around, I found the solution myself. Module fails to restart because of transient network error.
Read more >
Common Error Messages on Catalyst 6500/6000 Series ...
This error message appears because of an erroneous response from the switch to Simple Network Management Protocol (SNMP) polling of the port adapters...
Read more >
1747-SDN DeviceNet Scanner Module
Module is detecting communication errors. Check DeviceNet connections and physical media integrity. Check system for failed slave devices or other possible ...
Read more >
Troubleshoot Dataflow errors
Custom container pipeline fails or is slow to start up workers. A known issue might cause Dataflow to mark workers as unresponsive and...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found