The Edge Hub reports unacknowledged messages without any error raised in the module

I was trying to measure the maximum throughput I could expect using messages with routing.

I have two simple edge modules:

  • Producer - Listens for messages using ModuleClient.SetInputMessageHandlerAsync(...) and responds with a message containing the current time in UTC in a JSON object.
  • Consumer - Sends a message to the producer module with the CorrelationId set to the current time in UTC, listens for a response message using ModuleClient.SetInputMessageHandlerAsync(...), and measures the latency between the two modules.
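
The latency measurement described for the consumer can be illustrated with a minimal Python sketch. It is not the code from the repo (the modules use the .NET ModuleClient); the helper names `make_correlation_id` and `latency_from_correlation_id` are hypothetical, and the ISO 8601 encoding of the timestamp is an assumption about how the time might be carried in the CorrelationId.

```python
from datetime import datetime, timedelta, timezone


def make_correlation_id(now: datetime) -> str:
    """Encode a timestamp as a UTC ISO 8601 string (assumed CorrelationId format)."""
    return now.astimezone(timezone.utc).isoformat()


def latency_from_correlation_id(correlation_id: str, received_at: datetime) -> timedelta:
    """Recover the send time from the correlation ID and subtract it from the receive time."""
    sent_at = datetime.fromisoformat(correlation_id)
    return received_at - sent_at


# Simulated round trip: the response arrives 25 ms after the request was stamped.
sent = datetime(2019, 6, 16, 21, 6, 13, 114000, tzinfo=timezone.utc)
received = sent + timedelta(milliseconds=25)

correlation_id = make_correlation_id(sent)
latency = latency_from_correlation_id(correlation_id, received)
print(f"latency: {latency / timedelta(milliseconds=1)} ms")
```

Because both timestamps are taken on the same host in this setup, clock skew between the two readings should not distort the result.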

All code is available in my GitHub repo here: mill5james/IoTEdgeMethodVsMessage

And the images are published in Docker Hub at mill5james

When sending only messages between modules, I repeatedly see unacknowledged messages in the Edge Hub without any error in the producer or consumer. These are followed by exceptions from the MQTT stack. I have not found any pattern in how often these exceptions occur.

2019-06-16 21:06:13.114 +00:00 [WRN] - Error sending messages to module jamesp-iotedge2/producer
System.TimeoutException: Message completion response not received
   at Microsoft.Azure.Devices.Edge.Hub.Core.Device.DeviceMessageHandler.SendMessageAsync(IMessage message, String input) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/device/DeviceMessageHandler.cs:line 363
   at Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint.ModuleMessageProcessor.ProcessAsync(ICollection`1 routingMessages, IDeviceProxy dp, CancellationToken token) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/routing/ModuleEndpoint.cs:line 164
2019-06-16 21:06:14.139 +00:00 [WRN] - Closing connection for device: jamesp-iotedge2/consumer, scope: ExceptionCaught, DotNetty.Codecs.DecoderException: [MQTT-2.3.1-1]
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePacketIdVariableHeader(IByteBuffer buffer, PacketWithId packet, Int32& remainingLength)
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePublishPacket(IByteBuffer buffer, PublishPacket packet, Int32& remainingLength)
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePacketInternal(IByteBuffer buffer, Int32 packetSignature, Int32& remainingLength, IChannelHandlerContext context)
   at DotNetty.Codecs.Mqtt.MqttDecoder.TryDecodePacket(IByteBuffer buffer, IChannelHandlerContext context, Packet& packet)
   at DotNetty.Codecs.Mqtt.MqttDecoder.Decode(IChannelHandlerContext context, IByteBuffer input, List`1 output)
   at DotNetty.Codecs.ReplayingDecoder`1.CallDecode(IChannelHandlerContext context, IByteBuffer input, List`1 output)
   at DotNetty.Codecs.ByteToMessageDecoder.ChannelRead(IChannelHandlerContext context, Object message)
   at DotNetty.Transport.Channels.AbstractChannelHandlerContext.InvokeChannelRead(Object msg), 6cfe9aea
2019-06-16 21:06:14.140 +00:00 [INF] - Disposing MessagingServiceClient for device Id jamesp-iotedge2/consumer because of exception - DotNetty.Codecs.DecoderException: [MQTT-2.3.1-1]
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePacketIdVariableHeader(IByteBuffer buffer, PacketWithId packet, Int32& remainingLength)
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePublishPacket(IByteBuffer buffer, PublishPacket packet, Int32& remainingLength)
   at DotNetty.Codecs.Mqtt.MqttDecoder.DecodePacketInternal(IByteBuffer buffer, Int32 packetSignature, Int32& remainingLength, IChannelHandlerContext context)
   at DotNetty.Codecs.Mqtt.MqttDecoder.TryDecodePacket(IByteBuffer buffer, IChannelHandlerContext context, Packet& packet)
   at DotNetty.Codecs.Mqtt.MqttDecoder.Decode(IChannelHandlerContext context, IByteBuffer input, List`1 output)
   at DotNetty.Codecs.ReplayingDecoder`1.CallDecode(IChannelHandlerContext context, IByteBuffer input, List`1 output)
   at DotNetty.Codecs.ByteToMessageDecoder.ChannelRead(IChannelHandlerContext context, Object message)
   at DotNetty.Transport.Channels.AbstractChannelHandlerContext.InvokeChannelRead(Object msg)

Unfortunately, I am unsure if a message has been dropped.
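
One way to find out whether a message was actually dropped is to track each request by its correlation ID and report the ones that never receive a response. The sketch below is only an illustration under that assumption; the `PendingTracker` class and its timeout are hypothetical and are not part of the modules in the repo.

```python
class PendingTracker:
    """Tracks sent correlation IDs and reports those with no response after a timeout."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._pending = {}  # correlation ID -> send time in seconds

    def sent(self, correlation_id: str, now: float) -> None:
        """Record that a request with this correlation ID was sent at time `now`."""
        self._pending[correlation_id] = now

    def received(self, correlation_id: str) -> bool:
        """Mark a response as received; returns False for an unknown or duplicate ID."""
        return self._pending.pop(correlation_id, None) is not None

    def overdue(self, now: float) -> list:
        """Return the correlation IDs still waiting after the timeout has elapsed."""
        return sorted(
            cid for cid, sent_at in self._pending.items()
            if now - sent_at > self.timeout_seconds
        )


# Two requests are sent, only the first gets a response.
tracker = PendingTracker(timeout_seconds=5.0)
tracker.sent("msg-1", now=0.0)
tracker.sent("msg-2", now=1.0)
tracker.received("msg-1")

print(tracker.overdue(now=10.0))
```

Any ID reported by `overdue` is a candidate for a lost message; a late response would still remove it from the pending set.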

Expected Behavior

Sending messages between modules should succeed when the edge hub is performing normally.

Current Behavior

The Edge Hub writes warnings to its logs. It is unclear whether any messages were lost between modules.

Steps to Reproduce

  1. Download the deployment.json from my IoTEdgeMethodVsMessage GitHub repo
  2. Modify the deployment.json for the consumer to only enable messages by setting the EnableMethod to false and the EnableMessage to true
    "consumer": {
        "settings": {
            "image": "mill5james/consumer:latest",
            "createOptions": "{}"
        },
        "type": "docker",
        "env": {
            "EnableMethod": {
                "value": "false"
            },
            "EnableMessage": {
                "value": "true"
            }
        },
        "version": "1.0",
        "status": "running",
        "restartPolicy": "always"
    }
  3. Use the modified deployment.json to deploy to an IoT Edge device
  4. Observe the logs for the edgeHub module on the edge device to see the exceptions being thrown in the module
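
The edit in step 2 could also be scripted. The following Python sketch assumes the manifest follows the standard IoT Edge layout, with modules under `modulesContent` → `$edgeAgent` → `properties.desired` → `modules`; the actual file in the repo may be structured differently. The function name `enable_messages_only` is hypothetical, and the example works on an in-memory manifest rather than a real file.

```python
import json


def enable_messages_only(manifest: dict) -> dict:
    """Set the consumer module's env so that only messages are enabled.

    Assumes the standard deployment manifest layout; adjust the path if the
    file being edited is organized differently.
    """
    modules = manifest["modulesContent"]["$edgeAgent"]["properties.desired"]["modules"]
    env = modules["consumer"].setdefault("env", {})
    env["EnableMethod"] = {"value": "false"}
    env["EnableMessage"] = {"value": "true"}
    return manifest


# Minimal in-memory manifest for illustration; a real one has many more fields.
example = {
    "modulesContent": {
        "$edgeAgent": {
            "properties.desired": {
                "modules": {
                    "consumer": {
                        "settings": {"image": "mill5james/consumer:latest", "createOptions": "{}"},
                        "type": "docker",
                        "env": {"EnableMethod": {"value": "true"}},
                    }
                }
            }
        }
    }
}

updated = enable_messages_only(example)
consumer_env = updated["modulesContent"]["$edgeAgent"]["properties.desired"]["modules"]["consumer"]["env"]
print(json.dumps(consumer_env, indent=4))
```

To apply this to a file, load it with `json.load`, pass the result through the function, and write it back with `json.dump`.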

Context (Environment)

Output of iotedge check

iotedge check
Configuration checks
--------------------
√ config.yaml is well-formed
√ config.yaml has well-formed connection string
√ container engine is installed and functional
√ config.yaml has correct hostname
√ config.yaml has correct URIs for daemon mgmt endpoint
√ latest security daemon
√ host time is close to real time
√ container time is close to host time
‼ DNS server
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: certificates
    Device is using self-signed, automatically generated certs.
    Please see https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: certificates expiry
√ production readiness: container engine
‼ production readiness: logs policy
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port
√ host can connect to and perform TLS handshake with IoT Hub HTTPS port
√ host can connect to and perform TLS handshake with IoT Hub MQTT port
√ container on the default network can connect to IoT Hub AMQP port
√ container on the default network can connect to IoT Hub HTTPS port
√ container on the default network can connect to IoT Hub MQTT port
√ container on the IoT Edge module network can connect to IoT Hub AMQP port
√ container on the IoT Edge module network can connect to IoT Hub HTTPS port
√ container on the IoT Edge module network can connect to IoT Hub MQTT port
√ Edge Hub can bind to ports on host

Device (Host) Operating System

Ubuntu 18.04 LTS

Architecture

amd64

Container Operating System

Linux containers

Runtime Versions

iotedged

iotedge 1.0.7.1 (f7c51d92be8336bc6be042e1f1f2505ba01679f3)

Edge Agent

mcr.microsoft.com/azureiotedge-agent:1.0 Version - 1.0.7.1.22377503 (f7c51d92be8336bc6be042e1f1f2505ba01679f3)

Edge Hub

mcr.microsoft.com/azureiotedge-hub:1.0 Version - 1.0.7.1.22377503 (f7c51d92be8336bc6be042e1f1f2505ba01679f3)

Docker

Docker version
Client:
 Version:           3.0.5
 API version:       1.40
 Go version:        go1.12.1
 Git commit:        ba9934d4
 Built:             Thu Apr 18 22:01:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          3.0.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.1
  Git commit:       dbe4a30
  Built:            Thu Apr 18 22:07:58 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

vipeller commented, Dec 2, 2019 (1 reaction)

@mill5james we found the reason for this error. There was an ID counting error in the library: sending 0x8000 messages within an hour led to this problem. It will be fixed in upcoming releases.
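
The comment does not describe the bug in detail. As general background, the [MQTT-2.3.1-1] rule cited in the DecoderException requires PUBLISH packets with QoS > 0 (and SUBSCRIBE and UNSUBSCRIBE packets) to carry a non-zero 16-bit packet identifier, so a counter that produces IDs must wrap within 1..0xFFFF. The Python sketch below shows one such counter; `PacketIdAllocator` is a hypothetical name, it is not the library's code or its fix, and it leaves out tracking of IDs that are still in flight.

```python
class PacketIdAllocator:
    """Hands out MQTT packet identifiers in the valid range 1..0xFFFF, never 0."""

    MAX_ID = 0xFFFF  # packet identifiers are 16-bit

    def __init__(self, start: int = 1):
        if not 1 <= start <= self.MAX_ID:
            raise ValueError("start must be between 1 and 0xFFFF")
        self._next = start

    def next_id(self) -> int:
        """Return the current ID and advance, wrapping from 0xFFFF back to 1."""
        packet_id = self._next
        self._next = 1 if packet_id == self.MAX_ID else packet_id + 1
        return packet_id


# Start just below the upper bound to show the wrap-around skipping 0.
allocator = PacketIdAllocator(start=0xFFFE)
ids = [allocator.next_id() for _ in range(4)]
print(ids)
```

A production client would also need to avoid reusing an identifier whose acknowledgement is still outstanding.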

mill5james commented, Jun 17, 2019 (1 reaction)

@darobs If there is anything additional you need from me, just reach out. Glad to help.
