occasionally server sends "fragmented" acks

This is quite hard to describe, and I’m struggling to reproduce it, but I’m fairly sure the source of the error can’t be my application code.

The setup: We have devices on a 6lowpan mesh in the field, connected to a gateway, which bridges the network via a VPN to our app servers. The devices send data via CoAP, and expect an acknowledgment with code 2.04 if the data were accepted. Only then does the packet get removed from the sending queue.

What I’m seeing: when I examine the CoAP traffic with tshark (sudo tshark -i tap0 -f "udp port 5683"), about 75% of the time I see:

3   4 109.044848 ec92::212:4b00:ea3:2035 -> bbbb::4001   CoAP 129 CON, TID:5483, PUT
4   5 109.091931   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5483, 2.04 Changed

Which is exactly as expected. About 25% of the time, I get this:

15  16 207.080173 ec92::212:4b00:ea3:2035 -> bbbb::4001   CoAP 129 CON, TID:5486, PUT
 17 207.142914   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5486, Empty Message
 18 207.152959   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
18  19 209.365676   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
19  20 213.797643   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
20  21 222.660820   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed

Things to notice:

(1) The initial response is an “empty message”, but subsequently a load of 2.04s do get sent.
(2) My guess is that the reason the messages are repeated is a combination of link-layer retries and application-layer retries (if one examines the timing over a longer period this becomes quite clear).
(3) The message id (TID, here, instead of MID for some reason) is different for the original “broken” ack and the subsequent “fixed” acks (5486, 64217).
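To compare the two captures byte by byte, it can help to decode the fixed 4-byte CoAP header (RFC 7252, section 3) by hand. The sketch below is not part of node-coap: parseCoapHeader and the sample byte arrays are hypothetical, hand-built to match the types, codes and message IDs that tshark printed, with the token length assumed to be 0. In RFC 7252 terms, an empty ACK followed by a CON that carries the response under a new message ID looks like a "separate response" rather than a piggybacked one.

```javascript
// CoAP message types, indexed by the 2-bit type field (RFC 7252, section 3).
const TYPES = ['CON', 'NON', 'ACK', 'RST'];

// Decode the fixed 4-byte header at the start of a CoAP UDP payload.
// Hypothetical helper for illustration; not a node-coap API.
function parseCoapHeader(buf) {
  if (buf.length < 4) {
    throw new Error('a CoAP header needs at least 4 bytes');
  }
  const first = buf[0];
  const code = buf[1];
  return {
    version: first >> 6,
    type: TYPES[(first >> 4) & 0x03],
    tokenLength: first & 0x0f,
    // The code byte is class.detail; 0.00 marks an empty message.
    code: `${code >> 5}.${String(code & 0x1f).padStart(2, '0')}`,
    messageId: buf.readUInt16BE(2),
  };
}

// Hand-built headers (assumed token length 0), not bytes from the capture.
const piggybackedAck = Buffer.from([0x60, 0x44, 0x15, 0x6b]); // ACK, 2.04, MID 5483
const emptyAck = Buffer.from([0x60, 0x00, 0x15, 0x6e]); // ACK, 0.00, MID 5486
const separateResponse = Buffer.from([0x40, 0x44, 0xfa, 0xd9]); // CON, 2.04, MID 64217

for (const frame of [piggybackedAck, emptyAck, separateResponse]) {
  console.log(parseCoapHeader(frame));
}
```

Running the same decoder over real payloads from the capture would also show the token length, which is the field to check when judging whether the client can match the late 2.04 to its request.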

Unfortunately, in my local setup this has been impossible to replicate so far and I am unable to attach a debugger to my production server.

My guess is that somewhere in the node-coap library (or its dependencies) a response is being sent before my application code has actually sent one.

I’m not terribly au fait with the node-coap codebase, so I’m not sure where this might be happening.

Top GitHub Comments

GiedriusM commented, Feb 26, 2017

@gfarrell, the quick workaround for you is to increase the auto-ACK timeout:

const server = coap.createServer({piggybackReplyMs: 1500, type: "udp6" });

The problem itself is a bit more complicated. In short, this happens when the response is delayed by more than piggybackReplyMs, but the underlying cause is two bugs:

(1) The auto-ACK does not add the Token option, so the auto-ACK frame may be ignored on the client side.
(2) After the auto-ACK triggers, the response object is mangled/pseudo-deleted (see outgoing_message.js:[34-44]), so when the late .end() call finally arrives, it sends a corrupted response frame.

Also, as a semi-related bug:

(3) I noticed that if I disable auto-ACK and delay the .end() call for several seconds, so that the client retry triggers, the server gets a duplicate request event. IMO this should not happen, i.e. the server LRU cache should contain requests rather than responses, and if a duplicate arrives, the server should send a response if and only if the original response has end()'ed.
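The request-keyed cache suggested above could look roughly like the sketch below. This is not how node-coap is implemented; RequestCache, onRequest and onEnd are hypothetical names, and eviction (the LRU part, or expiry after the exchange lifetime) is left out to keep the idea visible. A retransmitted request is only answered from the cache once the original response has been ended, and is otherwise dropped instead of raising a second request event.

```javascript
// Hypothetical sketch: deduplicate CON requests by (remote, message ID).
class RequestCache {
  constructor() {
    this.entries = new Map();
  }

  static key(remote, messageId) {
    return `${remote}#${messageId}`;
  }

  // Decide what to do with an incoming request.
  onRequest(remote, messageId) {
    const k = RequestCache.key(remote, messageId);
    const entry = this.entries.get(k);
    if (entry === undefined) {
      // First sighting: remember it and hand it to the application.
      this.entries.set(k, { response: null });
      return { action: 'dispatch' };
    }
    if (entry.response !== null) {
      // Duplicate of a finished exchange: replay the stored response.
      return { action: 'resend', response: entry.response };
    }
    // Duplicate while the application is still working: do nothing.
    return { action: 'ignore' };
  }

  // Record the response once the application has called end().
  onEnd(remote, messageId, response) {
    const entry = this.entries.get(RequestCache.key(remote, messageId));
    if (entry !== undefined) {
      entry.response = response;
    }
  }
}

const cache = new RequestCache();
const remote = 'ec92::212:4b00:ea3:2035';
console.log(cache.onRequest(remote, 5486)); // first copy of the request
console.log(cache.onRequest(remote, 5486)); // client retry before end()
cache.onEnd(remote, 5486, '2.04');
console.log(cache.onRequest(remote, 5486)); // client retry after end()
```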

@mcollina, I have a few questions for you. First, do you recall why the response altering (mentioned in bug 2) was done? Personally, I would remove some of that code, but I’m afraid it may be there for a reason (the name piggybackReplyMs suggests somewhat different functionality than autoAcknowledgement). Also, why does auto-ACK use a raw send instead of triggering an end() call? I’m thinking of adding an option to disable auto-ACK altogether via the piggybackReplyMs parameter (None or <=0), so that the user would have the option of not sending any reply unless end() is called explicitly.
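A minimal sketch of the proposed switch, assuming that null stands in for "None" and that any value at or below zero also disables the timer. scheduleAutoAck and sendEmptyAck are hypothetical names for illustration, not functions from node-coap, and the real change would have to fit into the library's existing timer handling.

```javascript
// Hypothetical sketch: arm the auto-ACK timer only when it is enabled.
// Returns the timer handle, or null when auto-ACK is switched off.
function scheduleAutoAck(piggybackReplyMs, sendEmptyAck) {
  const disabled = piggybackReplyMs === null || piggybackReplyMs <= 0;
  if (disabled) {
    // Nothing is sent until the application calls end() itself.
    return null;
  }
  return setTimeout(sendEmptyAck, piggybackReplyMs);
}

// Auto-ACK disabled: no timer is created.
console.log(scheduleAutoAck(0, () => {}) === null);

// Auto-ACK enabled: the caller clears the timer when end() arrives first,
// which is the piggybacked case.
const timer = scheduleAutoAck(1500, () => console.log('empty ACK sent'));
clearTimeout(timer);
```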

stale[bot] commented, Jul 21, 2020

This issue has been automatically closed because of inactivity. Please open a new issue if still relevant and make sure to include all relevant details, logs and reproduction steps. Thank you for your contributions.