occasionally server sends "fragmented" acks
This is quite hard to describe, and I’m struggling to reproduce it, but I’m fairly sure the source of the error can’t be my application code.
The setup:
We have devices on a 6LoWPAN mesh in the field, connected to a gateway, which bridges the network via a VPN to our app servers. The devices send data via CoAP, and expect an acknowledgment with code 2.04 if the data were accepted. Only then does the packet get removed from the sending queue.
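To make the dequeue rule above concrete, here is a minimal sketch in JavaScript. It is only a model for illustration: the device firmware is not shown in this issue, and `handleAck`, the queue layout, and the field names are all hypothetical.

```javascript
// Hypothetical model of the device-side send queue; not the real firmware.
const CODE_CHANGED = '2.04';

// Returns true if the ACK allowed a packet to be removed from the queue.
function handleAck(queue, ack) {
  // Find the queued packet that this ACK's message ID refers to.
  const index = queue.findIndex((pkt) => pkt.messageId === ack.messageId);
  if (index === -1) return false; // ACK for something we are not waiting on

  // Any code other than 2.04 (for example an empty ACK, code 0.00)
  // leaves the packet in the queue.
  if (ack.code !== CODE_CHANGED) return false;

  queue.splice(index, 1); // data accepted: safe to drop the packet
  return true;
}
```

Under this rule an ACK that carries the right message ID but no response code does not release the packet.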
What I’m seeing:
About 75% of the time, when using tshark to examine the CoAP traffic via sudo tshark -i tap0 -f "udp port 5683", I see:
4 109.044848 ec92::212:4b00:ea3:2035 -> bbbb::4001 CoAP 129 CON, TID:5483, PUT
5 109.091931 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5483, 2.04 Changed
Which is exactly as expected. About 25% of the time, I get this:
16 207.080173 ec92::212:4b00:ea3:2035 -> bbbb::4001 CoAP 129 CON, TID:5486, PUT
17 207.142914 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5486, Empty Message
18 207.152959 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
19 209.365676 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
20 213.797643 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
21 222.660820 bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
Things to notice:
(1) the initial response is an “empty message”, but subsequently a load of 2.04s do get sent
(2) my guess is that the reason the messages are repeated is a combination of link-layer retries and application-layer retries (if one examines the timing over a longer period this becomes quite clear).
(3) the message id (TID, here, instead of MID for some reason) is different for the original “broken” ack and the subsequent “fixed” acks (5486, 64217).
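The timing of the repeated 2.04 messages can be checked directly from the capture. The sketch below computes the gaps between the four transmissions with TID 64217 and tests whether they double each time, starting from a first gap inside the 2 to 3 second window that the RFC 7252 defaults give (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5). The helper names and the 5% tolerance are assumptions made for illustration.

```javascript
// Capture times (seconds) of the four CON 2.04 frames carrying TID 64217.
const sendTimes = [207.152959, 209.365676, 213.797643, 222.660820];

// Gap between each transmission and the next one.
function gaps(times) {
  return times.slice(1).map((t, i) => t - times[i]);
}

// True when the first gap lies in the RFC 7252 initial-timeout window and
// every later gap is roughly double the one before it.
function looksLikeCoapBackoff(times, tolerance = 0.05) {
  const ACK_TIMEOUT = 2;         // seconds, RFC 7252 default
  const ACK_RANDOM_FACTOR = 1.5; // RFC 7252 default
  const g = gaps(times);
  const firstOk = g[0] >= ACK_TIMEOUT && g[0] <= ACK_TIMEOUT * ACK_RANDOM_FACTOR;
  const doubling = g.slice(1).every((gap, i) => Math.abs(gap / g[i] - 2) <= tolerance);
  return firstOk && doubling;
}

console.log(gaps(sendTimes).map((x) => x.toFixed(2))); // [ '2.21', '4.43', '8.86' ]
console.log(looksLikeCoapBackoff(sendTimes));          // true
```

Gaps of about 2.21 s, 4.43 s and 8.86 s fit CoAP's exponential back-off for a confirmable message that never receives an ACK. That suggests the repeats in this capture are mostly CoAP-level retransmissions by the server, although this alone does not rule out link-layer retries.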
Unfortunately, in my local setup this has been impossible to replicate so far and I am unable to attach a debugger to my production server.
My guess is that somewhere in the node-coap library (or its dependencies) the response is being sent despite it not yet having been sent by my application code.
I’m not terribly au fait with the node-coap codebase, so I’m not sure where this might be happening.
@gfarrell the quick workaround for you is to increase the auto-ACK timeout
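A sketch of that workaround, assuming the installed node-coap version accepts a `piggybackReplyMs` server option (check the README for your release). `choosePiggybackReplyMs`, the 100 ms worst-case handler time, and the safety factor are hypothetical values chosen for illustration, not part of the library.

```javascript
// RFC 7252 default ACK_TIMEOUT: the earliest a client may retransmit its CON.
const CLIENT_ACK_TIMEOUT_MS = 2000;

// Hypothetical helper: pick an auto-ACK delay that covers the slowest handler
// seen in practice, but still fires well before the client would retransmit.
function choosePiggybackReplyMs(worstHandlerMs, safetyFactor = 2) {
  const wanted = Math.ceil(worstHandlerMs * safetyFactor);
  const ceiling = CLIENT_ACK_TIMEOUT_MS / 2; // leave room for network latency
  return Math.min(wanted, ceiling);
}

const serverOptions = {
  type: 'udp6', // the capture shows IPv6 addresses
  piggybackReplyMs: choosePiggybackReplyMs(100), // 200 ms with the values above
};

// Usage, assuming the `coap` package is installed:
// const coap = require('coap');
// const server = coap.createServer(serverOptions, (req, res) => {
//   res.code = '2.04';
//   res.end();
// });
// server.listen(5683);
```

A larger timeout only makes the empty ACK less likely. A handler that occasionally exceeds it will still trigger the empty ACK followed by a separate response.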
The problem itself is a bit more complicated. In short, this happens when the response is delayed by more than piggybackReplyMs, but the underlying cause is two bugs:
@mcollina, I have a few questions for you. First, do you recollect why that response altering (mentioned in bug 2) was done? Personally, I would remove some of that code, but I’m afraid it may be there for a reason (the name piggybackReplyMs suggests somewhat different functionality than autoAcknowledgement). Also, why does auto-ACK use a raw send instead of triggering an end() call? I’m thinking of adding an option to disable auto-ACK altogether, by using the piggybackReplyMs parameter (None or <=0), so that the user would have an option to not send any reply unless end() is called explicitly.
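The behaviour discussed in this comment can be modelled without the library. The sketch below is a hypothetical model, not node-coap's code: `respond`, its parameters, and the token value are invented for illustration. It follows the RFC 7252 rules for piggybacked and separate responses, and treats a `piggybackReplyMs` of null or <= 0 as the proposed "auto-ACK disabled" mode, which is only a suggestion in this thread.

```javascript
// Hypothetical model: the messages a server emits for one confirmable request.
// handlerDelayMs is how long the application takes before calling end().
function respond(request, handlerDelayMs, piggybackReplyMs, separateMessageId) {
  // Proposed (not existing) behaviour: null or <= 0 disables the auto-ACK.
  const autoAckDisabled = piggybackReplyMs == null || piggybackReplyMs <= 0;

  if (autoAckDisabled || handlerDelayMs <= piggybackReplyMs) {
    // Piggybacked response: the ACK reuses the request's message ID
    // and carries the response code.
    return [{ type: 'ACK', messageId: request.messageId, token: request.token, code: '2.04' }];
  }

  return [
    // The timer fired first: an empty ACK (code 0.00) stops client retransmissions.
    { type: 'ACK', messageId: request.messageId, code: '0.00' },
    // Separate response: a new CON with a server-chosen message ID,
    // matched to the request by token rather than by message ID.
    { type: 'CON', messageId: separateMessageId, token: request.token, code: '2.04' },
  ];
}

// Delays of roughly 47 ms and 73 ms, as in the two exchanges captured in the issue.
console.log(respond({ messageId: 5483, token: 'a1' }, 47, 50, 64216));
console.log(respond({ messageId: 5486, token: 'a2' }, 73, 50, 64217));
```

In the second case the client is expected to acknowledge the CON itself. A client that only looks for an ACK carrying 2.04 never does so, and the server keeps retransmitting the separate response.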
This issue has been automatically closed because of inactivity. Please open a new issue if still relevant and make sure to include all relevant details, logs and reproduction steps. Thank you for your contributions.