occasionally server sends "fragmented" acks

This is quite hard to describe, and I’m struggling to reproduce it, but I’m fairly sure the source of the error can’t be my application code.

The setup: We have devices on a 6lowpan mesh in the field, connected to a gateway, which bridges the network via a VPN to our app servers. The devices send data via CoAP, and expect an acknowledgment with code 2.04 if the data were accepted. Only then does the packet get removed from the sending queue.
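The queueing rule described above can be sketched roughly as follows. This is a hypothetical model of the device side only: the `SendQueue` class, its method names, and the payload strings are invented for illustration and are not taken from the actual firmware.

```javascript
// Hypothetical device-side queue: a packet leaves the queue only after a
// reply carrying code 2.04 has been seen for it.
class SendQueue {
  constructor() {
    this.pending = []; // oldest packet first
  }

  enqueue(payload) {
    this.pending.push(payload);
  }

  // Called with the response code received for the packet at the head.
  // Returns true if the head was accepted and removed.
  onReply(code) {
    if (this.pending.length === 0) return false;
    if (code !== "2.04") return false; // e.g. "0.00" for an empty ACK
    this.pending.shift();
    return true;
  }
}

const queue = new SendQueue();
queue.enqueue("reading-1");
queue.enqueue("reading-2");
const removedOnEmptyAck = queue.onReply("0.00"); // empty ACK: keep waiting
const removedOnChanged = queue.onReply("2.04"); // accepted: drop the head
```

Under this model an empty ACK on its own never clears a packet, so the device depends on the later 2.04 arriving and being recognised.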

What I’m seeing: About 75% of the time, when using tshark to examine the CoAP traffic via sudo tshark -i tap0 -f "udp port 5683", I see:

3   4 109.044848 ec92::212:4b00:ea3:2035 -> bbbb::4001   CoAP 129 CON, TID:5483, PUT
4   5 109.091931   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5483, 2.04 Changed

Which is exactly as expected. About 25% of the time, I get this:

15  16 207.080173 ec92::212:4b00:ea3:2035 -> bbbb::4001   CoAP 129 CON, TID:5486, PUT
 17 207.142914   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 ACK, TID:5486, Empty Message
 18 207.152959   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
18  19 209.365676   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
19  20 213.797643   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed
20  21 222.660820   bbbb::4001 -> ec92::212:4b00:ea3:2035 CoAP 66 CON, TID:64217, 2.04 Changed

Things to notice:

(1) The initial response is an “empty message”, but subsequently a load of 2.04s do get sent.
(2) My guess is that the messages are repeated because of a combination of link-layer retries and application-layer retries (this becomes quite clear if one examines the timing over a longer period).
(3) The message ID (shown as TID here, rather than MID, for some reason) differs between the original “broken” ack and the subsequent “fixed” acks (5486 vs. 64217).
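For context on observation (3): as far as I understand RFC 7252, a server may answer with a "separate response", i.e. an empty ACK that echoes the request's message ID, followed by the real response in a new message with its own message ID. The client is expected to match that later message by token, not by message ID. So the differing IDs alone may be expected behaviour. The sketch below decodes the fixed 4-byte CoAP header to make the fields visible. The two frames are rebuilt by hand to mirror the capture, and the token bytes (`ab cd`) are made up.

```javascript
// Minimal CoAP fixed-header parser (layout per RFC 7252, section 3).
const TYPES = ["CON", "NON", "ACK", "RST"];

function parseCoapHeader(buf) {
  if (buf.length < 4) throw new Error("CoAP header needs at least 4 bytes");
  const tokenLength = buf[0] & 0x0f; // low nibble of byte 0
  const codeClass = buf[1] >> 5; // top 3 bits of byte 1
  const codeDetail = buf[1] & 0x1f; // low 5 bits of byte 1
  return {
    version: buf[0] >> 6,
    type: TYPES[(buf[0] >> 4) & 0x03],
    code: `${codeClass}.${String(codeDetail).padStart(2, "0")}`,
    messageId: buf.readUInt16BE(2),
    token: buf.subarray(4, 4 + tokenLength).toString("hex"),
  };
}

// Hand-built frames mirroring the capture; the token value is an assumption.
const emptyAck = parseCoapHeader(Buffer.from([0x60, 0x00, 0x15, 0x6e]));
const separate = parseCoapHeader(
  Buffer.from([0x42, 0x44, 0xfa, 0xd9, 0xab, 0xcd])
);
```

Here `emptyAck` decodes to an ACK with code 0.00, message ID 5486 and no token, while `separate` decodes to a CON with code 2.04, message ID 64217 and a token. Whether the real frames carry the right token is exactly what a hex dump of the capture would show.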

Unfortunately, in my local setup this has been impossible to replicate so far and I am unable to attach a debugger to my production server.

My guess is that somewhere in the node-coap library (or its dependencies) a response is being sent before my application code has actually sent one.

I’m not terribly au fait with the node-coap codebase, so I’m not sure where this might be happening.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 15

Top GitHub Comments

GiedriusM commented, Feb 26, 2017

@gfarrell the quick workaround for you is to increase the auto-ACK timeout

const coap = require("coap");
const server = coap.createServer({ piggybackReplyMs: 1500, type: "udp6" }); // wait up to 1.5 s before auto-ACKing
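As a rough illustration of why a longer auto-ACK timeout avoids the split reply, here is a toy model of the race between the handler's end() call and the auto-ACK timer. It is not node-coap's code: the function name `framesFor`, the `<=` boundary, and the delay values are assumptions. The 50 ms figure is my understanding of the library default, and the delays are loosely based on the gaps in the capture.

```javascript
// Toy model, not node-coap's implementation: which frames reach the client
// for one confirmable request, given how long the handler takes to end().
function framesFor(responseDelayMs, piggybackReplyMs) {
  if (responseDelayMs <= piggybackReplyMs) {
    // end() ran before the auto-ACK timer: the reply rides on the ACK.
    return ["ACK 2.04"];
  }
  // The timer fired first: an empty ACK goes out and the reply follows later.
  return ["ACK empty", "CON 2.04"];
}

const fast = framesFor(47, 50); // handler beats the timer
const slow = framesFor(63, 50); // timer beats the handler
const slowWithWorkaround = framesFor(63, 1500); // same handler, longer timer
```

With the longer timer, the slow handler's reply is piggybacked again, which is why the workaround hides the problem without fixing the underlying bugs.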

The problem itself is a bit more complicated. In short, this happens when the response is delayed by more than piggybackReplyMs, but the underlying cause is two bugs:

  1. Auto-ACK does not add the Token option, so the auto-ACK frame may be ignored on the client side.
  2. After auto-ACK triggers, the response object is mangled/pseudo-deleted (see outgoing_message.js:[34-44]), so when the late .end() call finally arrives, it sends a corrupted response frame.

Also, as a semi-related bug:

  3. I noticed that if I disable auto-ACK and delay the .end() call for several seconds, so that the client retry triggers, the server gets a duplicate request event. IMO this should not happen, i.e. the server LRU cache should contain requests rather than responses, and if a duplicate arrives, it should send a response if and only if the original response has been end()'ed.
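The behaviour proposed in point 3 could look roughly like the sketch below. It is a hypothetical illustration, not node-coap's implementation: the `RequestCache` class, its method names, and the returned action strings are invented, and LRU eviction is left out for brevity.

```javascript
// Hypothetical request-keyed dedup cache (not node-coap's implementation).
class RequestCache {
  constructor() {
    // "remote|messageId" -> { response: string | null }
    this.entries = new Map();
  }

  // Decides what to do with an incoming confirmable request.
  onRequest(remote, messageId) {
    const key = `${remote}|${messageId}`;
    const entry = this.entries.get(key);
    if (!entry) {
      this.entries.set(key, { response: null });
      return { action: "dispatch" }; // first sighting: emit the request event
    }
    if (entry.response === null) {
      return { action: "ignore" }; // duplicate while the handler is still running
    }
    return { action: "resend", response: entry.response };
  }

  // Called once the handler has end()'ed its response.
  onEnd(remote, messageId, response) {
    const entry = this.entries.get(`${remote}|${messageId}`);
    if (entry) entry.response = response;
  }
}

const cache = new RequestCache();
const first = cache.onRequest("ec92::1", 5486); // new request
const retry = cache.onRequest("ec92::1", 5486); // client retry, handler busy
cache.onEnd("ec92::1", 5486, "2.04");
const lateRetry = cache.onRequest("ec92::1", 5486); // retry after end()
```

Keying on the request means the application sees each message ID once, and a retransmission gets the stored reply only after one exists.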

@mcollina, I have a few questions for you. First, do you recall why the response altering (mentioned in bug 2) was done? Personally, I would remove some of that code, but I’m afraid it may be there for a reason (the name piggybackReplyMs suggests somewhat different functionality than autoAcknowledgement). Also, why does auto-ACK use a raw send instead of triggering an end() call? I’m thinking of adding an option to disable auto-ACK altogether via the piggybackReplyMs parameter (None or <=0), so that the user would have the option of not sending any reply unless end() is called explicitly.

stale[bot] commented, Jul 21, 2020

This issue has been automatically closed because of inactivity. Please open a new issue if still relevant and make sure to include all relevant details, logs and reproduction steps. Thank you for your contributions.
