[Bug] CoAP observation drops

Bug

The CoAP observation relationship is “lost” about 1 minute after being established, while observing coap://host/api/v1/$ACCESS_TOKEN/rpc.

Server

Confirmed on

  • demo.thingsboard.io and
  • ThingsBoard PE 3.3.0 running on Ubuntu 20.04.3 (Docker monolith)

Your Device

  • Connectivity
    • CoAP Reproducible with “coap-client” from libcoap 4.3.0 running on linux (Ubuntu 20.04.3)

To Reproduce

Steps to reproduce the behavior:

  1. Launch coap-client so that it subscribes to RPC commands: ./coap-client -m get coap://demo.thingsboard.io/api/v1/$ACCESS_TOKEN/rpc -s 720 -B 720. Then use the sample “send rpc” widget on a dashboard to send an RPC command to it.
  2. Click “send rpc”. The coap-client process should print the following to stdout: {"id":1,"method":"rpcCommand","params":{}}
  3. Wait ~1 min or longer and click “send rpc” again. This command never reaches coap-client.
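For anyone who wants to reproduce the subscription without libcoap, the registration request can be sketched in plain Python. This is a minimal sketch of the RFC 7252 wire format for a confirmable GET with an empty Observe option (register), not a description of what coap-client does internally. The function names are hypothetical, and the message ID in the usage note is an arbitrary choice.

```python
import struct

COAP_VERSION = 1
TYPE_CON = 0        # confirmable message
CODE_GET = 0x01     # 0.01 GET
OPT_OBSERVE = 6
OPT_URI_PATH = 11


def _nibble(value):
    """Split an option delta or length into (4-bit nibble, extension bytes)."""
    if value < 13:
        return value, b""
    if value < 269:
        return 13, bytes([value - 13])
    return 14, struct.pack("!H", value - 269)


def encode_option(delta, value):
    """Encode one option given its delta to the previous option number."""
    d_nib, d_ext = _nibble(delta)
    l_nib, l_ext = _nibble(len(value))
    return bytes([(d_nib << 4) | l_nib]) + d_ext + l_ext + value


def build_observe_get(path, message_id, token):
    """Build a CON GET for `path` that registers an observation (hypothetical helper)."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes")
    first = (COAP_VERSION << 6) | (TYPE_CON << 4) | len(token)
    header = struct.pack("!BBH", first, CODE_GET, message_id)
    # Observe = 0 (register) is encoded as a zero-length option value.
    options = [(OPT_OBSERVE, b"")]
    options += [(OPT_URI_PATH, seg.encode()) for seg in path.strip("/").split("/")]
    body, previous = b"", 0
    for number, value in options:
        body += encode_option(number - previous, value)
        previous = number
    return header + token + body
```

Calling build_observe_get("/api/v1/<access token>/rpc", 0x1234, bytes.fromhex("beeffeed")) yields a datagram that can be sent over a UDP socket to the server's CoAP port (5683 by default). Notifications then arrive on the same socket carrying the same token, which is the BEEFFEED value visible in the logs.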

Expected behavior

The CoAP observation should not be dropped without notifying the observer.

Relevant logs

2021-09-21 13:37:38,274 [DefaultTransportService-22-6] INFO o.e.californium.core.CoapResource - successfully established observe relation between 172.19.0.1:36342#BEEFFEED and resource /api/v1 (Exchange[R1132], size 33)

After failing to send RPC to the device:

2021-09-21 13:45:06,488 [CoapServer(main)#2] INFO o.e.c.c.network.stack.ObserveLayer - notification for token [Token=BEEFFEED] timed out. Canceling all relations with source [/172.19.0.1:36342]
2021-09-21 13:45:06,489 [CoapServer(main)#2] INFO o.e.californium.core.CoapResource - remove observe relation between 172.19.0.1:36342#BEEFFEED and resource /api/v1 (Exchange[R1132, complete], size 32)
2021-09-21 13:45:06,489 [CoapServer(main)#2] ERROR o.e.c.c.n.stack.ReliabilityLayer - Exception for Exchange[R1132, complete] in MessageObserver: null
java.lang.NullPointerException: null
    at org.thingsboard.server.transport.coap.client.DefaultCoapClientContext.cancelRpcSubscription(DefaultCoapClientContext.java:741)
    at org.thingsboard.server.transport.coap.client.DefaultCoapClientContext.deregisterObserveRelation(DefaultCoapClientContext.java:176)
    at org.thingsboard.server.transport.coap.CoapTransportResource$CoapResourceObserver.removedObserveRelation(CoapTransportResource.java:504)
    at org.eclipse.californium.core.CoapResource.removeObserveRelation(CoapResource.java:778)
    at org.eclipse.californium.core.observe.ObserveRelation.cancel(ObserveRelation.java:151)
    at org.eclipse.californium.core.observe.ObservingEndpoint.cancelAll(ObservingEndpoint.java:74)
    at org.eclipse.californium.core.observe.ObserveRelation.cancelAll(ObserveRelation.java:162)
    at org.eclipse.californium.core.network.stack.ObserveLayer$NotificationController.onTimeout(ObserveLayer.java:233)
    at org.eclipse.californium.core.coap.Message.setTimedOut(Message.java:954)
    at org.eclipse.californium.core.network.Exchange.setTimedOut(Exchange.java:707)
    at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask.retry(ReliabilityLayer.java:524)
    at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask.access$200(ReliabilityLayer.java:430)
    at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask$1.run(ReliabilityLayer.java:467)
    at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Additional context

Notably, while monitoring inbound and outbound packets, we could not observe any outgoing packets for the failed RPC to the device. Additionally, this interaction is over IPv6, yet the log shows the observation being mapped to an IPv4 address.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

3 reactions
ashvayka commented, Oct 27, 2021

Hi @jairohg, I will provide more comments about this issue tomorrow morning. This is indeed related to NAT, but not only to NAT. It is also about the routing tables on many load balancers. It is a long story, but we have a solution. Stay tuned for updates.

1 reaction
ashvayka commented, Nov 17, 2021

Hi @WillNilges, my 2 cents about our “coap.thingsboard.cloud” setup: at the moment the LB is installed on AWS Ubuntu VMs with elastic IPs. It forwards the traffic to LwM2M pods using Node Port. The LB “remembers” a routing table whose entries consist of: A) the source IP and port of the device; B) the destination IP and port of the Node. The LB is configured to remember sessions for 1 hour. So, when the Node has an update, we make sure we push it from the correct LB IP and port, and not from the AWS NAT Gateway.

Before the LB, we were still publishing the update from the node, but it was sent from the wrong IP (not from the LB IP, which received the packet, but from the AWS NAT Gateway). The client was ignoring the update since the IP of the originator of the CoAP/UDP packet was different.
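The routing behavior described in this comment can be illustrated with a small simulation. This is a toy model under stated assumptions, not ThingsBoard or load-balancer code: the class and function names are hypothetical, the addresses are documentation-range placeholders, and the fallback to the NAT gateway once a session entry expires is an assumption made for illustration. Only the 1-hour session lifetime and the "client ignores packets from a different source" behavior come from the comment.

```python
import time


class SessionTable:
    """Toy model of a UDP load balancer that remembers device -> node routes."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._routes = {}  # (device_ip, device_port) -> (node_addr, last_seen)

    def inbound(self, device, node):
        """Record or refresh the route when a device packet is forwarded to a node."""
        self._routes[device] = (node, self.clock())

    def outbound_source(self, device, lb_addr, nat_addr):
        """Return the source address a node's notification would carry."""
        entry = self._routes.get(device)
        if entry is not None and self.clock() - entry[1] <= self.ttl:
            return lb_addr   # route remembered: packet leaves from the LB address
        return nat_addr      # assumption: without a route it leaves via the NAT gateway


def client_accepts(observed_server, packet_source):
    """The client only accepts notifications from the address it registered with."""
    return observed_server == packet_source


# Hypothetical addresses for illustration only.
now = [0.0]
table = SessionTable(ttl_seconds=3600, clock=lambda: now[0])
device = ("198.51.100.7", 36342)
lb = ("203.0.113.10", 5683)
nat = ("203.0.113.99", 40000)

table.inbound(device, node=("10.0.0.5", 30683))
now[0] = 60.0
within_ttl = client_accepts(lb, table.outbound_source(device, lb, nat))
now[0] = 3601.0
after_ttl = client_accepts(lb, table.outbound_source(device, lb, nat))
```

In this model `within_ttl` is true and `after_ttl` is false: once the notification no longer appears to come from the address the device registered with, the device discards it even though the server still considers the observation active.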
