[Bug] CoAP observation drops
Bug
The CoAP observation relationship is "lost" ~1 min after being established, while trying to Observe coap://host/api/v1/$ACCESS_TOKEN/rpc.
Server
Confirmed on:
- demo.thingsboard.io
- ThingsBoard PE 3.3.0 running on Ubuntu 20.04.3 (Docker monolith)
Your Device
- Connectivity: CoAP
- Reproducible with coap-client from libcoap 4.3.0 running on Linux (Ubuntu 20.04.3)
To Reproduce
Steps to reproduce the behavior:
- Use the sample "send rpc" widget on a dashboard to send an RPC command to coap-client, which subscribes by launching the process with the following command (an equivalent Californium-based client is sketched after these steps):
./coap-client -m get coap://demo.thingsboard.io/api/v1/$ACCESS_TOKEN/rpc -s 720 -B 720
- Click "send rpc", which results in the following output being printed to stdout by the coap-client process:
{"id":1,"method":"rpcCommand","params":{}}
- Wait ~1 min or longer and click "send rpc" again. This command never reaches the coap-client.
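For reference, below is a minimal programmatic equivalent of the observing coap-client, sketched with the Eclipse Californium Java library (the same CoAP stack that appears in the server logs). The exact URI and the use of an ACCESS_TOKEN environment variable are illustrative assumptions, not part of the original report:

```java
import org.eclipse.californium.core.CoapClient;
import org.eclipse.californium.core.CoapHandler;
import org.eclipse.californium.core.CoapObserveRelation;
import org.eclipse.californium.core.CoapResponse;

public class RpcObserver {
    public static void main(String[] args) throws InterruptedException {
        // ACCESS_TOKEN is a placeholder for the device access token
        String uri = "coap://demo.thingsboard.io/api/v1/"
                + System.getenv("ACCESS_TOKEN") + "/rpc";
        CoapClient client = new CoapClient(uri);

        CoapObserveRelation relation = client.observe(new CoapHandler() {
            @Override
            public void onLoad(CoapResponse response) {
                // Each server-to-client RPC arrives as an observe notification
                System.out.println("RPC notification: " + response.getResponseText());
            }

            @Override
            public void onError() {
                // Fires when the observation is rejected or a request times out
                System.err.println("Observation failed or was cancelled");
            }
        });

        // Keep the observation open for 12 minutes, mirroring "-s 720 -B 720"
        Thread.sleep(720_000);
        relation.proactiveCancel();
        client.shutdown();
    }
}
```

If the bug reproduces, onLoad should fire for the RPC sent right after subscribing, while RPCs sent after the ~1 min window never arrive.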
Expected behavior
The CoAP observation should not be dropped without notifying the observer.
Relevant logs
2021-09-21 13:37:38,274 [DefaultTransportService-22-6] INFO o.e.californium.core.CoapResource - successfully established observe relation between 172.19.0.1:36342#BEEFFEED and resource /api/v1 (Exchange[R1132], size 33)
After the server fails to send the RPC to the device:
2021-09-21 13:45:06,488 [CoapServer(main)#2] INFO o.e.c.c.network.stack.ObserveLayer - notification for token [Token=BEEFFEED] timed out. Canceling all relations with source [/172.19.0.1:36342]
2021-09-21 13:45:06,489 [CoapServer(main)#2] INFO o.e.californium.core.CoapResource - remove observe relation between 172.19.0.1:36342#BEEFFEED and resource /api/v1 (Exchange[R1132, complete], size 32)
2021-09-21 13:45:06,489 [CoapServer(main)#2] ERROR o.e.c.c.n.stack.ReliabilityLayer - Exception for Exchange[R1132, complete] in MessageObserver: null
java.lang.NullPointerException: null
	at org.thingsboard.server.transport.coap.client.DefaultCoapClientContext.cancelRpcSubscription(DefaultCoapClientContext.java:741)
	at org.thingsboard.server.transport.coap.client.DefaultCoapClientContext.deregisterObserveRelation(DefaultCoapClientContext.java:176)
	at org.thingsboard.server.transport.coap.CoapTransportResource$CoapResourceObserver.removedObserveRelation(CoapTransportResource.java:504)
	at org.eclipse.californium.core.CoapResource.removeObserveRelation(CoapResource.java:778)
	at org.eclipse.californium.core.observe.ObserveRelation.cancel(ObserveRelation.java:151)
	at org.eclipse.californium.core.observe.ObservingEndpoint.cancelAll(ObservingEndpoint.java:74)
	at org.eclipse.californium.core.observe.ObserveRelation.cancelAll(ObserveRelation.java:162)
	at org.eclipse.californium.core.network.stack.ObserveLayer$NotificationController.onTimeout(ObserveLayer.java:233)
	at org.eclipse.californium.core.coap.Message.setTimedOut(Message.java:954)
	at org.eclipse.californium.core.network.Exchange.setTimedOut(Exchange.java:707)
	at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask.retry(ReliabilityLayer.java:524)
	at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask.access$200(ReliabilityLayer.java:430)
	at org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask$1.run(ReliabilityLayer.java:467)
	at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
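For illustration only, here is a hedged sketch of the kind of null-safe guard the NullPointerException in cancelRpcSubscription suggests is missing when a timed-out notification cancels a subscription that has already been cleared. The class and member names are hypothetical and are not the actual ThingsBoard source:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not ThingsBoard code: it only illustrates cancelling a
// subscription that may already have been cleared without throwing an NPE.
class RpcSubscriptionState {

    // Token of the active RPC observe subscription, or null if there is none.
    private final AtomicReference<String> rpcSubscriptionToken = new AtomicReference<>();

    void registerRpcSubscription(String token) {
        rpcSubscriptionToken.set(token);
    }

    // Safe to call from both the normal deregistration path and the
    // notification-timeout path; the second caller simply finds null.
    void cancelRpcSubscription() {
        String token = rpcSubscriptionToken.getAndSet(null);
        if (token == null) {
            return; // already cancelled, nothing left to clean up
        }
        // ... release per-subscription resources associated with 'token' ...
    }
}
```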
Additional context
Somewhat noteworthy: while monitoring inbound and outbound packets, we could not observe any outgoing packets for the failed RPC to the device. Additionally, this interaction is over IPv6, yet the log shows the observation mapped to an IPv4 address.
Comments (selected, from maintainers)
Hi @jairohg, I will provide more comments about this issue tomorrow morning. This is indeed related to NAT, but not only to NAT; it is also about the routing tables on many load balancers. It is a long story, but we have a solution. Stay tuned for updates.
Hi @WillNilges, my 2 cents about our "coap.thingsboard.cloud" setup: at the moment the LB is installed on AWS Ubuntu VMs with elastic IPs. It forwards the traffic to LwM2M pods using a NodePort. The LB "remembers" a routing table consisting of: A) the source IP and port of the device, and B) the destination IP and port of the node. The LB is configured to remember sessions for 1 hour. So, when the node has an update, we make sure we push it from the correct LB IP and port, and not from the AWS NAT Gateway.
Before the LB, we were still publishing the update from the node, but it was sent from the wrong IP (not from the LB IP, which received the packet, but from the AWS NAT Gateway). The client was ignoring the update since the IP of the originator of the CoAP/UDP packet was different.
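To make the routing-table description above concrete, here is a minimal Java sketch of the kind of session table such an LB keeps. The names, types, and in-memory map are illustrative assumptions, not the actual load-balancer implementation:

```java
import java.net.InetSocketAddress;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the LB session table: it records which backend node
// serves each device address and when the entry was created, so downlink
// notifications can be routed back through the same LB address and port the
// device originally contacted.
class CoapSessionTable {

    private record Entry(InetSocketAddress nodeAddress, Instant created) {}

    private static final Duration SESSION_TTL = Duration.ofHours(1);

    private final Map<InetSocketAddress, Entry> sessions = new ConcurrentHashMap<>();

    // A) source IP and port of the device -> B) IP and port of the node
    void record(InetSocketAddress deviceAddress, InetSocketAddress nodeAddress) {
        sessions.put(deviceAddress, new Entry(nodeAddress, Instant.now()));
    }

    // Returns the node for this device, or null if the session has expired.
    InetSocketAddress lookup(InetSocketAddress deviceAddress) {
        Entry entry = sessions.get(deviceAddress);
        if (entry == null || entry.created().plus(SESSION_TTL).isBefore(Instant.now())) {
            sessions.remove(deviceAddress);
            return null;
        }
        return entry.nodeAddress();
    }
}
```

A notification pushed after the entry expires, or pushed from a different source address such as the NAT Gateway, is ignored by the device, which is consistent with the maintainers' explanation of the observation drop above.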