Command Router: endless "Enable Command Routing" loop
We’ve encountered a scenario where an “Enable Command Routing” invocation in the Command Router (see #2576) resulted in an endless loop, trying to invoke the command consumer registration for a large list of tenants again and again.
The scenario included a few subsequent “Enable Command Routing” requests with a lot of tenants (several hundred entries all in all, also caused by #2704 in connection with already deleted test tenants).
This resulted in many “get Tenant” invocations, which eventually led to an org.eclipse.hono.client.ServerErrorException: no credit available for sending request exception.
The problem is that this exception caused the corresponding tenant to be put back into the queue of tenants for which “get Tenant” and “register command consumer” were to be invoked (as part of a retry mechanism).
As there is no delay in processing implemented here, the situation could never recover, leading to constant high CPU usage and errors in processing the single incoming “register command consumer” Command Router API requests for yet uncached tenants.
The Jaeger Agent was also put under high load, causing most of the tracing spans to be dropped.
(This made it difficult to analyze the original exception, as there is no log output here).
An indication of the problem in the Jaeger UI was that there was no “re-enable command routing for tenants” entry listed in the operations combo box, as this span never got finished.
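To illustrate the missing delay described above, here is a minimal sketch of a retry that waits with exponential backoff and gives up after a fixed number of attempts. It is not the Command Router's actual code or the fix that was applied: the class TenantRetrySketch, the methods backoffDelay and processTenant, and the registerConsumer predicate (standing in for the “get Tenant” plus “register command consumer” step) are all hypothetical names chosen for this example.

```java
import java.time.Duration;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch, not Hono code: retry a per-tenant operation with a
// growing delay instead of re-queueing the tenant immediately.
class TenantRetrySketch {

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
    static Duration backoffDelay(int attempt, Duration base, Duration max) {
        long millis = base.toMillis();
        long cap = max.toMillis();
        for (int i = 0; i < attempt && millis < cap; i++) {
            millis *= 2;
        }
        return Duration.ofMillis(Math.min(millis, cap));
    }

    // Runs registerConsumer for the tenant. On failure, schedules another try
    // after a backoff delay, until maxAttempts tries have been made in total.
    static void processTenant(ScheduledExecutorService scheduler, String tenantId,
            int attempt, int maxAttempts, Duration base, Duration max,
            Predicate<String> registerConsumer) {
        if (registerConsumer.test(tenantId)) {
            return; // success, nothing to retry
        }
        if (attempt + 1 >= maxAttempts) {
            // Give up instead of looping forever; a real service would log this.
            System.err.println("giving up on tenant " + tenantId);
            return;
        }
        Duration delay = backoffDelay(attempt, base, max);
        scheduler.schedule(
                () -> processTenant(scheduler, tenantId, attempt + 1, maxAttempts,
                        base, max, registerConsumer),
                delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Print the delays that would be used for the first few retries.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + backoffDelay(attempt, Duration.ofMillis(100), Duration.ofSeconds(5)));
        }
    }
}
```

With such a delay in place, a transient “no credit available” condition would have time to clear before the same tenant is processed again, and the attempt limit bounds the work done for tenants that keep failing.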
Top GitHub Comments
@sophokles73 No, the non-existing tenants result in ClientErrorExceptions, meaning there is no retry for that tenant.
There are 2 problems FMPOV:
Yes.