Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Auth & device registry keep getting OOMKilled with default limits

See original GitHub issue

I’m running Hono 0.7 on Kubernetes using the deployment script with next to no changes to the resource settings, i.e. I haven’t applied any of the changes described in https://www.eclipse.org/hono/deployment/resource-limitation/.
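
For reference, the configured requests and limits can be read back from the deployment itself; a minimal sketch, using the app=hono-service-auth label that appears later in this issue:

$ kubectl -n hono get deployment -l app=hono-service-auth \
    -o jsonpath='{.items[0].spec.template.spec.containers[0].resources}'
# prints the container's resources stanza, i.e. the 196Mi request/limit pair
# visible in the describe output below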

Extract from kubectl describe node:

Allocatable:
 cpu:                2
 ephemeral-storage:  63941352Ki
 hugepages-2Mi:      0
 memory:             7661808Ki
 pods:               110
System Info:
 Machine ID:                 996ebefb2cc44da1ae864d3c078ca1eb
 System UUID:                EC265B0A-9CA8-1B71-3E38-9FB5BC8E14FF
 Boot ID:                    9f7d86b5-42ca-44cf-a8aa-1f6c9b9494d9
 Kernel Version:             4.4.0-1066-aws
 OS Image:                   Ubuntu 16.04.2 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.11.2
 Kube-Proxy Version:         v1.11.2
Non-terminated Pods:         (20 in total)
  Namespace                  Name                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                               ------------  ----------  ---------------  -------------
...
  hono                       hono-service-auth-7949d57744-4ch5t                 0 (0%)        0 (0%)      196Mi (2%)       196Mi (2%)
  hono                       hono-service-device-registry-85d87b66dd-m8nsl      0 (0%)        0 (0%)      256Mi (3%)       256Mi (3%)
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests          Limits
  --------  --------          ------
  cpu       778m (38%)        498m (24%)
  memory    3582622656 (45%)  3768528Ki (49%)

However, I’m noticing that two services, the auth service and the device registry, are regularly OOMKilled (every few hours):

$ kubectl -n hono get pod
NAME                                            READY     STATUS    RESTARTS   AGE
grafana-5645865df8-4prlg                        1/1       Running   5          14d
hono-adapter-http-vertx-7d78bc5f4d-pwmgd        1/1       Running   6          14d
hono-adapter-mqtt-vertx-799bd5858c-stw6b        1/1       Running   3          5d
hono-artemis-797c695777-gljgv                   1/1       Running   5          14d
hono-dispatch-router-5fd7756dfb-bmnzf           1/1       Running   5          14d
hono-service-auth-7949d57744-4ch5t              1/1       Running   172        11d
hono-service-device-registry-85d87b66dd-m8nsl   1/1       Running   79         14d
influxdb-784f8f677c-xd6k5                       1/1       Running   5          14d

Here’s the relevant describe pod output for the auth service (the device registry is similar):

$ kubectl -n hono describe pod -l app=hono-service-auth
...
    Last State:     Terminated
      Reason:       OOMKilled
...
    Restart Count:  172
    Limits:
      memory:  196Mi
    Requests:
      memory:   196Mi

To confirm that -Xmx150m is set:

$ kubectl -n hono describe deployment -l app=hono-service-auth
...
    Environment:
      SPRING_CONFIG_LOCATION:  file:///etc/hono/
      SPRING_PROFILES_ACTIVE:  authentication-impl,dev
      LOGGING_CONFIG:          classpath:logback-spring.xml
      _JAVA_OPTIONS:           -Xmx150m
      KUBERNETES_NAMESPACE:     (v1:metadata.namespace)
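
As an additional runtime check (a sketch, assuming the image ships a java binary on its PATH), the effective max heap can also be read from inside the running container:

$ kubectl -n hono exec hono-service-auth-7949d57744-4ch5t -- \
    java -XX:+PrintFlagsFinal -version | grep -i maxheapsize
# should report MaxHeapSize = 157286400 (i.e. 150m) once _JAVA_OPTIONS is picked up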

I can increase the limits, but I wanted to know: is it just me, or are the defaults wrong? What could be the cause of these memory issues? Thanks!
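
A minimal sketch of the increase-the-limits workaround (the deployment name is inferred from the pod names in the listing above, and 256Mi is just an illustrative value):

$ kubectl -n hono set resources deployment hono-service-auth \
    --requests=memory=256Mi --limits=memory=256Mi
# bumps both request and limit; the pods are recreated with the new values

As the discussion below suggests, though, lowering -Xmx so the JVM leaves headroom under the existing limit may be the better fix.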

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
sysexcontrol commented, Sep 11, 2018

I completely see your point, sounds reasonable.

As a quick data point: I configured -Xmx80m for the device registry locally (not running in a container) and monitored the heap usage and the log while firing telemetry messages at the HTTP adapter. No problems: a clean sawtooth heap pattern, no OOM exceptions, no malfunctions.

So go ahead and tweak Xmx down, and let’s see if it works in your environment as well. Then we can lower the Xmx settings in Hono’s default descriptors too (carefully, though: setting them too low may cause other problems, and it is hard to find out what “too low” is). To me it looks like your original problem came from the small amount of memory that was left for the pod after the JVM took its full heap assignment; the heap is only part of the JVM’s footprint, and Metaspace, thread stacks and direct buffers come on top, so -Xmx150m inside a 196Mi limit leaves little headroom. And that is exactly what your proposal addresses: leave the Kubernetes limits as they are and lower the Xmx.

Once you’ve tried it out, please post your results here. Thanks!
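
A minimal sketch of this experiment (assuming the deployment names match the pod names above; -Xmx80m is the value tested locally in this comment):

$ kubectl -n hono set env deployment/hono-service-auth _JAVA_OPTIONS='-Xmx80m'
$ kubectl -n hono set env deployment/hono-service-device-registry _JAVA_OPTIONS='-Xmx80m'
# leaves the 196Mi / 256Mi limits untouched, but frees ~116Mi / ~176Mi of headroom
# for Metaspace, thread stacks, GC structures and other native allocations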

0 reactions
sophokles73 commented, Mar 25, 2019

@ghys is this still an issue? If not, can you close this issue?

Read more comments on GitHub >

Top Results From Across the Web

OOMKilled: Troubleshooting Kubernetes Memory Requests ...
The OOMKilled: Limit Overcommit error can occur when the sum of pod limits is greater than the available memory on the node. So...
Read more >
source-controller pod restarting (OOMKilled) #192 - GitHub
I've re-deployed a newer version (0.2.1) but the restarts keep happening (about 2 every half hour). $> k describe po -n gotk-system source- ......
Read more >
Out-of-memory (OOM) in Kubernetes – Part 4: Pod evictions ...
The article states it explicitly: “The kubelet evaluates eviction thresholds based on its configured housekeeping-interval which defaults to 10s ...
Read more >
How to handle OOMkilled errors in Kubernetes - IT Briefcase
The simplest way to remedy an OOMkilled error is to increase the memory limit and then recreate the container. This can be done...
Read more >
How to Fix OOMKilled Kubernetes Error (Exit Code 137)
OOMKilled (exit code 137) occur when K8s pods are killed because they use more memory than their limits. Learn how to resolve the...
Read more >
