
Subscriber maxes out CPU after exactly 60 minutes on GKE

See original GitHub issue

Environment details

  • OS: GKE
  • Node.js version: 12.14.1
  • npm version: –
  • @google-cloud/pubsub version: pubsub@1.5.0 / grpc-js@0.6.16

Steps to reproduce

  1. Start a pod that subscribes to an idle subscription (see the subscriber sketch below)
  2. Wait 60 minutes
  3. Observe node process eat up all available CPU
(Screenshot, 2020-02-19 12:54: graph of the pod's CPU usage)
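For reference, a subscriber along these lines is enough to reproduce the setup. This is only a sketch, not the exact production code: the subscription name is a placeholder, and it assumes default application credentials are available inside the pod.

    // Minimal idle subscriber sketch using @google-cloud/pubsub (v1.x API).
    // 'idle-test-sub' is a placeholder for an existing subscription whose
    // topic receives no messages during the observation period.
    const {PubSub} = require('@google-cloud/pubsub');

    const pubsub = new PubSub();
    const subscription = pubsub.subscription('idle-test-sub');

    subscription.on('message', message => {
      // Not expected to fire while the topic is idle.
      console.log('Received message', message.id);
      message.ack();
    });

    subscription.on('error', err => {
      console.error('Subscription error:', err);
    });

    console.log('Listening on idle subscription...');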

I have a pod that subscribes to a subscription which is 100% idle (i.e. no messages published on that topic) during the observation period.

Exactly 60 minutes after the pod started, it eats up all available CPU until the pod is deleted manually to force a restart.

This behavior does not appear when using the C++ gRPC bindings, as suggested in https://github.com/googleapis/nodejs-pubsub/issues/850#issuecomment-573900907
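For reference, the usual way to switch to the C++ bindings at the time was to pass the native grpc package into the client constructor; a rough sketch, assuming grpc is installed as an additional dependency:

    // Sketch: use the native C++ 'grpc' bindings instead of the default
    // pure-JavaScript '@grpc/grpc-js'. Requires the 'grpc' package.
    const grpc = require('grpc');
    const {PubSub} = require('@google-cloud/pubsub');

    // The grpc implementation is passed through to the underlying transport.
    const pubsub = new PubSub({grpc});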

I have also enabled debug output through

GRPC_TRACE=all
GRPC_VERBOSITY=DEBUG
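(These are plain environment variables on the subscriber container. As a sketch, they can also be set programmatically, but only before the library is loaded, since grpc-js typically reads them when it is first required:)

    // Sketch: enable gRPC debug logging from within the process.
    // Must run before '@google-cloud/pubsub' (and thus @grpc/grpc-js) is required.
    process.env.GRPC_TRACE = 'all';
    process.env.GRPC_VERBOSITY = 'DEBUG';

    const {PubSub} = require('@google-cloud/pubsub');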

I’ve put the full debug output (as fetched from Google Cloud Logging) of the entire lifetime of the pod in this gist.

There is no further output after the last line, even though the pod kept running for a while, as can be seen from the graph above: the bump in CPU usage is the period where the pod eats up all available CPU and doesn't seem to log anything. (Times in the logs correspond to the times in the graph above with a 1h timezone difference, so 10:51 in the logs = 11:51 in the graph.)

I can reproduce this problem with any of my subscriber pods.

I believe it is very likely that this problem also falls into the category of #868.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 24 (7 by maintainers)

Top GitHub Comments

2 reactions
murgatroid99 commented, Feb 24, 2020

I have now published @grpc/grpc-js version 0.6.18. I’m pretty sure I’ve fixed it this time, and if not I added some more relevant logging. Can you try that out?

1 reaction
feywind commented, Feb 25, 2020

@ctavan I will take a look at updating the versions in nodejs-pubsub. Thank you for all the help in testing this stuff!

Read more comments on GitHub >

Top Results From Across the Web

  • Tomcat has high (100%) CPU usage after about 30 to 60 minutes.
  • GKE: How to handle deployments with CPU intensive ...
    A default Horizontal Pod Autoscaler was created for each deployment, with target CPU 80% and min 1 / max 5 replicas. During normal...
  • Setting the right requests and limits in Kubernetes
    Imagine having three containers that have a CPU request set to 60 millicores, 20 millicores and 20 millicores. The total request is only...
  • Choose a minimum CPU platform
    You can specify a minimum CPU platform for a new node pool in an existing cluster using the Google Cloud CLI, the Google...
  • Configure Minimum and Maximum CPU Constraints for a ...
    kubectl get limitrange cpu-min-max-demo-lr --output=yaml ... The container specifies a CPU request of 100 millicpu and a CPU limit of 800 ...
