100% CPU usage when working with a high number of workspaces
We noticed a flaw in the Java SDK: when interacting with a large number of workspaces, the rate-limiting algorithm consumes 100% CPU. After profiling our server, we found that 96% of the CPU time is spent in BaseMemoryMetricsDataStore$MaintenanceJob.run()
, particularly in the methods updateCurrentQueueSize
and updateNumberOfLastMinuteRequests
. I believe this is because our app deals with a large number of workspaces (~8,000) and the maintenance job runs every 50 ms.
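To put those numbers in perspective: a maintenance pass every 50 ms means the metrics are rescanned 20 times per second. Assuming each pass touches every workspace's metrics (which the profile above suggests), a back-of-the-envelope sketch of the scan volume looks like this (pure arithmetic, no SDK code):

```java
public class MaintenanceLoadEstimate {
    /** Maintenance passes per second for a given job interval. */
    static long passesPerSecond(long intervalMillis) {
        return 1000L / intervalMillis;
    }

    /** Per-second workspace scans, assuming every pass walks every workspace. */
    static long scansPerSecond(long workspaces, long intervalMillis) {
        return workspaces * passesPerSecond(intervalMillis);
    }

    public static void main(String[] args) {
        // ~8,000 workspaces with a 50 ms interval, as reported above
        System.out.println(scansPerSecond(8_000, 50));    // 160000 scans/sec
        // A 1,000 ms interval cuts this by a factor of 20
        System.out.println(scansPerSecond(8_000, 1_000)); // 8000 scans/sec
    }
}
```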
As a temporary solution, we are considering turning off stats on MethodsConfig
.
Are there any other suggestions to mitigate or fix the issue?
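For reference, the temporary mitigation mentioned above might look like the following sketch. This is a configuration fragment under assumptions: the setter names (getMethodsConfig, setStatsEnabled) are taken from the SDK's usual Lombok-style accessors and should be verified against your SDK version.

```java
import com.slack.api.Slack;
import com.slack.api.SlackConfig;

public class DisableStatsExample {
    public static void main(String[] args) {
        SlackConfig config = new SlackConfig();
        // Turn off the in-memory metrics collection that the
        // maintenance job maintains for every workspace.
        config.getMethodsConfig().setStatsEnabled(false);
        Slack slack = Slack.getInstance(config);
        // ... use slack.methods(token) as usual
    }
}
```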
The Slack SDK version
1.18.0
Java Runtime version
(Paste the output of java -version)
OS info
Linux
Steps to reproduce:
Create an app that unfurls links from ~8,000 workspaces.
Expected result:
Reasonable CPU usage.
Actual result:
CPU usage climbs progressively, reaching 100% after running for about 16 hours, and stays there until the server restarts.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 9 (6 by maintainers)
Hey Kazuhiro,
Thanks so much for the fix. We’ve been running it in production for a week and the CPU is sitting around 10% now. The CPU still increases linearly, but the slope is a lot gentler. We haven’t reenabled stats yet though. I’ll let you know if reenabling stats leads to any unexpected behaviour.
Hi @sidneyamani, I’ve merged the PR #934 and released a new version, v1.19.0, to the Maven Central repository.
I hope the version works well for your app. Also, if the default configuration isn’t a good fit for you, you can adjust the behavior by:
- setting rateLimiterBackgroundJobIntervalMillis
to a value longer than 1,000 milliseconds (the default)
- passing statsEnabled: false
to either SlackConfig
or MethodsConfig

Refer to the release notes for more details: https://github.com/slackapi/java-slack-sdk/releases/tag/v1.19.0
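Applying both of the options listed above might look like the following sketch. This is a configuration fragment under assumptions: the exact setter names (setRateLimiterBackgroundJobIntervalMillis, setStatsEnabled) are inferred from the SDK's configuration style and should be checked against the v1.19.0 release notes.

```java
import com.slack.api.Slack;
import com.slack.api.SlackConfig;

public class RateLimiterTuningExample {
    public static void main(String[] args) {
        SlackConfig config = new SlackConfig();
        // Option 1: run the rate limiter's background maintenance job
        // less frequently than the 1,000 ms default.
        config.setRateLimiterBackgroundJobIntervalMillis(5_000L);
        // Option 2: disable stats entirely; the app then handles
        // rate-limited (HTTP 429) responses itself.
        config.setStatsEnabled(false);
        Slack slack = Slack.getInstance(config);
    }
}
```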
The first option won’t cause any big problems; the only downside is that the SDK’s rate limiter may be more conservative about the intervals between identical API calls. As for the second option, your app becomes responsible for handling rate-limited error patterns itself.
Thanks again for reporting this issue. I hope the fix I applied this time helps.