Performance Issue / Ktor & Netty

See original GitHub issue

Ktor Version

1.2.0

Ktor Engine Used(client or server and name)

Netty Server

JVM Version, Operating System and Relevant Context

11.0.2, Debian (docker)

4 Cores -Xmx2G (parallelism = 4)

Default settings

Feedback

In most cases performance is good: we see under 50 ms per request, including the database save (all done asynchronously). Unfortunately, from time to time requests get stuck for more than 1 s, and in the worst cases for more than 20 seconds. The server has received the request, but the route handler has not been invoked yet. Below is part of the NewRelic instrumentation for one such slow call. The problem is that Ktor is not yet well instrumented, so the only consecutive calls I can see that account for the time are these:

0 | 0.00% | Truncated: NettyUpstreamDispatcher |   | 0.000  s
-- | -- | -- | -- | --
0 | 0.00% | HttpServerExpectContinueHandler.channelRead() | Async | 0.000  s
0 | 0.00% | HttpServerExpectContinueHandler.channelRead() | Async | 0.000  s
0 | 0.00% | RequestBodyHandler.channelRead() | Async | 0.000  s
16.0 | 0.08% | NettyApplicationCallHandler.channelRead() | Async | 20.025  s
16.0 | 0.08% | NewRelicFeature.wrapIntoNewRelicTransaction() |   | 20.025  s
16.0 | 0.08% | com.revolut.eventstore.api.write.EventsControllerKt/saveEvent |   | 20.025  s
1.0 | 0.00% | Application code (in com.*.api.write.EventsControllerKt/saveEvent)

What I am trying to understand is - what could happen between:

0 | 0.00% | RequestBodyHandler.channelRead() | Async | 0.000  s
-- | -- | -- | -- | --
16.0 | 0.08% | NettyApplicationCallHandler.channelRead() | Async | 20.025  s

Why did it take so long? It is hard to tell where anything might be blocking or under-resourced, so I would really appreciate help with this.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:13 (2 by maintainers)

Top GitHub Comments

Hc747 commented, Jun 12, 2019

Finally, after setting shareWorkGroup = true, I got:

RequestBodyHandler.channelRead() | Async | 0.000  s (timestamp)
ParametersCloseAsync context = nettyWorkerPool-3-16
// -- | -GAP- | -- 
NettyApplicationCallHandler.channelRead() | Async | 3.795  s (timestamp)
ParametersCloseAsync context = nettyWorkerPool-3-17

This means the delay is somewhere in the selection process between the two handlers above. I am looking for ideas on how to deal with it 🤔

A potential improvement may be to switch the selector to EPoll or KQueue, but that has to be added to Ktor.

See #1124 - should be implemented soon!

AsiaMa commented, Jul 1, 2021

It seems the issue is mostly related to the sizes of the thread groups. Do you have any recommendation for those three settings? I write a lot of asynchronous code, and the recommended settings of:

embeddedServer(AnyEngine, configure = {
    connectionGroupSize = parallelism / 2 + 1
    workerGroupSize = parallelism / 2 + 1
    callGroupSize = parallelism 
})

do not work for me: CPU utilization stays low and I hit the issue above. It only starts to improve when I manually increase at least callGroupSize to 16.

To summarise, I now have 4 cores (parallelism), 2 GB of RAM, and Netty, and I get much better results with:

 connectionGroupSize = parallelism / 2 + 1 = 3 // default
 workerGroupSize = parallelism / 2 + 1 = 3  // default
 callGroupSize = 16  // manually set
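
For reference, the values above could be applied to a Netty engine roughly as follows. This is a minimal sketch assuming Ktor 1.x package names; the port and the placeholder route are hypothetical and not from the original report.

```kotlin
import io.ktor.application.call
import io.ktor.response.respondText
import io.ktor.routing.get
import io.ktor.routing.routing
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    // 4 in the setup described above (4 cores)
    val parallelism = Runtime.getRuntime().availableProcessors()

    embeddedServer(Netty, port = 8080, configure = { // port is an assumption
        connectionGroupSize = parallelism / 2 + 1 // 3 when parallelism = 4 (default)
        workerGroupSize = parallelism / 2 + 1     // 3 when parallelism = 4 (default)
        callGroupSize = 16                        // raised manually, as in the comment
    }) {
        routing {
            // Hypothetical placeholder route, only here to make the sketch complete.
            get("/") { call.respondText("ok") }
        }
    }.start(wait = true)
}
```

The value 16 is the one that happened to help in this thread; it is not a general recommendation, and the right ratio depends on how much of the handler code blocks.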

I would need to analyse your code to understand the best ratio between those three. Again, I would repeat that your default settings do not seem ideal, at least for Netty.

Your code may contain blocking code.
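
To illustrate why blocking code could produce a gap like the one reported, here is a small stand-alone simulation. It does not use Ktor or Netty: a plain fixed thread pool stands in for the call group (an assumption made for illustration), and `queueDelayMillis` is a hypothetical helper. It measures how long a trivial task waits before any thread picks it up while other tasks block.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Returns how long (in ms) a trivial task waited in the queue of a fixed
// thread pool while `blockers` earlier tasks each block a thread for `blockMillis`.
fun queueDelayMillis(poolSize: Int, blockers: Int, blockMillis: Long): Long {
    val pool = Executors.newFixedThreadPool(poolSize) // stand-in for the call group
    try {
        repeat(blockers) {
            // Simulated blocking call (e.g. a synchronous driver or file access).
            pool.submit(Runnable { Thread.sleep(blockMillis) })
        }
        val submittedAt = System.nanoTime()
        // The "request": it records the moment a pool thread actually starts it.
        val startedAt = pool.submit(Callable { System.nanoTime() }).get()
        return TimeUnit.NANOSECONDS.toMillis(startedAt - submittedAt)
    } finally {
        pool.shutdownNow() // interrupt any blockers that are still sleeping
    }
}

fun main() {
    println("pool of 2:  waited ${queueDelayMillis(2, 2, 300)} ms")
    println("pool of 16: waited ${queueDelayMillis(16, 2, 300)} ms")
}
```

With every thread of the small pool blocked, the new task waits roughly as long as the blockers take; with a larger pool it starts almost immediately. That matches the observation that raising callGroupSize helped, although a bigger pool only hides blocking calls and does not remove them.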
