
High cpu utilization of Promise.await and app degradation over time

See original GitHub issue

Hello, this issue is the result of investigating a related problem in https://github.com/zio/zio-kafka/issues/238

For some reason, during continuous recreation of Promises, the inner state of each new Promise (created after the old one has already completed) keeps growing and starts to consume too much CPU time, which significantly affects the performance of the entire application.

The actual problem is in Promise.await and its List of joiners: the time spent there starts at 0.1-2% of CPU and grows until you reset the parent effect or the app stops responding (usually reaching 30%+ of CPU).

[profiler screenshots omitted]

How it looks in a real application: [screenshot omitted]
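
For context, "joiners" here are the callbacks of fibers currently awaiting a promise. Below is a minimal toy sketch (a simplified model with made-up names, not zio.Promise's real implementation) of why a growing joiner list is expensive: awaiting prepends to a list, and completing has to walk all of it.

import java.util.concurrent.atomic.AtomicReference

// Toy model only (hypothetical and simplified), not zio.Promise internals.
final class ToyPromise[A] {
  private sealed trait State
  private final case class Pending(joiners: List[A => Unit]) extends State
  private final case class Done(value: A)                    extends State

  private val state = new AtomicReference[State](Pending(Nil))

  // Awaiting an unfinished promise prepends a callback ("joiner") to the list.
  // If joiners are registered faster than they are removed, the list keeps growing.
  def await(join: A => Unit): Unit =
    state.get match {
      case Done(a)         => join(a)
      case p @ Pending(js) =>
        if (!state.compareAndSet(p, Pending(join :: js))) await(join)
    }

  // Completing the promise walks the whole joiner list, so its cost is
  // proportional to however many joiners have accumulated.
  def succeed(value: A): Unit =
    state.getAndSet(Done(value)) match {
      case Pending(js) => js.foreach(_(value))
      case Done(_)     => ()
    }
}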

Reproducible scenario (it takes at least 1-3 hours to see the full effect, but the growing time spent on the inner joiners list is visible in a CPU sampler from the beginning):

testM("stream should not go brrr") {

      // Pull-based stream: every pull creates a fresh Promise, offers it to the
      // queue, and awaits the next Chunk[Int].
      def newJob(queue: Queue[Promise[Option[Throwable], Chunk[Int]]]) = ZStream {
        ZManaged.succeed {
          for {
            p <- Promise.make[Option[Throwable], Chunk[Int]]
            _ <- queue.offer(p)
            r <- p.await
          } yield r
        }
      }

      (for {
        jobQueue  <- Queue.unbounded[ZStream[Any, Throwable, Int]]
        feedQueue <- Queue.unbounded[Promise[Option[Throwable], Chunk[Int]]]
        
        //Actual stream of substreams
        stream = ZStream
          .fromQueue(jobQueue)
          .mapMParUnordered(50) { s =>
            //Just to affect performance faster
            s.mapMPar(10)(Task(_))
              .groupedWithin(1000, 1.millis)
              .buffer(2)
              .bufferSliding(1)
              .runDrain
          }
          .runDrain
        
        // Feed to fulfill promises
        feed = ZStream
          .fromQueue(feedQueue)
          .mapMPar(10) { x =>
            x.succeed(Chunk(1))
          }
          .runDrain
        
        fb1 <- stream.fork
        fb2 <- feed.fork
        
        _   <- Chunk.fromIterable(0.to(10)).mapM(_ => jobQueue.offer(newJob(feedQueue)))
        
        _   <- fb1.zip(fb2).join
      } yield assert(true)(equalTo(true))).provideCustomLayer(zio.clock.Clock.live)
}

TL;DR: We have a stream of substreams that creates and fulfills promises one by one. Over time it becomes very slow and uses as much CPU as it can get; after we recreate the stream, performance returns to normal and then starts degrading again from the beginning.
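
Since recreating the stream resets performance, one possible stopgap (a hedged sketch of our own, not a fix discussed by the maintainers; restartEvery is a made-up helper) is to periodically interrupt and restart the consuming effect:

import zio._
import zio.duration._

// Hypothetical helper: interrupt `run` every `interval` and start it over.
// Each restart drops in-flight work, so this only masks the degradation.
def restartEvery[R, E](interval: Duration)(run: ZIO[R, E, Any]): ZIO[R with zio.clock.Clock, E, Nothing] =
  run.race(ZIO.sleep(interval)).forever

In the reproducer above this could wrap the drained stream, e.g. restartEvery(30.minutes)(stream), at the cost of losing whatever was in flight at the moment of restart.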

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

1 reaction
iravid commented, Jun 8, 2021

Oh sorry I thought we were in zio-kafka 😃 @jsfwa proposed a fix for this, but it’s a bit of a workaround and does not solve the actual underlying problem. @jdegoes was looking into this.

0 reactions
aartigao commented, Jun 8, 2021

@aartigao The root cause is #4395

Did you mean to link to another issue? 😄 Or recursive-trolling me? 😝

Read more comments on GitHub.

Top Results From Across the Web

Mysterious high cpu utilization over time with multiple topics ...
Hello, we were trying to migrate one of our services from alpakka to zio and discovered strange behavior So we have couple topics...

IIS worker process: High CPU usage (Expert guide)
High worker process CPU usage often causes severe performance degradation because of the complex interplay between the async/parallel nature of modern web ...

c# - Can async/await pattern cause performance penalties on ...
In other words, could the use of async/await lead to degraded performance when used in conjunction with relatively fast/resource-inexpensive ...

Troubleshoot high CPU utilization on Amazon OpenSearch ...
A cluster that consistently performs at high CPU utilization can degrade cluster performance. When your cluster is overloaded, ...

This is why your Node.js application is slow
The simple problem with it is that it is capable of slowing down your application greatly when not correctly used. Whenever a promise...
