
WaitQueue is filled up but it's never emptied


We are using 0.20.3 and are facing problems with the Blaze HTTP client. A couple of times the client has started returning “Wait queue is full” and stayed in that state indefinitely; to make it work again, we simply restart the service. I know that a similar issue has surfaced and been fixed in previous versions of http4s (e.g. https://github.com/http4s/http4s/issues/2193). I don’t know whether this is the exact same problem, but it might be related.

While trying to reproduce the bug, we encountered another problem, which may or may not be related, where the program simply halts. This has been tested on both 0.20.3 and 0.20.12. The code used to try to reproduce the issue is:

import java.util.concurrent.atomic.AtomicInteger

import cats.effect.{IO, Timer}
import fs2.Stream

import org.http4s._
import org.http4s.client.blaze.BlazeClientBuilder
import org.http4s.implicits._ // provides the uri"..." interpolator

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object Main {

  def main(args: Array[String]): Unit = {
    val int                       = new AtomicInteger(0)
    implicit val cs               = IO.contextShift(ExecutionContext.global)
    implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)
    val timeout                   = 1.second

    val program = for {
      // allocated returns (client, release); the release action is discarded here,
      // so the client is never shut down, which is fine for a throwaway repro
      client <- BlazeClientBuilder[IO](ExecutionContext.global)
                 .withRequestTimeout(timeout)
                 .withResponseHeaderTimeout(timeout)
                 .allocated
      c      = client._1
      status = c.status(Request[IO](uri = uri"http://httpbin.org/status/500")).attempt
      _ <- Stream(Stream.eval(status)).repeat
            .covary[IO]
            .parJoin(100)
            .take(1000)
            .observe(x => x.flatMap(y => Stream.eval(IO(println(">>> " + int.incrementAndGet + " " + y)))))
            .compile
            .drain
      s <- c.status(Request[IO](uri = uri"http://httpbin.org/status/500")).attempt
      _ <- IO(println("STATUS = " + s.right.get)) // throws if the final request failed
    } yield ()

    program.unsafeRunSync()
  }
}

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

2 reactions
RafalSumislawski commented, Jan 19, 2020

Hello folks. It looks like the PoolManager is leaking active connections. Once it leaks 10 connections (the default for maxConnectionsPerRequestKey), the application hangs.

When a request completes, the PoolManager is supposed to reuse the connection to execute the next request. If the next request in the queue has already expired, the expired request is completed with a failure as expected, but the connection gets lost: it is neither reused nor closed, and the counter of active connections isn’t decreased.
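The leak described above can be sketched with a toy pool model in plain Scala. All names here (`ToyPool`, `Waiter`, `releaseBuggy`) are hypothetical illustrations, not the actual http4s PoolManager code: the bug shape is that a connection freed by a finished request is offered to a waiter whose deadline has passed, the waiter is failed, and the connection is neither handed to a live waiter nor returned to the idle set, so the active count never goes down.

```scala
import scala.collection.mutable

// Toy model of a connection pool's release path; illustrative only,
// not the real http4s PoolManager internals.
final case class Waiter(id: Int, expired: Boolean)

final class ToyPool(maxActive: Int) {
  var active: Int = 0
  val waitQueue: mutable.Queue[Waiter] = mutable.Queue.empty

  def borrow(): Boolean =
    if (active < maxActive) { active += 1; true } else false

  // Buggy release: if the next waiter has expired, fail it and return early.
  // The freed connection is "lost": active is not decremented and no one is served.
  def releaseBuggy(): Unit =
    if (waitQueue.nonEmpty) {
      val next = waitQueue.dequeue()
      if (next.expired) () // connection leaked here
      else ()              // would hand the connection to `next` (active stays the same)
    } else active -= 1     // no waiters: return the connection to the idle set

  // Fixed release: drain expired waiters until a live one is found or the
  // queue is empty, then either reuse the connection or return it.
  def releaseFixed(): Unit = {
    while (waitQueue.nonEmpty && waitQueue.head.expired) waitQueue.dequeue()
    if (waitQueue.nonEmpty) waitQueue.dequeue() // serve a live waiter
    else active -= 1
  }
}
```

With `maxActive = 1`, one `releaseBuggy()` against an expired waiter leaves `active` stuck at 1, and every later `borrow()` fails, which mirrors the pool hanging once all 10 connections leak.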

1 reaction
desbo commented, Jan 15, 2020

Here’s a reworked version of the original code which reliably hangs on my machine:

import java.util.concurrent.atomic.AtomicInteger

import cats.effect.{ExitCode, IO, IOApp}
import fs2._
import org.http4s.client.WaitQueueTimeoutException
import org.http4s.client.blaze.BlazeClientBuilder
import org.http4s.{Request, Uri}

import scala.concurrent.duration._

// IOApp already provides an implicit ContextShift[IO], so no explicit
// contextShift is needed here.
object BlazeWaitQueueTestGitHub extends IOApp {

  val count = new AtomicInteger(0)

  override def run(args: List[String]): IO[ExitCode] = {
    val timeout     = 1.milli
    val numRequests = 9

    BlazeClientBuilder[IO](scala.concurrent.ExecutionContext.global)
      .withMaxTotalConnections(1)
      .withRequestTimeout(timeout)
      .resource
      .use { client =>
        val getStatus = client.status(Request[IO](uri = Uri.unsafeFromString("http://httpbin.org/status/500")))

        Stream(Stream.eval(getStatus).attempt)
          .covary[IO]
          .repeat
          .parJoin(2)
          .take(numRequests)
          .evalTap {
            case Left(WaitQueueTimeoutException) =>
              IO(println(s"got WaitQueueTimeoutException after ${count.getAndIncrement} requests"))
            case _ =>
              IO(count.getAndIncrement)
          }
          .compile
          .drain
          .map(_ => println("all requests completed successfully"))
          .as(ExitCode.Success)
      }
  }
}

Some observations after playing with this:

  • the program runs successfully after decreasing numRequests or the parJoin parameter, or after increasing the client’s max total connections
  • during runs that hang, the message logged when handling a WaitQueueTimeoutException isn’t always printed, but there is always a DEBUG org.http4s.client.PoolManager - Request expired log message (and it never appears during successful runs)
  • removing the attempt from the initial client call causes the program to fail with a java.util.concurrent.TimeoutException: Request timeout after 1 ms error rather than hanging

Considering the last point, I’m not sure this is something that should (or can) be fixed in http4s; it may instead be an expected result of using attempt and parJoin together. The parJoin scaladoc says:

Once [the maxOpen] limit is reached, evaluation of the outer stream is paused until one or more inner streams finish evaluating

…so it could be that this example app creates a deadlock where the outer stream is paused and none of the inner streams can finish evaluating, although I’m still trying to work out exactly how that could happen.
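The role of attempt in the last observation can be sketched without http4s. In this plain-Scala toy (all names such as `runFailFast` and `task` are hypothetical), a failure without attempt propagates and aborts the run, while with attempt each failure is reified as a Left value and the driver keeps going, which is analogous to the repro continuing to hammer the saturated pool instead of failing fast with a TimeoutException.

```scala
// Toy driver illustrating attempt vs fail-fast semantics; hypothetical names,
// not http4s or fs2 code. Every third "request" simulates a timeout by throwing.
final case class TimeoutError(msg: String) extends RuntimeException(msg)

def task(i: Int): Int =
  if (i % 3 == 0) throw TimeoutError(s"Request timeout on task $i") else i

// Fail-fast: the first failure aborts the whole run (like omitting .attempt).
def runFailFast(n: Int): Either[Throwable, List[Int]] =
  try Right((1 to n).map(task).toList)
  catch { case t: TimeoutError => Left(t) }

// With attempt: each failure becomes a Left and the run continues to the end,
// like wrapping every request in .attempt before parJoin.
def runWithAttempt(n: Int): List[Either[Throwable, Int]] =
  (1 to n).map { i =>
    try Right(task(i)) catch { case t: TimeoutError => Left(t) }
  }.toList
```

Under this reading, attempt doesn’t cause the leak, but it does keep the program alive long enough for the leaked connections to accumulate until the pool is empty and everything waits forever.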

edit: possibly a duplicate of https://github.com/http4s/http4s/issues/2068
