
Non-deterministic premature EOF

See the original GitHub issue: Non-deterministic premature EOF, http4s/blaze issue #637

Story

I have a service that is

  1. listening to a stream of request bodies
  2. querying a third-party HTTP server using blaze-client
  3. forwarding the responses further down the stream

90% of the requests are fine, while 10% fail with org.http4s.InvalidBodyException: Received premature EOF. The failing 10% is not tied to particular requests, so if I retry the same requests there's a 90% chance they'll pass.

Reproduction

I managed to reproduce the issue in a controlled environment.

  1. I emulate the stream of request bodies by repeatedly emitting a single static request payload at a fixed rate (Stream.fixedRate):

     val body = "x".repeat(requestPayloadSize)

  2. I query the local test server, which responds with a static payload:

     val req = Request[IO](POST, uri).withEntity(body)
     simpleClient.stream.flatMap(c => c.stream(req)).flatMap(_.bodyText)

     val response = "x".repeat(responsePayloadSize)
     case POST -> Root => Ok(response)

  3. Finally, I print the index of the request along with the chunk size (a runnable sketch assembling these fragments follows the list):

     .evalMap(c => IO.delay(println(s"$i ${c.size}")))
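To make this concrete, here is a minimal sketch of how the fragments above might fit together, assuming the http4s 0.21.x blaze-client API. The object name ReproSketch, the local port 8080, the 100 ms emission rate and the concrete payload size are illustrative assumptions; the real reproduction code is Http4sEof.scala in the repository linked under Notes.

import cats.effect.{ExitCode, IO, IOApp}
import fs2.Stream
import org.http4s.{Method, Request, Uri}
import org.http4s.client.blaze.BlazeClientBuilder

import scala.concurrent.ExecutionContext.global
import scala.concurrent.duration._

object ReproSketch extends IOApp {
  // Illustrative values; the actual ones live in Http4sEof.scala
  val requestPayloadSize = 32603
  val uri  = Uri.unsafeFromString("http://localhost:8080/")
  val body = "x".repeat(requestPayloadSize) // String.repeat requires Java 11+

  def run(args: List[String]): IO[ExitCode] = {
    val req = Request[IO](Method.POST, uri).withEntity(body)

    BlazeClientBuilder[IO](global).stream.flatMap { client =>
      // Emit the same request at a fixed rate, tagging each emission with its index
      Stream.fixedRate[IO](100.millis).zipWithIndex.flatMap { case (_, i) =>
        client.stream(req)
          .flatMap(_.bodyText)                              // stream the response body as text chunks
          .evalMap(c => IO.delay(println(s"$i ${c.size}"))) // print request index and chunk size
      }
    }.compile.drain.as(ExitCode.Success)
  }
}

Run against a local server that returns a sufficiently large static payload, this eventually reproduces the InvalidBodyException described above.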

Conclusion

Based on extended experimentation, I found magic numbers for the request and response payload sizes. Below these payload sizes I can run the app for an extended period without any exceptions, while if both the request and response sizes reach these thresholds, the client will eventually throw an EOF exception.

  // Numbers below vary on different computers
  // In my case, if the request payload size is 32603 or greater
  //  AND the response payload size is 81161 or greater,
  //  then we get an EOF exception in some but not all cases.
  // If, however, either of these payload sizes is lower, then
  //  the EOF exception doesn't occur, even when running for an extended period.

Notes

You can find the test project at https://github.com/slve/http4s-eof; the only Scala source is at https://github.com/slve/http4s-eof/blob/master/src/main/scala/Http4sEof.scala.

It uses fs2 2.5.0 and http4s 0.21.18; see https://github.com/slve/http4s-eof/blob/master/build.sbt.
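For reference, a dependency block matching those versions might look like the sketch below; the exact set of modules is an assumption on my part, and the linked build.sbt is authoritative.

// Illustrative only; see the linked build.sbt for the actual dependencies
libraryDependencies ++= Seq(
  "co.fs2"     %% "fs2-core"            % "2.5.0",
  "org.http4s" %% "http4s-blaze-client" % "0.21.18",
  "org.http4s" %% "http4s-blaze-server" % "0.21.18",
  "org.http4s" %% "http4s-dsl"          % "0.21.18"
)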

The server part is only there to aid testing; regardless of which server you run the test against, the client will eventually throw an EOF exception within a short period.
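As an illustration of how small such a throwaway server can be, here is a minimal sketch assuming the http4s 0.21.x blaze-server builder and DSL; the object name, port and payload size are illustrative, and the real server lives in Http4sEof.scala.

import cats.effect.{ExitCode, IO, IOApp}
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.implicits._
import org.http4s.server.blaze.BlazeServerBuilder

import scala.concurrent.ExecutionContext.global

object TestServerSketch extends IOApp {
  val responsePayloadSize = 81161
  val response = "x".repeat(responsePayloadSize) // requires Java 11+

  // Respond to every POST / with the same static payload
  val routes = HttpRoutes.of[IO] {
    case POST -> Root => Ok(response)
  }

  def run(args: List[String]): IO[ExitCode] =
    BlazeServerBuilder[IO](global)
      .bindHttp(8080, "localhost")
      .withHttpApp(routes.orNotFound)
      .serve.compile.drain
      .as(ExitCode.Success)
}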

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
eugene-cheverda commented, Nov 23, 2021

Hi @rossabaker

We’re facing an EOF error in one of our services, and I’m using the project provided by @slve to reproduce the issue. Please see the findings below:

  1. With the blaze client, if the request size is bigger than 65397 bytes and the response is about 1 MB, the first request fails with EOF / Broken pipe / Connection reset by peer regardless of how long the app has been running; it happens almost immediately. The number 65397 is 65535 minus the default request headers length. I’m testing it with a 70000-byte request and a 1000000-byte response.
  2. With tracing logs turned on and a debugger attached, I was able to figure out that the request itself streams fine, but the exception occurs while streaming back the response. See point 3 below.
  3. It seems like there is some sort of race in Http1Stage.scala, lines 219-230. If I set a breakpoint on the line currentBuffer = BufferTools.concatBuffers(currentBuffer, b), simulating a delay, then the test passes. Without the breakpoint it fails almost immediately into cb(eofCondition()). Before falling into eofCondition, the Http1Stage.drainBody function is called and logs “HTTP body not read to completion. Dropping connection.” The relevant snippet is quoted below:
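// Http1Stage.scala, lines 219-230, as quoted by the commenter; the premature EOF
// surfaces when the Failure(Command.EOF) branch below calls cb(eofCondition()).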
channelRead().onComplete {
  case Success(b) =>
    currentBuffer = BufferTools.concatBuffers(currentBuffer, b)
    go()

  case Failure(Command.EOF) =>
    cb(eofCondition())

  case Failure(t) =>
    logger.error(t)("Unexpected error reading body.")
    cb(Either.left(t))
}

Could you please take a look at the issue again and provide some updates or estimates on how soon it could be fixed?

Thanks in advance!

0 reactions
slve commented, Feb 12, 2021

Awesome, thank you @rossabaker. Just to note, I’ve had two other experiments in different branches, one using http4s/jdk-http-client and the other using softwaremill/sttp, and I ran into similar issues with both.
