okhttp fails with IOException: gzip finished without exhausting source but GZIPInputStream works

What kind of issue is this? Bug report.

This issue can’t be reproduced in a test. I’ll do my best to explain.

>> GET http://myserver.mycompany.com/.../businesses.20180104.json.gz
<< 200 OK
<< connection -> [keep-alive]
<< accept-ranges -> [bytes]
<< content-disposition -> [attachment; filename="businesses.20180104.json.gz"; filename*=UTF-8''businesses.20180104.json.gz] 
<< content-type -> [application/x-gzip]
<< content-length -> [3384998203]
<< date -> [Fri, 05 Jan 2018 00:43:32 GMT]
<< etag -> [0e49d5fa7ba9f68058bfbb4a98bef032c3a73871]
<< last-modified -> [Thu, 04 Jan 2018 23:54:26 GMT]
<< x-artifactory-id -> [9732f56568ea1e3d:59294f65:160b8066066:-8000]
<< x-checksum-md5 -> [451ca1b1414e7b511de874e61fd33eb2]
<< x-artifactory-filename -> [businesses.20180104.json.gz]
<< server -> [Artifactory/5.3.0]
<< x-checksum-sha1 -> [0e49d5fa7ba9f68058bfbb4a98bef032c3a73871]

As you can see, the server doesn’t set a Content-Encoding = gzip header, so I do that in an interceptor.
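
The interceptor itself isn’t shown in the issue, so this is a sketch of one way to do it, not the reporter’s code (the lambda and the removed Content-Length header are my assumptions). A network interceptor that labels the response as gzip makes okhttp decompress the body transparently with okio’s GzipSource, provided okhttp itself added the Accept-Encoding: gzip request header:

import okhttp3.OkHttpClient;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient.Builder()
    .addNetworkInterceptor(chain -> {
        Response response = chain.proceed(chain.request());
        // Tag the body as gzip so okhttp wraps it in GzipSource when decoding.
        return response.newBuilder()
                .header("Content-Encoding", "gzip")
                .removeHeader("Content-Length") // length describes the compressed bytes
                .build();
    })
    .build();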

Each record is a newline-delimited JSON string that is inserted into Couchbase. There are around 12 million records in total. Using okhttp, processing fails after about 130,000 records with the following exception:

Caused by: java.io.IOException: gzip finished without exhausting source
	at okio.GzipSource.read(GzipSource.java:100)
	at okio.RealBufferedSource$1.read(RealBufferedSource.java:430)

However, if I don’t set the Content-Encoding header (thus skipping GzipSource) and instead wrap the input stream with GZIPInputStream, everything works as expected. I’ve also tried setting Transfer-Encoding = chunked on the response and removing the Content-Length header, but to no avail.
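
A sketch of that workaround (the client and request setup and the insertIntoCouchbase helper are hypothetical placeholders; only the GZIPInputStream wrapping is described in the issue):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

import static java.nio.charset.StandardCharsets.UTF_8;

// Leave Content-Encoding unset so okhttp hands back the raw compressed body,
// then decode it with GZIPInputStream instead of okio's GzipSource.
try (Response response = client.newCall(request).execute();
     BufferedReader reader = new BufferedReader(new InputStreamReader(
             new GZIPInputStream(response.body().byteStream()), UTF_8))) {
    String record;
    while ((record = reader.readLine()) != null) {
        insertIntoCouchbase(record); // hypothetical helper: one JSON record per line
    }
}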

So, the question is: if GZIPInputStream doesn’t have a problem, why does GzipSource? And since it does, why doesn’t it report what it thinks the issue is? I have a test that runs on a smaller file of 100 records, and it works.

I’ve seen https://github.com/square/okhttp/issues/3457, but unlike the reporter, it’s not possible for me to capture the hex body of a 3.4 GB stream.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 3
  • Comments: 44 (12 by maintainers)

Top GitHub Comments

1 reaction
swankjesse commented, Jan 7, 2018

I introduced this behavior and can explain it.

Gzip is a self-terminating format: the content of the stream itself indicates when you’ve read everything.

If ever there’s data beyond the self-reported end, this data is effectively unreachable. This is potentially problematic for two reasons:

  • HTTP/1 connection pooling. If we don’t consume the entire response body of call N, we can’t use the connection for call (N+1).
  • HTTP response caching. We only persist response values once they’re completely downloaded.

I made things strict to help detect problems like this. It’s possible this check is too strict and we should silently ignore the extra data.
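
One concrete way to trip this strict check is a body made of multiple concatenated gzip members, which is exactly what is suspected later in this thread. The following standalone sketch (mine, not from the thread; it assumes okio on the classpath) shows the two decoders diverging on such a stream:

import okio.Buffer;
import okio.BufferedSink;
import okio.BufferedSource;
import okio.GzipSink;
import okio.GzipSource;
import okio.Okio;

import java.io.ByteArrayInputStream;
import java.util.zip.GZIPInputStream;

import static java.nio.charset.StandardCharsets.UTF_8;

public class MultiMemberGzip {
    // Compress text into one complete gzip member.
    static Buffer gzip(String text) throws Exception {
        Buffer out = new Buffer();
        try (BufferedSink sink = Okio.buffer(new GzipSink(out))) {
            sink.writeUtf8(text);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Two complete gzip members back to back, like `cat a.gz b.gz`.
        Buffer joined = new Buffer();
        joined.writeAll(gzip("hello "));
        joined.writeAll(gzip("world"));
        byte[] bytes = joined.readByteArray();

        // GZIPInputStream sees the second member's header after the first
        // trailer and keeps decoding: prints "hello world".
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            System.out.println(new String(in.readAllBytes(), UTF_8));
        }

        // GzipSource stops after the first member's trailer; the unread second
        // member trips the strict check, so this throws
        // java.io.IOException: gzip finished without exhausting source.
        try (BufferedSource in = Okio.buffer(new GzipSource(new Buffer().write(bytes)))) {
            System.out.println(in.readUtf8());
        }
    }
}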

1 reaction
asarkar commented, Jan 6, 2018

@yschimke I think there was a mistake in how I took the previous hex dump. This time, I did the following:

1. protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
2.    if (finished) {
3.        in.skipBytes(in.readableBytes());
4.        return;
5.    }
    ...
}

Put a breakpoint on line 3 of JdkZlibDecoder.decode above. For every invocation of it, dump the contents of the ByteBuf to a file by manually invoking the following method that I wrote: ByteBufUtils.dumpByteBuf("yelp-dump.txt", in)

import io.netty.buffer.ByteBuf;
import io.netty.util.internal.StringUtil;
import java.io.*;
import java.nio.file.Paths;
import static io.netty.buffer.ByteBufUtil.appendPrettyHexDump;
import static java.nio.charset.StandardCharsets.UTF_8;
import static java.nio.file.Files.newBufferedWriter;
import static java.nio.file.StandardOpenOption.*;

// Appends a pretty-printed hex dump of the buffer's readable bytes to a file.
public static void dumpByteBuf(String out, ByteBuf msg) {
    StringBuilder buf = new StringBuilder(StringUtil.NEWLINE);
    appendPrettyHexDump(buf, msg);

    try (BufferedWriter w = newBufferedWriter(Paths.get(out), UTF_8, APPEND, CREATE)) {
        w.write(buf.toString());
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

That produced the attached dump, and I see that it starts with 1f 8b (the gzip magic number) and contains the same sequence more than once. Does this prove my theory of multiple streams?

yelp-dump.txt
