okhttp fails with IOException: gzip finished without exhausting source but GZIPInputStream works
What kind of issue is this?
- Bug report. If you’ve found a bug, spend the time to write a failing test. Bugs with tests get fixed. Here’s an example: https://gist.github.com/swankjesse/981fcae102f513eb13ed
This issue can’t be reproduced in a test. I’ll do my best to explain.
>> GET http://myserver.mycompany.com/.../businesses.20180104.json.gz
<< 200 OK
<< connection -> [keep-alive]
<< accept-ranges -> [bytes]
<< content-disposition -> [attachment; filename="businesses.20180104.json.gz"; filename*=UTF-8''businesses.20180104.json.gz]
<< content-type -> [application/x-gzip]
<< content-length -> [3384998203]
<< date -> [Fri, 05 Jan 2018 00:43:32 GMT]
<< etag -> [0e49d5fa7ba9f68058bfbb4a98bef032c3a73871]
<< last-modified -> [Thu, 04 Jan 2018 23:54:26 GMT]
<< x-artifactory-id -> [9732f56568ea1e3d:59294f65:160b8066066:-8000]
<< x-checksum-md5 -> [451ca1b1414e7b511de874e61fd33eb2]
<< x-artifactory-filename -> [businesses.20180104.json.gz]
<< server -> [Artifactory/5.3.0]
<< x-checksum-sha1 -> [0e49d5fa7ba9f68058bfbb4a98bef032c3a73871]
As you can see, the server doesn’t set a Content-Encoding: gzip header, so I set that header myself in an interceptor.
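The interceptor itself isn’t shown in the issue. A minimal sketch of one way to do this, assuming a network interceptor so that OkHttp’s built-in gzip decoding (GzipSource) takes over; the class names are hypothetical:

```java
import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.OkHttpClient;
import okhttp3.Response;

/** Hypothetical: marks the response as gzip so OkHttp's own decoding unwraps the body. */
final class ForceGzipEncodingInterceptor implements Interceptor {
  @Override public Response intercept(Chain chain) throws IOException {
    Response response = chain.proceed(chain.request());
    // The server sends compressed bytes but no Content-Encoding header,
    // so declare it before the response travels back up the chain.
    return response.newBuilder()
        .header("Content-Encoding", "gzip")
        .build();
  }
}

final class ClientFactory {
  static OkHttpClient newClient() {
    return new OkHttpClient.Builder()
        .addNetworkInterceptor(new ForceGzipEncodingInterceptor())
        .build();
  }
}
```

For this approach, registering it as a network interceptor matters: in OkHttp 3.x the transparent gzip handling sits above the network interceptors, so a header added in an application interceptor would be seen too late.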
Each record is a newline-delimited JSON string that is inserted into Couchbase; there are around 12 million records in total. Using OkHttp, processing fails after about 130,000 records with the following exception:
Caused by: java.io.IOException: gzip finished without exhausting source
at okio.GzipSource.read(GzipSource.java:100)
at okio.RealBufferedSource$1.read(RealBufferedSource.java:430)
However, if I don’t set the Content-Encoding header (thus skipping GzipSource) and instead wrap the input stream in GZIPInputStream, everything works as expected. I’ve also tried setting Transfer-Encoding: chunked on the response and removing the Content-Length header, but to no avail.
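The workaround code isn’t shown in the issue either; a minimal sketch of it, assuming the records are consumed line by line (class and method names are hypothetical):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

final class GzipJsonLinesReader {
  /** Reads the raw .gz body (no Content-Encoding set) and decompresses it manually. */
  static void readAll(OkHttpClient client, String url) throws IOException {
    Request request = new Request.Builder().url(url).build();
    try (Response response = client.newCall(request).execute();
         BufferedReader reader = new BufferedReader(new InputStreamReader(
             new GZIPInputStream(response.body().byteStream()), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // One newline-delimited JSON record per line; e.g. insert into Couchbase here.
      }
    }
  }
}
```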
So, the question is: if GZIPInputStream doesn’t have a problem, why does GzipSource? And given that it does, why won’t it report what it thinks the issue is? I have a test that runs against a smaller file of 100 records, and it works.
I’ve seen https://github.com/square/okhttp/issues/3457, but unlike the reporter, it’s not possible for me to capture the hex body of a 3.4 GB stream.
Top GitHub Comments
I introduced this behavior and can explain it.
Gzip is a self-terminating format: the content of the stream itself indicates when you’ve read everything.
If there is ever data beyond that self-reported end, that data is effectively unreachable. This is potentially problematic for two reasons:
I made things strict to help detect problems like this. It’s possible this check is too strict and we should silently ignore the extra data.
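A small self-contained sketch (not from the issue) of the difference in strictness: given gzip data followed by extra bytes, GZIPInputStream silently ignores the trailing bytes, while Okio’s GzipSource throws the “gzip finished without exhausting source” error once the gzip trailer has been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import okio.Buffer;
import okio.GzipSource;
import okio.Okio;

public final class TrailingBytesDemo {
  public static void main(String[] args) throws Exception {
    // Gzip "hello", then append bytes that are not part of the gzip stream.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzipOut = new GZIPOutputStream(baos)) {
      gzipOut.write("hello".getBytes(StandardCharsets.UTF_8));
    }
    baos.write(new byte[] { 0x01, 0x02, 0x03, 0x04 }); // trailing garbage
    byte[] bytes = baos.toByteArray();

    // GZIPInputStream: decodes "hello" and silently ignores the trailing bytes.
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
      byte[] buf = new byte[16];
      int n = in.read(buf);
      System.out.println("GZIPInputStream: " + new String(buf, 0, n, StandardCharsets.UTF_8));
    }

    // GzipSource: throws IOException("gzip finished without exhausting source")
    // because the underlying source still has unread bytes after the trailer.
    Buffer source = new Buffer().write(bytes);
    System.out.println("GzipSource: " + Okio.buffer(new GzipSource(source)).readUtf8());
  }
}
```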
@yschimke I think there was a mistake in how I took the previous hex dump. This time, I did the following:
1. Put a breakpoint on line 3 of JdkZlibDecoder.decode above.
2. For every invocation of it, dump the contents of the ByteBuf to a file by manually invoking the following method that I wrote: ByteBufUtils.dumpByteBuf("yelp-dump.txt", in) (a sketch of what such a helper might look like follows this list).
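ByteBufUtils.dumpByteBuf isn’t included in the issue; a plausible sketch of such a helper, assuming Netty’s ByteBufUtil.hexDump and append-to-file semantics:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ByteBufUtils {
  /** Appends a hex dump of the buffer's readable bytes to a file; leaves the reader index alone. */
  static void dumpByteBuf(String path, ByteBuf in) throws IOException {
    String hex = ByteBufUtil.hexDump(in, in.readerIndex(), in.readableBytes());
    Files.write(Paths.get(path),
        (hex + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  }
}
```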
That produced the attached dump, and I see that it starts with 1f 8b and contains the same sequence more than once. Does this prove my theory of multiple streams?
Attachment: yelp-dump.txt
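Not part of the issue: one rough way to test the multiple-members theory directly against the raw .gz file (rather than a hex dump) is to scan for candidate gzip member headers, i.e. the magic bytes 1f 8b followed by the deflate method byte 08. Matches can be false positives, since those bytes can also appear inside compressed data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class GzipMemberScanner {
  public static void main(String[] args) throws IOException {
    byte[] data = Files.readAllBytes(Paths.get(args[0]));
    int candidates = 0;
    for (int i = 0; i + 2 < data.length; i++) {
      // 1f 8b is the gzip magic number; 08 is the deflate compression method.
      if ((data[i] & 0xff) == 0x1f && (data[i + 1] & 0xff) == 0x8b && data[i + 2] == 0x08) {
        System.out.printf("candidate gzip member header at offset %d%n", i);
        candidates++;
      }
    }
    System.out.println("total candidates: " + candidates);
  }
}
```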