java.util.zip.DataFormatException: invalid block type in JdkZlibDecoder for large gzip stream
When downloading a large gzip stream, Inflater throws a DataFormatException when using HttpContentDecompressor, but not when using a GZIPInputStream.
Expected behavior
No exception.
Actual behavior
Exception in thread "main" io.netty.handler.codec.compression.DecompressionException: decompression failure
at io.netty.handler.codec.compression.JdkZlibDecoder.decode(JdkZlibDecoder.java:273)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.embedded.EmbeddedChannel.writeInbound(EmbeddedChannel.java:343)
at io.netty.handler.codec.http.HttpContentDecoder.decode(HttpContentDecoder.java:264)
at io.netty.handler.codec.http.HttpContentDecoder.decodeContent(HttpContentDecoder.java:171)
at io.netty.handler.codec.http.HttpContentDecoder.decode(HttpContentDecoder.java:160)
at io.netty.handler.codec.http.HttpContentDecoder.decode(HttpContentDecoder.java:47)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at com.nextlisten.commoncreeper.Temp3$1.channelRead(Temp3.java:27)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1368)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1234)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1280)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:831)
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
at reactor.core.publisher.Mono.block(Mono.java:1703)
at com.nextlisten.commoncreeper.Temp3.main(Temp3.java:38)
Caused by: java.util.zip.DataFormatException: invalid block type
at java.base/java.util.zip.Inflater.inflateBytesBytes(Native Method)
at java.base/java.util.zip.Inflater.inflate(Inflater.java:378)
at io.netty.handler.codec.compression.JdkZlibDecoder.decode(JdkZlibDecoder.java:240)
... 56 more
Full logs: logs.txt
Steps to reproduce
- Install a handler that overrides the Content-Encoding header to gzip, and the Content-Type header to text/plain
- Install HttpContentDecompressor as a handler after that
- Go download a small gzip file (e.g. 845 bytes compressed: https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2021-25/cc-index.paths.gz ). That will work.
- Go download a large/huge gzip file (e.g. 740 MB compressed, 5.2 GB uncompressed: https://commoncrawl.s3.amazonaws.com/cc-index/collections/CC-MAIN-2021-25/indexes/cdx-00000.gz) and it will throw a DataFormatException.
To verify that the stream itself isn’t corrupt, we can do the following:
- Get body as an inputStream and wrap it in a GZIPInputStream
- Reading the entire stream from the GZIPInputStream is successful.
Minimal yet complete reproducer code (or URL to code)
This code fails:
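import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.HttpContentDecompressor;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpHeaderValues;
import io.netty.handler.codec.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import reactor.netty.http.client.HttpClient;

public class Temp3 {
    public static final void main(final String... args) {
        final HttpClient httpClient = HttpClient.create()
                .doOnConnected(c -> {
                    c.addHandlerFirst(new HttpContentDecompressor());
                    // Rewrite the response headers so that HttpContentDecompressor
                    // treats the body as gzip-encoded plain text.
                    c.addHandlerFirst(new ChannelInboundHandlerAdapter() {
                        @Override
                        public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
                            if (msg instanceof HttpResponse response && response.status().code() >= 200) {
                                response.headers()
                                        .set(HttpHeaderNames.CONTENT_ENCODING, HttpHeaderValues.GZIP)
                                        .set(HttpHeaderNames.CONTENT_TYPE, HttpHeaderValues.TEXT_PLAIN);
                            }
                            ctx.fireChannelRead(msg);
                        }
                    });
                })
                .compress(true);
        httpClient
                .get()
                .uri("https://commoncrawl.s3.amazonaws.com/cc-index/collections/CC-MAIN-2021-25/indexes/cdx-00000.gz")
                .responseSingle((httpClientResponse, byteBufMono) -> byteBufMono.asString(StandardCharsets.UTF_8))
                // Print only the first 1000 characters of the decompressed body.
                .doOnNext(s -> System.out.println(s.substring(0, 1000)))
                .block();
    }
}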
This code succeeds:
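import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import reactor.netty.http.client.HttpClient;

public class Temp4 {
    public static final void main(final String... args) throws IOException {
        final HttpClient httpClient = HttpClient.create();
        // Fetch the raw (still compressed) body as an InputStream.
        final InputStream is = httpClient.get()
                .uri("https://commoncrawl.s3.amazonaws.com/cc-index/collections/CC-MAIN-2021-25/indexes/cdx-00000.gz")
                .responseSingle((httpClientResponse, byteBufMono) -> byteBufMono.asInputStream())
                .block();
        // Decompress with the JDK's GZIPInputStream and count the uncompressed bytes.
        try (final GZIPInputStream gis = new GZIPInputStream(is);
             final CountingOutputStream cos = new CountingOutputStream()) {
            gis.transferTo(cos);
            System.out.println(cos.size);
        } finally {
            is.close();
        }
    }

    // Discards the data and only tracks how many bytes were written.
    private static class CountingOutputStream extends OutputStream {
        private long size = 0;

        @Override
        public void write(final int b) throws IOException {
            size++;
        }
    }
}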
Netty version
- netty-codec-http: 4.1.65.Final
- netty-transport: 4.1.65.Final
- reactor-netty-http: 1.0.8
JVM version (e.g. java -version)
openjdk version "16" 2021-03-16
OpenJDK Runtime Environment (build 16+36-2231)
OpenJDK 64-Bit Server VM (build 16+36-2231, mixed mode, sharing)
OS version (e.g. uname -a)
Windows 10 Pro 21H1 (build 19043.1083)
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (10 by maintainers)
Top GitHub Comments
This is fixed by https://github.com/netty/netty/pull/11521 … I am currently looking into writing a test-case for it.
After a quick look, my guess is that we mess up the indexes when we set more input for the Inflater. There is a risk that we push the same bytes twice if the indexes are moved incorrectly between multiple decode invocations.
I did check that JZlibDecoder works fine with this input. So, the workaround will be to use the -Dio.netty.noJdkZlibDecoder=true system property. JdkZlibDecoder requires deeper debugging.
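To make the suspected failure mode concrete, here is a minimal JDK-only sketch of feeding a java.util.zip.Inflater in chunks while tracking a read index. It is not Netty's JdkZlibDecoder code; the class name ChunkedInflateSketch, its helper methods, and the chunk size are hypothetical and chosen only for illustration. The point is the bookkeeping: the index must advance by exactly the number of bytes handed to setInput, otherwise some bytes would be fed twice or skipped and the deflate stream would no longer parse.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Netty or Reactor Netty.
public class ChunkedInflateSketch {

    // Compress a byte array in zlib format so there is something to inflate.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate while handing the Inflater at most chunkSize bytes at a time,
    // similar in spirit to a decoder that receives its input across many calls.
    static byte[] inflateInChunks(byte[] compressed, int chunkSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int readerIndex = 0; // next compressed byte that has not been fed yet
        try {
            while (!inflater.finished()) {
                if (inflater.needsInput()) {
                    if (readerIndex >= compressed.length) {
                        throw new DataFormatException("input ended before the stream finished");
                    }
                    int len = Math.min(chunkSize, compressed.length - readerIndex);
                    inflater.setInput(compressed, readerIndex, len);
                    // Advance by exactly what was fed; re-feeding or skipping
                    // bytes here would corrupt the stream seen by the Inflater.
                    readerIndex += len;
                }
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] raw = "some repetitive payload ".repeat(10_000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(raw);
        byte[] restored = inflateInChunks(compressed, 13);
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

The small, odd chunk size in main is deliberate: it forces many setInput calls, which is the situation in which incorrect index handling would show up.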