Getting java.lang.OutOfMemoryError: Direct buffer memory
Lettuce version: 5.0.2.RELEASE
Reproducible in: Linux (Kubernetes), Windows (my local machine), likely everywhere
I’ve started testing Redis Cluster in Kubernetes. So far, not bad – all failover scenarios worked fine – but there was one big problem: memory leaks. It was not evident to me at first (because it was a direct memory leak and I was looking at heap charts), but I think I have pinned down two cases.
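For reference, here is a minimal sketch of how direct buffer usage can be watched from inside the JVM via the standard `BufferPoolMXBean` API. This is not Lettuce-specific and is only my own monitoring approach; note that Netty may additionally keep its own counter depending on its allocator settings.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryWatcher {

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // The "direct" pool tracks direct ByteBuffers, which never show up on heap charts.
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s pool: used=%d bytes, capacity=%d bytes, buffers=%d%n",
                        pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity(), pool.getCount());
            }
            Thread.sleep(5_000);
        }
    }
}
```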
The first one is topology refresh. I have a single-node Redis cluster (redis-cluster) in Docker Compose for local testing.
With these options:
ClusterTopologyRefreshOptions.builder()
.enablePeriodicRefresh(Duration.ofSeconds(2)) // anything will do, but small value will lead to exception faster
.enableAllAdaptiveRefreshTriggers()
.build()
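For context, a minimal sketch of how options like these can be attached to a RedisClusterClient; the redis-cluster host and port 7000 are taken from the URI in the log below, and the rest is illustrative rather than my exact test code:

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class TopologyRefreshRepro {

    public static void main(String[] args) throws InterruptedException {
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis-cluster", 7000));

        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(2)) // aggressive on purpose, to hit the OOM quickly
                .enableAllAdaptiveRefreshTriggers()
                .build();

        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());

        // Keep the connection open and let the periodic refresh run until direct memory is exhausted.
        StatefulRedisClusterConnection<String, String> connection = client.connect();
        Thread.sleep(Long.MAX_VALUE);
    }
}
```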
Combined with a small direct memory size, e.g. -XX:MaxDirectMemorySize=100M or 200M, I can get an OOM exception within 1-2 minutes. The exception looks like this:
2018-02-17 13:33:17.243 [WARN] [lettuce-eventExecutorLoop-63-8] [i.l.c.c.t.ClusterTopologyRefresh] - Cannot retrieve partition view from RedisURI [host='redis-cluster', port=7000], error: java.util.concurrent.ExecutionException: java.lang.NullPointerException
2018-02-17 13:33:17.243 [WARN] [lettuce-nioEventLoop-65-3] [i.l.c.p.CommandHandler] - null Unexpected exception during request: java.lang.NullPointerException
java.lang.NullPointerException: null
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:500) ~[lettuce-core-5.0.2.RELEASE.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:141) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) [netty-common-4.1.21.Final.jar:4.1.21.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.21.Final.jar:4.1.21.Final]
at java.lang.Thread.run(Thread.java:844) [?:?]
Looks like Netty is out of direct memory.
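If it helps anyone confirm this on their side, Netty's own counters can be polled as well. This is just a sketch using Netty-internal API (`PlatformDependent`), which may change between versions:

```java
import io.netty.util.internal.PlatformDependent;

public class NettyDirectMemoryProbe {

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // usedDirectMemory() returns -1 if Netty is not tracking direct memory itself.
            // A connection leak shows up here as a steadily climbing "used" value rather than
            // as buffer-leak reports from -Dio.netty.leakDetection.level=PARANOID.
            System.out.printf("netty direct memory: %d / %d bytes%n",
                    PlatformDependent.usedDirectMemory(), PlatformDependent.maxDirectMemory());
            Thread.sleep(5_000);
        }
    }
}
```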
About the second case I am less sure – I did not do extensive testing – but I think the two are connected. I have a 7-node Redis cluster in our Kubernetes environment. We killed one master to see if it would fail over. It did: the topology refreshed and everything seemed OK. But in the background Lettuce kept pinging/trying to connect to the dead node (only visible with Lettuce debug logging turned on), and direct memory quickly dried up and the node died.
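For anyone who wants to see those reconnect attempts themselves, a minimal sketch of raising the Lettuce log level programmatically, assuming logback is the SLF4J backend (the same can be done in logback.xml):

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class LettuceDebugLogging {

    public static void enable() {
        // DEBUG on io.lettuce.core makes the background reconnect attempts
        // against the dead node visible in the logs.
        ((Logger) LoggerFactory.getLogger("io.lettuce.core")).setLevel(Level.DEBUG);
    }
}
```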
Any thoughts?
Top GitHub Comments
@mp911de I tried my test in https://github.com/vleushin/lettuce-oom . It worked without problems.
I tried it with 5.0.2 as well: indeed, the connection count to Redis was growing. With 5.0.3 it was stable.
I think we found the cause. The OOME looks related to a connection leak reported in #721.
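For anyone who wants to verify the connection count on their own setup, a minimal sketch that asks Redis itself through Lettuce; the host and port are assumptions from the earlier log, and `redis-cli -p 7000 info clients` against each node gives the same information:

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class ConnectionCountCheck {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis-cluster", 7000));
        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            // Watch connected_clients over time: a connection leak shows up as a steadily growing value.
            System.out.println(connection.sync().info("clients"));
        }
        client.shutdown();
    }
}
```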