AssertionError crashes in PoolArena
Hello!
My team at AWS has a service that uses Netty 4.1. Ever since we upgraded from 4.1.48 to 4.1.52, we've been seeing occasional process crashes (once every week or two) due to AssertionErrors.
Here’s the Netty portion of the stack trace:
java.lang.AssertionError
    at io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:165)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:128)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:378)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
    at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
We suspect that this is a bug introduced by this commit; it's the only significant commit between 4.1.48 and 4.1.52 with relevant scope.
This is the failing assertion: https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PoolArena.java#L163
So the failure is either that the found subpage is marked for deletion, or that the subpage's size does not match what the size classes expect. I'm leaning towards the second possibility, since I think this code ensures that the first subpage cannot be marked for deletion; but I may be missing something.
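To make the invariant concrete, here is a toy model of the check that assertion performs: after taking the node after the sentinel head of a size-class pool, the subpage must still be live and sized for that size class. All names and the data structure here are illustrative sketches, not the actual Netty source.

```java
// Toy model of the invariant behind the failing assertion in
// PoolArena.tcacheAllocateSmall. Illustrative only, not Netty code.
public class SubpageInvariant {
    static final class Subpage {
        Subpage next;          // circular list; head is a sentinel
        boolean doNotDestroy;  // false once the subpage is marked for deletion
        int elemSize;          // element size this subpage was carved into
    }

    // Returns the first usable subpage after the sentinel head, or null
    // if the pool for this size class is empty.
    static Subpage firstUsable(Subpage head, int expectedElemSize) {
        Subpage s = head.next;
        if (s == head) {
            return null; // empty pool: caller falls back to a normal allocation
        }
        // Mirrors the failing assertion: either half being false means the
        // size-class pool's linked list is in an inconsistent state.
        assert s.doNotDestroy && s.elemSize == expectedElemSize;
        return s;
    }

    public static void main(String[] args) {
        Subpage head = new Subpage();
        head.next = head; // empty sentinel ring

        Subpage page = new Subpage();
        page.doNotDestroy = true;
        page.elemSize = 32;
        page.next = head;
        head.next = page;

        System.out.println(firstUsable(head, 32).elemSize); // prints 32
    }
}
```

In this model, the crash we're seeing corresponds to `firstUsable` finding a node whose `doNotDestroy` is false or whose `elemSize` disagrees with the size class being allocated from.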
Expected behavior
N/A
Actual behavior
N/A
Steps to reproduce
Unfortunately, we have not been able to find a repro; or any other indicators (metrics and logs) that predict the crash.
Minimal yet complete reproducer code (or URL to code)
N/A
Netty version
4.1.52
JVM version (e.g. java -version)
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM (build 11.0.13+8-LTS, mixed mode)
OS version (e.g. uname -a)
5.4.156-94.273.amzn2int.x86_64
Issue Analytics
- State:
- Created 2 years ago
- Comments: 14 (10 by maintainers)
Top GitHub Comments
Thank you. We managed to get 4.1.73 imported and deployed on our end, and so far it looks promising: no recurrence. I'll update this thread once we've pushed the change to production; if we still haven't seen this issue by then, I think it can be closed out as having been fixed by 4.1.7x.
Sounds like it's safe to close this now that it's no longer reproducible since the upgrade, correct?