Native image memory leaks?
Describe the bug
Hi Quarkus team & community!
While taking another look at recent developments in the Quarkus space, and after starting a recently published book, I initially played with the native image by repeatedly pressing F5 (browser reload) on a simple GET /accounts endpoint and noticed that memory usage kept increasing.
So, being genuinely curious about it, I started to do it in a more “standard” way, but one simple enough to illustrate the point. See the How to Reproduce section for all the figures.
The app is a bare-bones service that exposes an API and returns some data from memory; no disk or database is involved. The complete example can be found here.
The “stress test” is done using cURL; it is this /tests/account_svc_local_stress_1.sh file. In all the screenshots below you’ll see there is no bottleneck anywhere else (neither in the script nor in the system).
Obviously, a native image is not as efficient as a JVM-based running app, but the concerning part is that memory usage keeps increasing even though there is no real reason for it, short of a memory leak in the framework internals or the JAX-RS implementation. And most probably the only workaround for now would be to recycle the instances from time to time.
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
Initially:
$ ./target/account_svc-1.0.0-SNAPSHOT-runner
__ ____ __ _____ ___ __ ____ ______
--/ __ \/ / / / _ | / _ \/ //_/ / / / __/
-/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2022-01-09 15:08:06,648 INFO [io.quarkus] (main) account_svc 1.0.0-SNAPSHOT native (powered by Quarkus 2.6.1.Final) started in 0.034s. Listening on: http://0.0.0.0:8080
2022-01-09 15:08:06,648 INFO [io.quarkus] (main) Profile prod activated.
2022-01-09 15:08:06,648 INFO [io.quarkus] (main) Installed features: [cdi, kubernetes, resteasy, resteasy-jsonb, smallrye-context-propagation, vertx]
It starts quickly and memory usage is very low. Of course, that’s not very relevant in the grand scheme of things, but it’s the “starting point”.
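For completeness, the runner binary above is the product of a standard Quarkus native build; a minimal sketch, assuming the Maven wrapper that ships with the reproducer project and a local GraalVM or Mandrel installation:

```shell
# Build the native executable (the -Pnative profile is the standard
# Quarkus way to trigger a native-image build).
./mvnw package -Pnative

# Then start the resulting binary:
./target/account_svc-1.0.0-SNAPSHOT-runner
```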
After running a first test using ./account_svc_local_stress_1.sh -a http://localhost:8080/accounts
the result is:
After running ./account_svc_local_stress_1.sh -r 10000 -a http://localhost:8080/accounts
Continue with another, longer test by sending 1 million requests using
./account_svc_local_stress_1.sh -r 1000000 -a http://localhost:8080/accounts.
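The stress script itself isn’t reproduced in this issue, but its core is essentially a cURL loop. A minimal sketch, assuming the same -r (request count) and -a (address) semantics as the real script; the stress function name and defaults here are illustrative, not taken from the actual file:

```shell
#!/bin/sh
# Minimal sketch of the request loop; the real
# /tests/account_svc_local_stress_1.sh is more elaborate (option parsing, timing).
stress() {
    requests="$1"   # e.g. 10000, matching the -r option above
    url="$2"        # e.g. http://localhost:8080/accounts, matching -a
    i=1
    while [ "$i" -le "$requests" ]; do
        # -s silences progress output; -o /dev/null discards the body.
        curl -s -o /dev/null "$url"
        i=$((i + 1))
    done
}

# Example: stress 10000 http://localhost:8080/accounts
```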
During this one, the CPU usage occasionally increases for short durations (around 3 seconds) from ~20% to ~26%, while the memory still increases in very small but fairly consistent steps.
And the result is:
Running the same 1M-request test again, the result is:
Output of uname -a or ver
Linux dxps 5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a SMP Tue Dec 14 22:38:29 U x86_64 x86_64 x86_64 GNU/Linux
Issue Analytics
- State:
- Created 2 years ago
- Comments: 12 (11 by maintainers)
Top GitHub Comments
With native image, there is only a limited choice of GCs: Serial GC (default), Epsilon GC (no-op), and G1 GC (GraalVM EE only). From the setup:
- -Xmn config -> young generation size is 256M
- -Xmx config -> 80% of the physical memory, which is very large

Based on the usage pattern, most of the objects are short-lived. The increase in memory size is caused by object allocations in the eden region of the young generation before it reaches its limit (256M). After turning on the GC output (-XX:+PrintGC -XX:+VerboseGC), I can observe that when the young generation is full, a young collection reclaims most of the space. After the young generation reaches its limit, the overall memory size won’t increase much. When the young generation is made smaller, e.g. with -Xmn32m, the overall memory footprint is also reduced. So this is unlikely to be a memory leak.

@galderz Okay, okay, I explicitly mentioned that I used Java 17 instead of Java 11. Should I almost feel guilty of pretending to compare apples to apples? 😐 I’m not here to bash Quarkus; I even promoted it internally to my employer, and it has been in use there for around 1.5 years now. Initially, I just wanted to raise a heads-up about this issue here and see whether the community has any feedback. Of course, I could use Eclipse MAT or something similar to analyze a heap dump taken during or after a test. I’ll try to find and spend some time understanding the why.
@alexcheng1982 Excellent insight! Thanks for sharing it!
Started another “primitive” test scenario (previously described) in JVM mode using Java 11, the same major version as the one used to build the initial native image. The Java version being used is 11.0.13+8-Ubuntu-0ubuntu1.21.10.

At fresh startup time (without any requests), it looks like this:

- 1st set of 1M requests: after 4-7 seconds, the memory usage quickly jumps up to 447.7 MB and stays there for the rest of the test (which normally takes 6m 30s to complete).
- 2nd set of 1M requests: after 4-7 seconds, the memory usage jumps up to 461.4 MB and more or less stays there for the rest of the test.
- 3rd set of 1M requests: after 4-7 seconds, the memory usage jumps up to 467.6 MB and more or less stays there for the rest of the test.

I could continue, but I guess the pattern is predictable.
Now just taking another shot with the native image and an explicit -Xmx, as rightly suggested by @gsmet and @alexcheng1982. Used ./target/account_svc-1.0.0-SNAPSHOT-runner -Xmx64m.

At startup, memory usage is 5.1 MB; during the test, it stays in the 11 - 16 MB range.

Without any other relevant actions, we could consider this issue closed and take the lessons learned from it. Thanks again for the valuable feedback! 🙏