Controller increasing memory consumption and crash
Problem description
We executed a battery of longevity runs in a small cluster (3 nodes) under a light write/read workload (mediumScale) with Pravega 0.3.3. As can be observed in the figure below, the Controller process (dotted red line) slowly consumes more memory over time.
Interestingly, these experiments do not involve managing multiple Streams or working heavily with Transactions; they are mainly IO operations, so the Controller workload should be limited.
Problem location
Controller.
Suggestions for an improvement
Profile the memory consumption of the Controller to detect a possible memory leak.
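One lightweight way to gather such data, purely illustrative and not part of Pravega (the class name and logging interval are assumptions), is to log heap usage from inside the JVM while a proper profiler or periodic heap dumps narrow down the leak:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of lightweight in-process heap monitoring. A slow but steady
 * rise of "used" heap despite GC activity is the typical signature of a leak.
 */
public class HeapLogger {
    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Log heap usage once a minute (the non-daemon scheduler keeps the JVM alive).
        scheduler.scheduleAtFixedRate(() -> {
            MemoryUsage heap = memoryBean.getHeapMemoryUsage();
            System.out.printf("heap used=%d MB committed=%d MB max=%d MB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```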
Issue Analytics
- Created 5 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Update: Curator doesn't have a problem as assumed earlier… The problem is with the ExponentialBackoffRetry policy we use for retries in the Curator client. The value we supply is 500 ms as the base and 10 retries. The logic for exponential retry in Curator is sketched below; in the worst case, the accumulated sleepMs across 10 retries would be: 500 * 2 + 500 * 4 + 500 * 8 + … + 500 * 1024 ~= 2048 * 500 ms ~= 1000 seconds. The moment we reduce our retry input parameters, Curator calls the callback method and the gRPC call completes.
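A minimal sketch of that worst-case arithmetic, assuming an exponential back-off that sleeps roughly baseSleepTimeMs * random(1 .. 2^(retryCount + 1)) per attempt; the class name and loop are illustrative, not Curator source:

```java
import java.util.Random;

/**
 * Illustrative worst-case calculation for an ExponentialBackoffRetry-style policy
 * configured with a 500 ms base sleep and 10 retries.
 */
public class BackoffWorstCase {
    public static void main(String[] args) {
        long baseSleepTimeMs = 500;   // base supplied by the Controller
        int maxRetries = 10;          // retries supplied by the Controller
        Random random = new Random();

        long worstCaseTotalMs = 0;
        long randomizedTotalMs = 0;
        for (int retryCount = 0; retryCount < maxRetries; retryCount++) {
            // Upper bound for this attempt's sleep: base * 2^(retryCount + 1)
            long worstSleepMs = baseSleepTimeMs * (1L << (retryCount + 1));
            // Randomized sleep in [base, worstSleepMs), mirroring the back-off idea
            long sleepMs = baseSleepTimeMs * Math.max(1, random.nextInt(1 << (retryCount + 1)));
            worstCaseTotalMs += worstSleepMs;
            randomizedTotalMs += sleepMs;
        }
        // Worst case: 500 * (2 + 4 + ... + 1024) = 500 * 2046 ms ~= 1023 s,
        // i.e. roughly the 1000 seconds quoted above.
        System.out.println("Worst-case total back-off: " + worstCaseTotalMs + " ms");
        System.out.println("One randomized sample:     " + randomizedTotalMs + " ms");
    }
}
```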
The reason the Controller service takes a very long time to shut down is the following: we use grpc.shutdown, which waits for all ongoing gRPC requests to complete before shutting down the gRPC service. There is an ongoing gRPC call, in this case updateStream, which hasn't completed. The reason for its failure to complete lies with Curator, though. The following is the pattern we use for making ZK calls in the store (a sketch is shown below):
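A minimal sketch of such an asynchronous Curator call, completing a CompletableFuture from a background callback; the class and method names are illustrative assumptions, not the actual Pravega store code:

```java
import java.util.concurrent.CompletableFuture;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.KeeperException;

public class ZkReadExample {
    /**
     * Reads data at the given path using Curator's background (async) API and
     * completes a CompletableFuture from the background callback. If the callback
     * is never invoked (e.g. after a ZK session expiry), the future never completes
     * and the caller's gRPC call stays pending.
     */
    public static CompletableFuture<byte[]> getData(CuratorFramework client, String path) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        try {
            client.getData()
                  .inBackground((cli, event) -> {
                      if (event.getResultCode() == KeeperException.Code.OK.intValue()) {
                          result.complete(event.getData());
                      } else {
                          result.completeExceptionally(KeeperException.create(
                                  KeeperException.Code.get(event.getResultCode()), path));
                      }
                  })
                  .forPath(path);
        } catch (Exception e) {
            result.completeExceptionally(e);
        }
        return result;
    }
}
```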
If the ZK session has expired, Curator sends an interrupt to the background work; this does not result in the callback being invoked.
If I replace the asynchronous Curator client call with a synchronous call, Curator immediately throws an IllegalStateException, our future completes, and gRPC returns the failure to the caller.
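For contrast, a synchronous variant of the same illustrative sketch fails fast instead of hanging; again, this is an assumption-laden example, not the Pravega store code:

```java
import java.util.concurrent.CompletableFuture;

import org.apache.curator.framework.CuratorFramework;

public class ZkReadSyncExample {
    /**
     * Synchronous variant of the read above: the foreground Curator call fails
     * fast (for example with an IllegalStateException when the client can no
     * longer be used), so the future completes exceptionally and the gRPC call
     * can return the failure to the caller instead of hanging.
     */
    public static CompletableFuture<byte[]> getDataSync(CuratorFramework client, String path) {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        try {
            result.complete(client.getData().forPath(path));
        } catch (Exception e) {
            result.completeExceptionally(e);
        }
        return result;
    }
}
```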