Memory usage in long-running process
NEST/Elasticsearch.Net version: 6.8.1
Elasticsearch version: 6.8.16
Description of the problem including expected versus actual behavior:
Using https://github.com/amccool/AM.Extensions.Logging.ElasticSearch (which depends on https://github.com/elastic/elasticsearch-net) in a long-running service, logging a significant volume of data via the Bulk API.
Depending on the load on the system, memory usage increases. Memory profiling shows that Elasticsearch.Net.RecyclableMemoryStreamManager.GetBlock() is the method that has allocated 1.28 GB of the byte[] type.
Steps to reproduce:
- Start a process that calls ElasticLowLevelClient.BulkPutAsync multiple times, let it run, and take a snapshot of the memory every hour: the LOH and POH grow (see the sketch below).
- Watch for the allocated byte[] to decrease… it never does.
Expected behavior: the byte[] allocated by ElasticLowLevelClient / RecyclableMemoryStreamManager should stabilize.
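For reference, a minimal sketch of such a reproduction, assuming Elasticsearch.Net 6.x: the endpoint URL, index name, document shape, batch size, and delay are illustrative only, and the exact BulkPutAsync overload may differ between client versions.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Elasticsearch.Net;

// Illustrative repro sketch: repeatedly push bulk payloads through the
// low-level client, then take memory snapshots of the process over time.
var settings = new ConnectionConfiguration(new Uri("http://localhost:9200"));
var lowLevel = new ElasticLowLevelClient(settings);

while (true)
{
    // Interleaved action/source lines for a batch of log entries.
    var lines = Enumerable.Range(0, 1_000).SelectMany(i => new object[]
    {
        new { index = new { _index = "logs", _type = "_doc" } },
        new { timestamp = DateTime.UtcNow, message = $"log entry {i}" }
    });

    var response = await lowLevel.BulkPutAsync<StringResponse>(PostData.MultiJson(lines));
    if (!response.Success) Console.WriteLine(response.DebugInformation);

    await Task.Delay(TimeSpan.FromSeconds(1));
}
```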
Provide ConnectionSettings (if relevant):
Provide DebugInformation (if relevant):
Top GitHub Comments
Hi @andreycha.
Great question! I don’t think we have much documentation explaining this topic. In 7.14.0 I have switched the default behaviour and added an item to the release notes.
With the pooling behaviour (the current default until 7.14.0), large memory streams are pooled and reused to avoid repeated allocations. This can particularly help apps that often deal with large responses (lots of docs etc.). However, if those large responses are infrequent, the pooled streams may just sit in memory and not help much. We felt it’s better to default to not pooling these, as this is less surprising for many app profiles since memory isn’t retained for long periods. The trade-off is more allocations and therefore potentially more GCs; however, .NET is designed to handle those well for many app profiles.
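For illustration, a minimal sketch of switching between the two modes on ConnectionSettings; the MemoryStreamFactory setting and the RecyclableMemoryStreamFactory / MemoryStreamFactory type names are assumptions based on the Elasticsearch.Net 7.x surface area and may differ between versions.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

// Pooled streams (the default before 7.14.0): fewer allocations, but pooled
// buffers are retained in memory between requests.
var pooledSettings = new ConnectionSettings(pool)
    .MemoryStreamFactory(RecyclableMemoryStreamFactory.Default);

// Non-pooled streams (the default from 7.14.0): more allocations and
// potentially more GCs, but memory is released sooner.
var nonPooledSettings = new ConnectionSettings(pool)
    .MemoryStreamFactory(MemoryStreamFactory.Default);

var client = new ElasticClient(nonPooledSettings);
```

Profiling each configuration under representative load, as described in the next paragraph, is the only reliable way to pick between them.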
The best advice would be to profile your app using both settings under normal (ideally production) load for several days each. That should give you data to determine which mode best suits your data and usage patterns. You’d want to measure overall memory use, particularly gen 2 and ideally GC counts. The newer .NET counters provide some good data you can subscribe to. It would also be important to measure metrics for critical paths in the app to ensure things like request counts and latency perform well. Turning off pooling may introduce more pauses for GCs so could impact other key metrics.
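As one way to capture those counters from inside the app, here is a minimal sketch using the standard System.Diagnostics.Tracing.EventListener API to subscribe to the built-in System.Runtime counters; the gen-2-gc-count and gc-heap-size counter names are the standard ones on .NET Core 3.0+, and the 5-second interval and the filtering are arbitrary choices for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Subscribes to the built-in "System.Runtime" counters and prints
// GC-related values every 5 seconds.
public sealed class GcCounterListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Runtime")
        {
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "5" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload is null) return;

        foreach (var payload in eventData.Payload)
        {
            if (payload is IDictionary<string, object> counter &&
                counter.TryGetValue("Name", out var name) &&
                (Equals(name, "gen-2-gc-count") || Equals(name, "gc-heap-size")))
            {
                // "Increment" is set for rate counters (GC counts),
                // "Mean" for gauges (heap size).
                counter.TryGetValue("Increment", out var increment);
                counter.TryGetValue("Mean", out var mean);
                Console.WriteLine($"{name}: mean={mean} increment={increment}");
            }
        }
    }
}
```

Create one instance at startup and keep it alive for the lifetime of the process; alternatively, the dotnet-counters tool can monitor the same System.Runtime counters from outside the process without any code changes.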
I’d certainly recommend 7.13.2 as it includes some performance improvements from recent releases as well as quite a lot of bug fixes. All 7.x releases are tested against all prior 7.x server versions to ensure they remain backwards compatible, so we always recommend being on the latest version. In 7.14.0, when it lands, I’ve made that pooling behaviour default change and also a small per-request performance improvement. I’m putting a lot of focus on performance for v8 as well, with a view to reducing allocations throughout the client.
In extreme high-performance cases, there is also the low-level client, which requires much more from the consumer but can be used to avoid much of the overhead that the convenience of the high-level NEST client introduces. In most cases that should not be necessary, though.
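For illustration, a minimal sketch of dropping down to the low-level client from an existing NEST client; the index name and query are placeholders, and the response is handled as a raw string instead of being deserialized into NEST types.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));

// NEST exposes its underlying low-level client, which works with raw
// request/response bodies and skips the high-level object mapping.
StringResponse raw = await client.LowLevel.SearchAsync<StringResponse>(
    "logs",
    PostData.Serializable(new { query = new { match_all = new { } }, size = 10 }));

Console.WriteLine(raw.Body);
```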
Finally, if you’re in a position to do so, I’d welcome any memory dumps of long-running, high-memory apps so I can review them and target further performance analysis. If that’s something you’re allowed and able to share, I can provide an email address to send them to!
Hopefully this answer helps clear up some of the mystery. I’ll certainly review whether the documentation can be enhanced to include some of this too!
Hi @stevejgordon,
Is there any documentation where I can read in detail about this setting and the trade-off you mentioned (consistent memory use vs. smaller allocations)? We’re using NEST 7.6.1, and sometimes RecyclableMemoryStreamManager consumes 2-3 GB of memory. I was searching on Google and came across this fresh issue, which seems quite related. Would it perhaps help to upgrade to the latest 7.13.2? Were there any optimizations done in the recent year?
Thank you!