Bug: Memory leak persists after update to 1.13.1
Bug Description
Hi Michael, I'm using RepoDb.PostgreSql.BulkOperations version 1.13.1. My application parses a lot of CDR files and inserts them into a PostgreSQL database, but when it executes this line:
```csharp
await db.InsertAllAsync(
    tableName: "schema.table",
    recordsFinal,
    batchSize: insertBuffer,
    transaction: transaction);
```
Memory usage increases and never decreases, no matter whether I force the GC to collect. This application runs on a Debian Linux server with 32 GB of RAM and PostgreSQL 13.x; I'm using the latest version of RepoDb.PostgreSql.BulkOperations and .NET 6.0. When the application's usage reaches the 32 GB of RAM, the server starts to swap and the application stops working.

And it seems to be incremental: for example, it starts at 100 MB and, for each file it processes, it increases to 200 MB, 400 MB, ... 3 GB, etc.
Images: (memory-usage screenshots attached in the original issue)
Library Version: RepoDb.PostgreSql.BulkOperations 1.13.1
Top GitHub Comments
I have investigated this, and it looks to me like it works as expected. But you, as a user, must be aware of this caveat. In short, this is not a memory leak!

RepoDB requires this caching in order to make its batch operations more performant.
Explanation
Below are a screenshot and a very small project that we used for simulation and replication. The project requires SQL Server, as that is where we ran the simulation. Unfortunately, SQL Server and MDS only allow a maximum of 2100 parameters; therefore, you will see that I can only hit a maximum `batchSize` of 20, for 20 rows with 50-column data entities.

Project: this project is good for a small simulation of this kind of issue. InsertAllMemoryLeaks-InsertAll.zip

Screenshot: (memory-profile screenshot attached in the original comment)
What does the program do?
- Creates a `[dbo].[BigTable]` table with 120 columns.
- Uses the `InsertAll` operation with a `BigTable` entity that maps a maximum of 50 columns.
- Creates lists of entities whose size varies from 1 up to the maximum `batchSize`. This is to ensure that RepoDB will create and cache each buffer's command text in memory.
- Inserts the `BigTable` entities into the table (see the sketch below).
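A minimal sketch of that simulation loop (assuming a caller-supplied entity factory; the names here are illustrative, not taken from the attached project):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using RepoDb;

public static class CachingSimulation
{
    // Inserts lists of every size from 1 up to the maximum batch size, so
    // that RepoDB builds and caches command text for each distinct row count.
    public static void Run<T>(IDbConnection connection,
        Func<int, List<T>> createEntities, int maxBatchSize = 20)
        where T : class
    {
        for (var rows = 1; rows <= maxBatchSize; rows++)
        {
            var entities = createEntities(rows); // a list with 'rows' items
            connection.InsertAll(
                tableName: "[dbo].[BigTable]",
                entities: entities,
                batchSize: maxBatchSize);
        }
    }
}
```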
Observation

Fluctuations:
In the first few seconds, the memory fluctuates a lot. This is because, when the `INSERT` statement is being created for the given number of rows, the library puts it in the cache. Truth: if you insert 1 row, it will create 1 `INSERT` statement and cache it; if you insert 2 rows, it will create 2 `INSERT` statements and cache them; and so forth. The `batchSize` is the maximum number of `INSERT` statements it will create and cache in memory, so, in short, you will have 20 `INSERT` statements cached in memory (each with a different number of parameters, based on the columns provided by the data entities).
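Conceptually, the cache behaves like a dictionary keyed by the row count of the buffer. The following is a minimal sketch of the idea only, not RepoDB's actual internals (all names are hypothetical):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Text;

// Hypothetical sketch: one command text is built per distinct row count and
// kept for reuse, which is why memory grows once per row count and then
// flat-lines. It is a cache, not a leak.
public static class CommandTextCache
{
    private static readonly ConcurrentDictionary<int, string> Cache = new();

    public static string GetOrCreate(string table, string[] columns, int rowCount)
    {
        return Cache.GetOrAdd(rowCount, count =>
        {
            var builder = new StringBuilder();
            for (var row = 0; row < count; row++)
            {
                var columnList = string.Join(", ", columns);
                var parameterList = string.Join(", ",
                    columns.Select(column => $"@{column}_{row}"));
                builder.AppendLine(
                    $"INSERT INTO {table} ({columnList}) VALUES ({parameterList});");
            }
            return builder.ToString();
        });
    }
}
```

With a `batchSize` of 20, at most 20 such entries are ever created; each one simply stays alive for reuse, which is what the memory profile shows.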
Flat-Lines:
You will notice the flat-lines after those 20 `INSERT` statements have been cached in memory. This is because the library is no longer creating `INSERT` statements; instead, it simply reuses the cached one that matches the number of rows being inserted.

Behavior Extent
This kind of behavior is expected, and it is also present in both `MergeAll` and `UpdateAll`.

Conclusion
The number of cached statements varies with the `batchSize` you pass to the batch operations (i.e., `InsertAll`, `MergeAll` and `UpdateAll`). The size of each cached statement varies with the size of the entity schema (i.e., the number of columns).

Optimizations
Currently, RepoDB creates multiple `INSERT` statements per batch insertion, one statement per row. See the illustrative sketch below.
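Something along these lines (an illustrative reconstruction trimmed to three columns; the real `BigTable` command text covers up to 50 columns and may differ from what the library emits):

```sql
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1, @Column2, @Column3);
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1_1, @Column2_1, @Column3_1);
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1_2, @Column2_2, @Column3_2);
```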
The statements above are verbose and also do not use the more optimal bulk insert. This can be optimized as shown below.
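For example, the same three rows packed into a single multi-row statement (again an illustrative sketch, not the exact text the library generates):

```sql
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1, @Column2, @Column3),
       (@Column1_1, @Column2_1, @Column3_1),
       (@Column1_2, @Column2_2, @Column3_2);
```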
With that approach, a great many characters are eliminated from memory.
The `BinaryBulkInsert` only requires 40 MB, as it does not cache anything.

Project: InsertAllMemoryLeaksPostgreSql-BinaryBulkInsert.zip
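For the scenario in the original report, switching over to the binary bulk insert might look like the following sketch (the table name and `recordsFinal` come from the report above; verify the exact overloads, and the usual RepoDB PostgreSQL bootstrap, against the library's documentation):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Npgsql;
using RepoDb;

public static class BulkInsertSketch
{
    // Sketch: BinaryBulkInsertAsync streams the rows through PostgreSQL's
    // binary COPY protocol instead of generating and caching per-row-count
    // INSERT command text.
    public static async Task InsertAsync<T>(
        string connectionString, IEnumerable<T> recordsFinal)
        where T : class
    {
        await using var db = new NpgsqlConnection(connectionString);
        await db.OpenAsync();

        await db.BinaryBulkInsertAsync(
            tableName: "schema.table",
            entities: recordsFinal);
    }
}
```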