Bug: Memory leak persists after update to 1.13.1
Bug Description
Hi Michael, I'm using RepoDb.PostgreSql.BulkOperations version 1.13.1. My application parses a lot of CDR files and inserts them into a PostgreSQL database, but when it executes this line:
```csharp
await db.InsertAllAsync(
    tableName: "schema.table",
    recordsFinal,
    batchSize: insertBuffer,
    transaction: transaction);
```
Memory usage increases and never decreases, no matter whether I force the GC to collect. This application runs on a Debian Linux server with 32 GB of RAM and PostgreSQL 13.x; I'm using the latest version of RepoDb.PostgreSql.BulkOperations and .NET 6.0. When the application's usage reaches the 32 GB of RAM, the server starts to swap and the application stops working.

And it seems to be incremental: for example, it starts at 100 MB and, for each file it processes, it increases to 200 MB, 400 MB, ... 3 GB, etc.
Images: (memory-usage screenshots attached in the original issue)
Library Version: RepoDb.PostgreSql.BulkOperations 1.13.1
Top GitHub Comments
I have investigated this, and it looks to me like it works as expected. But you, as a user, must be aware of this caveat. In short, this is not a memory leak!

RepoDB requires this caching in order to make its batch operations more performant.
Explanation
Below are a screenshot and a very small project that we used for simulation and replication. The project requires SQL Server, as that is where we ran the simulation. Unfortunately, SQL Server and MDS only allow a maximum of 2100 parameters; therefore, you will see that I can only hit a maximum `batchSize` of 20, for 20 rows with 50-column data entities.

Project: this project is good for a small simulation of this kind of issue. InsertAllMemoryLeaks-InsertAll.zip

Screenshot: (memory-profile screenshot attached in the original comment)
What does the program do?
- Creates a `[dbo].[BigTable]` table with 120 columns.
- Uses the `InsertAll` operation with a `BigTable` entity that maps a maximum of 50 columns.
- Creates lists of entities whose size varies from 1 up to the maximum `batchSize`. This is to ensure that RepoDB will create and cache each buffer's command text in memory.
- Inserts the `BigTable` entities into the table (see the sketch below).
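A minimal sketch of that simulation loop (assuming a caller-supplied entity factory; the names here are illustrative, not taken from the attached project):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using RepoDb;

public static class CachingSimulation
{
    // Inserts lists of every size from 1 up to the maximum batch size, so
    // that RepoDB builds and caches command text for each distinct row count.
    public static void Run<T>(IDbConnection connection,
        Func<int, List<T>> createEntities, int maxBatchSize = 20)
        where T : class
    {
        for (var rows = 1; rows <= maxBatchSize; rows++)
        {
            var entities = createEntities(rows); // a list with 'rows' items
            connection.InsertAll(
                tableName: "[dbo].[BigTable]",
                entities: entities,
                batchSize: maxBatchSize);
        }
    }
}
```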
Observation

Fluctuations:
In the first few seconds, the memory fluctuates a lot. This is because, when the `INSERT` statement is being created for the given number of rows, the library puts it in the cache. Truth: if you insert 1 row, it will create 1 `INSERT` statement and cache it; if you insert 2 rows, it will create 2 `INSERT` statements and cache them; and so forth. The `batchSize` is the maximum number of `INSERT` statements it will create and cache in memory, so, in short, you will have 20 `INSERT` statements cached in memory (each with a different number of parameters, based on the columns provided by the data entities).
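Conceptually, the cache behaves like a dictionary keyed by the row count of the buffer. The following is a minimal sketch of the idea only, not RepoDB's actual internals (all names are hypothetical):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Text;

// Hypothetical sketch: one command text is built per distinct row count and
// kept for reuse, which is why memory grows once per row count and then
// flat-lines. It is a cache, not a leak.
public static class CommandTextCache
{
    private static readonly ConcurrentDictionary<int, string> Cache = new();

    public static string GetOrCreate(string table, string[] columns, int rowCount)
    {
        return Cache.GetOrAdd(rowCount, count =>
        {
            var builder = new StringBuilder();
            for (var row = 0; row < count; row++)
            {
                var columnList = string.Join(", ", columns);
                var parameterList = string.Join(", ",
                    columns.Select(column => $"@{column}_{row}"));
                builder.AppendLine(
                    $"INSERT INTO {table} ({columnList}) VALUES ({parameterList});");
            }
            return builder.ToString();
        });
    }
}
```

With a `batchSize` of 20, at most 20 such entries are ever created; each one simply stays alive for reuse, which is what the memory profile shows.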
Flat-Lines:
You will notice the flat-lines after those 20 `INSERT` statements have been cached in memory. This is because the library is no longer creating `INSERT` statements; instead, it simply reuses the cached one that matches the number of rows being inserted.

Behavior Extent
This kind of behavior is expected, and it is also present in both `MergeAll` and `UpdateAll`.

Conclusion
The number of cached statements varies with the `batchSize` you pass to the batch operations (i.e., `InsertAll`, `MergeAll` and `UpdateAll`). The size of each cached statement varies with the size of the entity schema (i.e., the number of columns).

Optimizations
Currently, RepoDB creates multiple `INSERT` statements per batch insertion, one statement per row. See the illustrative sketch below.
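Something along these lines (an illustrative reconstruction trimmed to three columns; the real `BigTable` command text covers up to 50 columns and may differ from what the library emits):

```sql
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1, @Column2, @Column3);
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1_1, @Column2_1, @Column3_1);
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1_2, @Column2_2, @Column3_2);
```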
The statements above are verbose and also do not use the more optimal bulk insert. This can be optimized as shown below.
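For example, the same three rows packed into a single multi-row statement (again an illustrative sketch, not the exact text the library generates):

```sql
INSERT INTO [dbo].[BigTable] ([Column1], [Column2], [Column3])
VALUES (@Column1, @Column2, @Column3),
       (@Column1_1, @Column2_1, @Column3_1),
       (@Column1_2, @Column2_2, @Column3_2);
```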
With that approach, a great many characters are eliminated from memory.
The `BinaryBulkInsert` only requires 40 MB, as it does not cache anything.

Project: InsertAllMemoryLeaksPostgreSql-BinaryBulkInsert.zip
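For the scenario in the original report, switching over to the binary bulk insert might look like the following sketch (the table name and `recordsFinal` come from the report above; verify the exact overloads, and the usual RepoDB PostgreSQL bootstrap, against the library's documentation):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Npgsql;
using RepoDb;

public static class BulkInsertSketch
{
    // Sketch: BinaryBulkInsertAsync streams the rows through PostgreSQL's
    // binary COPY protocol instead of generating and caching per-row-count
    // INSERT command text.
    public static async Task InsertAsync<T>(
        string connectionString, IEnumerable<T> recordsFinal)
        where T : class
    {
        await using var db = new NpgsqlConnection(connectionString);
        await db.OpenAsync();

        await db.BinaryBulkInsertAsync(
            tableName: "schema.table",
            entities: recordsFinal);
    }
}
```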