Enhancement: Refactor Batch Operations SQL Statements
Describe the enhancement
In response to #1142, it is important for the library to limit its memory usage while caching the packed statements per entity/model when calling the batch operations. This frees up memory for the application to use for other purposes.
The rationale
What? In the issue mentioned above, a user of the PGSQL extension library reported a memory leak. They have an entity model that corresponds to a table with 112 columns. During a call to the InsertAll operation with a batchSize of 500, the library's memory usage suddenly spiked to 3GB.
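For illustration, here is a minimal sketch of the kind of call that triggers the reported behavior, using the RepoDb PostgreSQL extension. The entity, connection string, and column names are hypothetical placeholders for the reported 112-column model:

```csharp
using System.Collections.Generic;
using Npgsql;
using RepoDb;

// Hypothetical 112-column entity standing in for the model from the report.
public class WideTable
{
    public long Id { get; set; }
    public string Col1 { get; set; }
    // ... Col2 through Col112 omitted for brevity
}

public static class BatchInsertRepro
{
    public static void Run(IEnumerable<WideTable> rows)
    {
        // Connection string is a placeholder; RepoDb bootstrap/setup is omitted.
        using (var connection = new NpgsqlConnection("Host=localhost;Database=demo"))
        {
            // With a batchSize of 500, a distinct packed statement can be
            // generated (and cached) for every distinct row count encountered.
            connection.InsertAll(rows, batchSize: 500);
        }
    }
}
```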
Why? With this parameter, RepoDB might cache up to 500 large packed statements (one per distinct row count) for such a big table. That is for a single entity and a single operation only, not yet counting other entities and other batch operations (i.e.: MergeAll, UpdateAll).
In the current version of the library, when you call the InsertAll operation, it caches a packed statement based on the number of rows passed in the operation. (This is also true for MergeAll and UpdateAll.)
To elaborate, if a user passes 3 entities to the operation, the statement below will be created.
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_1, ..., @Col112_1); RETURN SCOPE_IDENTITY();
...
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_3, ..., @Col112_3); RETURN SCOPE_IDENTITY();
Such a statement will be cached in host memory for the operation with 3 rows. It will be reused when the same entity model (or table) is passed again in the future, but only for exactly 3 rows.
Then, if the user passes 5 entities to the operation, the statement below will also be cached in host memory.
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_1, ..., @Col112_1); RETURN SCOPE_IDENTITY();
...
...
...
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_5, ..., @Col112_5); RETURN SCOPE_IDENTITY();
If the user sets the batchSize to 500, up to 500 such packed statements (one per distinct row count) can end up saved in host memory, resulting in much bigger memory requirements/consumption. The more rows passed, the bigger the statement that gets cached.
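A minimal sketch (not RepoDB's actual internals) of why the cache grows this way, assuming the packed statement is keyed per table and per row count as described above:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Text;

// Sketch only: a cache keyed by (table, row count), so every new row count
// adds another, proportionally larger SQL string to host memory.
public static class PackedStatementCacheSketch
{
    private static readonly ConcurrentDictionary<(string Table, int RowCount), string> Cache =
        new ConcurrentDictionary<(string Table, int RowCount), string>();

    public static string GetOrBuild(string table, string[] columns, int rowCount)
    {
        return Cache.GetOrAdd((table, rowCount), key =>
        {
            var columnList = string.Join(", ", columns);
            var builder = new StringBuilder();
            for (var row = 1; row <= key.RowCount; row++)
            {
                // One parameterized row per entity, numbered per row (e.g. @Col1_1, @Col1_2, ...).
                var parameterList = string.Join(", ", columns.Select(c => $"@{c}_{row}"));
                builder.AppendLine(
                    $"INSERT INTO [{key.Table}] ({columnList}) VALUES ({parameterList}); RETURN SCOPE_IDENTITY();");
            }
            return builder.ToString();
        });
    }
}
```

Since the statement for n rows is roughly n times the size of the single-row statement, caching a variant for every row count from 1 up to the batchSize makes the total cached SQL grow roughly quadratically with the batch size, which is consistent with the large footprint reported for a 112-column table.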
Conclusion
As per the report and the screenshots provided during the investigation, up to 3GB of memory was utilized for a single entity. This is alarming, as it would require the application to request a high resource limit in case it needs to use the batch capabilities of the library.
Though this is not a defect per se, it is something that requires attention and a revisit to optimize the memory usage.
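Purely as an illustration of what "optimizing the memory usage" could look like (this is not the refactor the maintainers chose), one direction is to prepare a single-row statement once and reuse it per entity, so memory no longer scales with the row count. Table and column names below are hypothetical, and the connection is assumed to be open:

```csharp
using System.Collections.Generic;
using Npgsql;
using NpgsqlTypes;

public static class FlatMemoryInsertSketch
{
    public static void InsertAll(NpgsqlConnection connection, IEnumerable<(string Col1, string Col2)> rows)
    {
        using (var transaction = connection.BeginTransaction())
        // A single parameterized, single-row statement is prepared once and
        // executed per entity, instead of caching one multi-row statement per row count.
        using (var command = new NpgsqlCommand(
            "INSERT INTO \"Table\" (\"Col1\", \"Col2\") VALUES (@Col1, @Col2);", connection, transaction))
        {
            command.Parameters.Add(new NpgsqlParameter("Col1", NpgsqlDbType.Text));
            command.Parameters.Add(new NpgsqlParameter("Col2", NpgsqlDbType.Text));
            command.Prepare();

            foreach (var row in rows)
            {
                command.Parameters["Col1"].Value = row.Col1;
                command.Parameters["Col2"].Value = row.Col2;
                command.ExecuteNonQuery();
            }

            transaction.Commit();
        }
    }
}
```

This trades the multi-row packing (and its round-trip savings) for a flat memory profile, so it is only one of several possible trade-offs.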
Top GitHub Comments
@cajuncoding yeah, I read the link you shared and it is such a good comment from the community to not rely on the default order of the output based on the input. So, it is still recommended to add those extra steps, which I would also do. Thanks mate 🚀
There seems to be a challenge in PGSQL: a query that works in SQL Server is failing in PGSQL.
Reference: https://twitter.com/mike_pendon/status/1638654274237497346