Enhancement: Refactor Batch Operations SQL Statements
Describe the enhancement
In response to #1142, it is important for the library to limit its memory usage while caching the packed statements per entity/model when calling the batch operations. This frees up memory for the application to use for other purposes.
The rationale
What? In the issue mentioned above, a user of the PGSQL extension library reported a memory leak. They have an entity model that corresponds to a table with 112 columns. During a call to the InsertAll operation with a batchSize of 500, the library's memory usage suddenly spiked to 3GB.
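For illustration, here is a minimal sketch of the kind of call that triggers the reported behavior, using the RepoDb PostgreSQL extension. The entity, connection string, and column names are hypothetical placeholders for the reported 112-column model:

```csharp
using System.Collections.Generic;
using Npgsql;
using RepoDb;

// Hypothetical 112-column entity standing in for the model from the report.
public class WideTable
{
    public long Id { get; set; }
    public string Col1 { get; set; }
    // ... Col2 through Col112 omitted for brevity
}

public static class BatchInsertRepro
{
    public static void Run(IEnumerable<WideTable> rows)
    {
        // Connection string is a placeholder; RepoDb bootstrap/setup is omitted.
        using (var connection = new NpgsqlConnection("Host=localhost;Database=demo"))
        {
            // With a batchSize of 500, a distinct packed statement can be
            // generated (and cached) for every distinct row count encountered.
            connection.InsertAll(rows, batchSize: 500);
        }
    }
}
```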
Why? With this parameter, RepoDB might cache up to 500 large packed statements (one per distinct row count) for such a big table. That is for a single entity and a single operation only, not yet counting other entities and other batch operations (i.e.: MergeAll, UpdateAll).
In the current version of the library, when you call the InsertAll operation, it caches a packed statement based on the number of rows passed in the operation. (This is also true for MergeAll and UpdateAll.)
To elaborate, if a user passes 3 entities to the operation, the statement below will be created.
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_1, ..., @Col112_1); RETURN SCOPE_IDENTITY();
...
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_3, ..., @Col112_3); RETURN SCOPE_IDENTITY();
Such a statement will be cached in host memory for the operation with 3 rows. It will be reused when the same entity model (or table) is passed again in the future, but only for exactly 3 rows.
Then, if the user passes 5 entities to the operation, the statement below will also be cached in host memory.
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_1, ..., @Col112_1); RETURN SCOPE_IDENTITY();
...
...
...
INSERT INTO [Table] (Col1, ... Col112) VALUES (@Col1_5, ..., @Col112_5); RETURN SCOPE_IDENTITY();
If the user sets the batchSize to 500, up to 500 such packed statements (one per distinct row count) can end up saved in host memory, resulting in much bigger memory requirements/consumption. The more rows passed, the bigger the statement that gets cached.
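A minimal sketch (not RepoDB's actual internals) of why the cache grows this way, assuming the packed statement is keyed per table and per row count as described above:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Text;

// Sketch only: a cache keyed by (table, row count), so every new row count
// adds another, proportionally larger SQL string to host memory.
public static class PackedStatementCacheSketch
{
    private static readonly ConcurrentDictionary<(string Table, int RowCount), string> Cache =
        new ConcurrentDictionary<(string Table, int RowCount), string>();

    public static string GetOrBuild(string table, string[] columns, int rowCount)
    {
        return Cache.GetOrAdd((table, rowCount), key =>
        {
            var columnList = string.Join(", ", columns);
            var builder = new StringBuilder();
            for (var row = 1; row <= key.RowCount; row++)
            {
                // One parameterized row per entity, numbered per row (e.g. @Col1_1, @Col1_2, ...).
                var parameterList = string.Join(", ", columns.Select(c => $"@{c}_{row}"));
                builder.AppendLine(
                    $"INSERT INTO [{key.Table}] ({columnList}) VALUES ({parameterList}); RETURN SCOPE_IDENTITY();");
            }
            return builder.ToString();
        });
    }
}
```

Since the statement for n rows is roughly n times the size of the single-row statement, caching a variant for every row count from 1 up to the batchSize makes the total cached SQL grow roughly quadratically with the batch size, which is consistent with the large footprint reported for a 112-column table.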
Conclusion
As per the report and the screenshots provided during the investigation, up to 3GB of memory was utilized for a single entity. This is alarming, as it would require the application to request a high resource limit in case it needs to use the batch capabilities of the library.
Though this is not a defect per se, it is something that requires attention and a revisit to optimize the memory usage.
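Purely as an illustration of what "optimizing the memory usage" could look like (this is not the refactor the maintainers chose), one direction is to prepare a single-row statement once and reuse it per entity, so memory no longer scales with the row count. Table and column names below are hypothetical, and the connection is assumed to be open:

```csharp
using System.Collections.Generic;
using Npgsql;
using NpgsqlTypes;

public static class FlatMemoryInsertSketch
{
    public static void InsertAll(NpgsqlConnection connection, IEnumerable<(string Col1, string Col2)> rows)
    {
        using (var transaction = connection.BeginTransaction())
        // A single parameterized, single-row statement is prepared once and
        // executed per entity, instead of caching one multi-row statement per row count.
        using (var command = new NpgsqlCommand(
            "INSERT INTO \"Table\" (\"Col1\", \"Col2\") VALUES (@Col1, @Col2);", connection, transaction))
        {
            command.Parameters.Add(new NpgsqlParameter("Col1", NpgsqlDbType.Text));
            command.Parameters.Add(new NpgsqlParameter("Col2", NpgsqlDbType.Text));
            command.Prepare();

            foreach (var row in rows)
            {
                command.Parameters["Col1"].Value = row.Col1;
                command.Parameters["Col2"].Value = row.Col2;
                command.ExecuteNonQuery();
            }

            transaction.Commit();
        }
    }
}
```

This trades the multi-row packing (and its round-trip savings) for a flat memory profile, so it is only one of several possible trade-offs.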
Top GitHub Comments
@cajuncoding yeah, I read the link you shared and it is such a good comment from the community to not rely on the default order of the output based on the input. So, it is still recommended to add those extra steps, which I would also do. Thanks mate 🚀
There seems to be a challenge in PGSQL: a query that works in SQL Server is failing in PGSQL.
Reference: https://twitter.com/mike_pendon/status/1638654274237497346