Query with OFFSET and LIMIT in the Cosmos DB Node.js SDK is much slower than the REST API

Package Name: "@azure/cosmos" (Azure/azure-sdk-for-js)
Package Version: 3.7.2
Operating system: Linux on Azure
[v] nodejs - version: 10.21
[v] browser - name/version: Edge, Chrome
[v] typescript - version: 4.0.2, 3.9.7, 3.8
Related packages: "webpack": "^4.42.0", "webpack-cli": "^3.3.12", "babel-loader": "8.1.0"

Is the bug related to documentation in
- [v] SDK API docs on https://docs.microsoft.com

"OFFSET LIMIT clause in Azure Cosmos DB": the passage quoted below does not say that the REST API avoids fetching unnecessary rows.

You should use OFFSET LIMIT for cases when you would like to skip items entirely and save client resources. For example, you should use OFFSET LIMIT if you want to skip to the 1000th query result and have no need to view results 1 through 999. On the backend, OFFSET LIMIT still loads each item, including those that are skipped. The performance advantage is a savings in client resources by avoiding processing items that are not needed.

Describe the bug

A query with OFFSET and LIMIT run through the Cosmos DB Node.js SDK is much slower than the same query sent directly to the REST API.

The REST API fetched only the rows specified by the OFFSET and LIMIT parameters.

For example, when I ran a query through the REST API that fetched only the last 5 of about 20,000 rows (19,530), the elapsed time was 1228.9 milliseconds. The REST API avoids processing unnecessary rows.

{
query: "SELECT * FROM ActiveActionCorrespondences AS c OFFSET  @offset LIMIT @limit ",
parameters: [{name: "@offset", value: 19525}, {name: "@limit", value: 5}]
}

On the other hand, when the same querySpec is passed to container.items.query(querySpec).fetchAll() in the Cosmos DB Node.js SDK, the request is sent with OFFSET 0, and about 20,000 rows are fetched in pages of 100 rows. The REST API was called about 200 times. The SDK (container.items.query(querySpec).fetchAll()) does not avoid handling unnecessary rows.

{
"query":"SELECT *  FROM ActiveActionCorrespondences AS c OFFSET 0 LIMIT 19530",
"parameters":[{"name":"@offset","value":19525},{"name":"@limit","value":5}]
}

container.items.query(querySpec).fetchAll() was about 200 times slower than a direct REST API call.
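The captured payloads suggest that the rewritten query asks for the first OFFSET + LIMIT rows and that the skipped rows are dropped on the client. The following is a minimal sketch of that arithmetic only. It uses an in-memory array in place of a container, and the function name, the paging loop, and the page size of 100 (taken from the observed pages) are assumptions for illustration, not the SDK's actual implementation.

```javascript
// Simulates serving "OFFSET 0 LIMIT offset+limit" in fixed-size pages and
// applying the real OFFSET on the client, as the captured payloads suggest.
function simulateClientSideOffset(allRows, offset, limit, pageSize) {
  const rewrittenLimit = offset + limit; // e.g. 19525 + 5 = 19530
  let requests = 0;
  const received = [];
  for (let start = 0; start < rewrittenLimit && start < allRows.length; start += pageSize) {
    const end = Math.min(start + pageSize, rewrittenLimit);
    received.push(...allRows.slice(start, end)); // one simulated round trip
    requests += 1;
  }
  return {
    requests,
    rowsTransferred: received.length,
    // The skipped rows are only discarded here, after they were transferred.
    result: received.slice(offset, offset + limit),
  };
}

const sampleRows = Array.from({ length: 19530 }, (_, i) => ({ id: i }));
const stats = simulateClientSideOffset(sampleRows, 19525, 5, 100);
console.log(stats.requests, stats.rowsTransferred, stats.result.length);
// prints: 196 19530 5
```

With these numbers the simulation makes 196 round trips to return 5 rows, which matches the "about 200" requests seen in the network panel.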

To Reproduce

Steps to reproduce the behavior:

1. Prepare about 20,000 records.
2. Pass a querySpec to container.items.query(querySpec, options) and execute it so that only the last 5 of the roughly 20,000 rows are fetched.
3. In the browser developer panel, observe what is sent and received on the network. The part marked in the attached figure is the unexpected behavior.

Expected behavior

The Cosmos DB Node.js SDK is expected to use the REST API's ability to interpret and execute a non-zero OFFSET, and to fetch only the number of rows specified by the LIMIT parameter.

Screenshots

Expected behavior: direct REST API call. cosmos-DB-offset-ok1

Bad request payload via container.items.query(querySpec).fetchAll(): the @offset parameter was replaced with the constant 0, and although @limit specified only 5 rows, it was replaced with the much larger constant 19530. cosmos-DB-offset-ng2-requset-payload

Bad response to container.items.query(querySpec).fetchAll(): "_count: 100" shows that each response returns 20 times more rows than the 5 requested. cosmos-DB-offset-ng2-responce-resource

About 200 times more calls from a single call to container.items.query(querySpec).fetchAll(). cosmos-DB-offset-ng2-requset-many-times

Issue Analytics

State:
Created 3 years ago
Comments:5 (2 by maintainers)

Top GitHub Comments

master-maintenance1-peer-connect commented, Sep 23, 2020

@zfoster Thanks for the reply.

I see.

This issue seems related to the portal, and the behavior in the portal is intentional, but could be revisited.

I understand that this issue is not a faulty implementation in the SDK, but a gap in the specification of Cosmos DB as a distributed database.

the portal sends the request to the SDK with a query plan because the portal does not assume whether you have a single partition, multiple partitions,

If there is a setting in Cosmos DB that allows me to use only a single partition, please let me know.

Therefore, I would like to submit a feature request. Please tell me where to send it.

A. SDK function 1: like the REST API, an affinity function to fetch from a specific single partition.
B. Portal function: the ability to enumerate the single-partition identifiers that can be specified as the affinity for an opened Cosmos DB database/container.
C. SDK function 2: a function that automatically re-runs the fetch against an alternative single partition when a temporary failure occurs in the partition prioritized by function 1.

(This covers recovery from temporary failures that a retry may resolve. As you know, if you start reading the last 100 rows of 20,000 rows and the transfer of the 10th row fails, reading only the remaining 90 rows from the alternate partition results in inconsistent data (for example, rows 19899-19990 are read). When the SDK starts the re-run, the client-side library of a distributed database must read again from the first row.)
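The restart rule described in the parenthetical can be sketched as follows. This is an illustration under stated assumptions, not an SDK feature: `readWithRestart` and the two row readers are hypothetical, and each "source" stands in for what the comment calls a partition.

```javascript
// Reads `limit` rows starting at `offset` from the first source that can
// serve the whole range. A partial read is thrown away, never merged with
// rows from another source, so the result comes from one consistent source.
function readWithRestart(sources, offset, limit) {
  for (const readRow of sources) {
    const collected = [];
    try {
      for (let i = offset; i < offset + limit; i++) {
        collected.push(readRow(i)); // may throw on a transient failure
      }
      return collected;
    } catch (err) {
      // Discard `collected` and start again from the first row on the next source.
    }
  }
  throw new Error('All sources failed');
}

// Hypothetical sources: the first fails on its 10th row, the second is healthy.
const flakySource = (i) => {
  if (i === 19909) throw new Error('transient failure');
  return `A-${i}`;
};
const healthySource = (i) => `B-${i}`;

const recovered = readWithRestart([flakySource, healthySource], 19900, 100);
console.log(recovered[0], recovered.length);
// prints: B-19900 100
```

All 100 rows come from the second source. The nine rows already read from the first one are not reused.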

These features exist in distributed databases such as Oracle RAC parallel databases, and I hope they are understood to be essential for Cosmos DB to stay competitive.

Best regards.

master-maintenance1-peer-connect commented, Sep 23, 2020

@zfoster Thanks for the reply.

I’ve communicated this to the portal team.

I may need to change my feature-request.

Like the Cosmos DB REST API, I want azure-sdk-for-js to provide a way to reduce the amount of data transferred from the Azure cloud to the client over the WAN. The REST API loads almost all rows from all partitions over the high-speed network inside Azure Cosmos DB, finds the partition containing the rows specified by OFFSET, and sends only the small number of rows specified by LIMIT to the client over the slow WAN. If the WAN is as slow as a few Mbps, this can improve performance.
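To put rough numbers on the WAN argument, here is a small estimate of transfer time for 19,530 rows versus 5 rows. The 1 KB row size and the 2 Mbit/s link speed are assumed values chosen only for illustration. The issue states neither, and the helper name is hypothetical.

```javascript
// Estimated seconds to move `rowCount` rows of `bytesPerRow` bytes over a
// link of `megabitsPerSecond`. Ignores latency, headers, and compression.
function estimateTransferSeconds(rowCount, bytesPerRow, megabitsPerSecond) {
  const bits = rowCount * bytesPerRow * 8;
  return bits / (megabitsPerSecond * 1e6);
}

const viaSdk = estimateTransferSeconds(19530, 1024, 2); // all rows up to OFFSET + LIMIT
const viaRest = estimateTransferSeconds(5, 1024, 2);    // only the LIMIT rows
console.log(viaSdk.toFixed(1), viaRest.toFixed(3));
// prints: 80.0 0.020
```

Under these assumptions the payload alone takes about 80 seconds when every row crosses the WAN, against about 0.02 seconds when only the 5 requested rows do.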

Apart from error recovery, this feature request could also be met with client-side JavaScript code like the following.

           // Equivalent to container.items.query(querySpec, feedOptions).fetchAll()
           async getRecords(querySpec, feedOptions) {
                return this.acquireToken(
                    COSMOSDB__REST_SCOPES.DOCUMENTS,
                    useRedirectFlow
                ).then(tokenResponse => {
                    return fetch(COSMOSDB_REST_ENDPOINTS.DOCUMENTS,
                        {
                            method: 'POST',
                            headers: {
                                Authorization: `Bearer ${tokenResponse.accessToken}`,
                                // Needed for a SQL query body sent to the REST API
                                'Content-Type': 'application/query+json',
                                'x-ms-documentdb-isquery': 'true'
                                // ...and the other headers Cosmos DB requires
                            },
                            // fetch() needs a serialized body, not a plain object
                            body: JSON.stringify({
                                query: querySpec.query,
                                // e.g. [{name: '@offset', value: 19990}, {name: '@limit', value: 5}]
                                parameters: querySpec.parameters
                            })
                        });
                }).catch(svErr => {
                    throw Error(svErr);
                }).then((res) => {
                    if (res.ok) {
                        return res;
                    }
                    // Map HTTP status codes to application errors
                    switch (res.status) {
                        case 400: throw Error('INVALID_TOKEN');
                        case 401: throw Error('UNAUTHORIZED');
                        case 500: throw Error('INTERNAL_SERVER_ERROR');
                        case 502: throw Error('BAD_GATEWAY');
                        case 404: throw Error('NOT_FOUND');
                        default: throw Error('UNHANDLED_ERROR');
                    }
                }).then(response1 => response1.json())
            }

I apologize for the confusion caused by my poor English. I am grateful that @zfoster understands the usefulness of my feature request.

However, your proposal is not feasible for me.

A) You can currently specify the partitionKey in the options (FeedOptions in your screenshot)

To implement your suggestion, the JavaScript code in the client browser would need to keep a copy of the Cosmos DB index, or a cache. For example, consider fetching the rows for {offset: 8, limit: 2} from the following records:

         {'partitionKey-A': data0,'partitionKey-A': data1,
              'partitionKey-B': data2,'partitionKey-B': data3,
              'partitionKey-C': data4,'partitionKey-C': data5,
              'partitionKey-C': Dirty-data6,'partitionKey-C': data7,
              'partitionKey-D': data8,'partitionKey-D': data9,
              'partitionKey-D': data10,'partitionKey-D': data11,
         }

If user1 has just deleted {'partitionKey-C': Dirty-data6}, fetchAll() of "select * from c OFFSET 8 LIMIT 2" skips eight of the remaining rows and returns the following records: {'partitionKey-D': data9, 'partitionKey-D': data10}

But user2 is another user in the distributed environment. Until the deletion by user1 propagates to user2, the dirty row is still visible to user2, so user2's fetchAll() returns the following records: {'partitionKey-D': data8, 'partitionKey-D': data9}

While locating the partitions (C and D) that hold the rows to fetch, a failure in the partition for partitionKey-A or partitionKey-B does not prevent fetching from partitionKey-C, because only the index is referenced. An index or cache of 12 entries {A, A, B, B, C, C, [C], C, D, D, D, D} is probably needed to find the partition key and the row number.

Such an index (or cache) could be maintained inside Cosmos DB, but JavaScript code in the client browser has too little memory to do so.
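The index lookup described above can be sketched with per-partition row counts. This is only an illustration of the idea: `locateWindow` is a hypothetical helper, the counts mirror the 12-row example, and OFFSET is treated as the number of rows to skip.

```javascript
// Given per-partition row counts in storage order, finds which partition
// slices cover the window [offset, offset + limit).
function locateWindow(partitionCounts, offset, limit) {
  const slices = [];
  const end = offset + limit;
  let firstRow = 0; // global position of the current partition's first row
  for (const [key, count] of partitionCounts) {
    const from = Math.max(offset, firstRow);
    const to = Math.min(end, firstRow + count);
    if (from < to) {
      slices.push({ key, skip: from - firstRow, take: to - from });
    }
    firstRow += count;
  }
  return slices;
}

// 12 rows as in the example: A, A, B, B, C, C, C, C, D, D, D, D
const before = locateWindow([['A', 2], ['B', 2], ['C', 4], ['D', 4]], 8, 2);
// The same lookup after one row has been removed from partition C
const after = locateWindow([['A', 2], ['B', 2], ['C', 3], ['D', 4]], 8, 2);
console.log(JSON.stringify(before), JSON.stringify(after));
// prints: [{"key":"D","skip":0,"take":2}] [{"key":"D","skip":1,"take":2}]
```

Only the counts are consulted to find the target partition, so partitions A and B never have to be read. A single deletion in C still shifts the window inside D, which is why the index has to be kept current.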

Please reconsider how to implement my feature request (for example, with an implementation that avoids this complexity, like the JavaScript code above).

Best regards.