ReadNextAsync with ORDER BY query loads all items at once
Describe the bug
FeedIterator.ReadNextAsync() reads all data for queries with ORDER BY.
NOTE: All queries in our application are single partition queries (Partition key is specified).
To Reproduce
Here is a small repro: https://gist.github.com/KristianJakubik/8d90c3b97e385c461df012acbc14a2f8
Expected behavior
I expected that ReadNextAsync() would return a small portion of the data, the same portion it returns when I run the query without ORDER BY.
For a query that should return 1000 items in total, the first call to ReadNextAsync() should return only 100 items.
Actual behavior
For a query that should return 1000 items in total, the first call to ReadNextAsync() returns all 1000 items.
Environment summary
SDK Version: 3.6.0, 3.7.0 preview 2
OS Version: Windows
Additional context
This is a pretty serious bug for us. It directly impacts our customers. ORDER BY queries are no longer bearable for our Cosmos DB: they drain a lot of RUs and take a long time to execute, because a single ReadNextAsync() call loads the whole query result at once instead of dividing it into smaller chunks as the v2 SDK did, which makes the continuation token useless. We designed our application so that our client works with the continuation token from the Cosmos SDK and therefore decides when it wants to load additional data (a lazy approach). This kept the load on our database easy to manage; with the new v3 SDK this is not possible.
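The lazy approach described above can be sketched roughly as follows. This is only an illustration, not the code from the linked gist: the helper name `ReadPageAsync` and its parameters are hypothetical, and it assumes a string partition key. It relies on `QueryRequestOptions.MaxItemCount`, which is an upper bound on the items per page; whether ORDER BY queries honor it in the affected SDK versions is exactly what this issue questions.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class LazyPaging
{
    // Hypothetical helper: reads at most one page and returns it together
    // with the continuation token the caller needs to request the next page.
    public static async Task<(List<T> Items, string ContinuationToken)> ReadPageAsync<T>(
        Container container,
        string queryText,
        string partitionKeyValue,   // assumption: the partition key is a string
        string continuationToken,   // null for the first page
        int pageSize)
    {
        var options = new QueryRequestOptions
        {
            // Single-partition query, as in the report.
            PartitionKey = new PartitionKey(partitionKeyValue),
            // Upper bound on the number of items returned per page.
            MaxItemCount = pageSize
        };

        FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
            new QueryDefinition(queryText), continuationToken, options);

        if (!iterator.HasMoreResults)
        {
            return (new List<T>(), null);
        }

        // One call is expected to fetch one page, not the whole result set.
        FeedResponse<T> page = await iterator.ReadNextAsync();
        return (new List<T>(page), page.ContinuationToken);
    }
}
```

The caller stores the returned token and passes it back only when it actually wants more data, so the database is not queried for pages nobody asked for.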
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:11 (10 by maintainers)
Top GitHub Comments
So if I set MaxItemCount to that 100, is it guaranteed to fetch all 100 documents in a single request? That doesn’t seem feasible. And if it is not guaranteed, is there a way to stop the query after the first request? How low would I have to set MaxItemCount, other than dropping the ORDER BY clause?
Yes, I fully agree. Neither the client, nor the service, nor the SDK can decide what the optimal limit is. In my opinion, though, the upper limit should be decided by the Cosmos DB instance, based on how much data it can deliver given its RU configuration.
I’ve updated the gist to set the MaxItemCount property, and I’ve used Fiddler as an HTTP proxy.
I took three measurements against a Cosmos instance scaled to 400 RUs, 2,000 RUs, and 10,000 RUs. Surprisingly, the result was the same for all three: the response to each HTTP request seems to be at most 4.2 MB. My previous assumption was that this size would vary with the container’s scale.
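For anyone who wants to compare page sizes from the SDK side rather than through an HTTP proxy, a minimal sketch along these lines may help. It is not the code from the gist; the method name `LogPagesAsync` and its parameters are hypothetical, and it assumes a string partition key. It prints how many items and how many RUs each ReadNextAsync() call accounts for, so the same query can be run with and without ORDER BY and the output compared.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class PageMeasurement
{
    // Hypothetical helper: drains the query and logs one line per page.
    public static async Task LogPagesAsync<T>(
        Container container,
        string queryText,
        string partitionKeyValue,   // assumption: the partition key is a string
        int maxItemCount)
    {
        var options = new QueryRequestOptions
        {
            PartitionKey = new PartitionKey(partitionKeyValue),
            MaxItemCount = maxItemCount
        };

        FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
            new QueryDefinition(queryText), requestOptions: options);

        int pageNumber = 0;
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            pageNumber++;
            // Count is the number of items in this page;
            // RequestCharge is the RU cost reported for it.
            Console.WriteLine(
                $"page {pageNumber}: {page.Count} items, {page.RequestCharge} RUs");
        }
    }
}
```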
Expected behavior: I would expect ReadNextAsync() to return the same number of items for ORDER BY queries as it does for queries without an ORDER BY clause. One call to ReadNextAsync() would then represent one call to the Cosmos instance, the same way it does for non-ORDER BY queries.
@j82w I hope we are now more on the same page 😃. If you have any more questions, or want a more detailed explanation, more detailed measurement data, or anything else, please feel free to ask.