V8: LINQ Published Content query memory use high / initial query slow.
This might not be an issue for most sites. However, when you have a large number of content nodes and run a LINQ query such as "get the top 10 published pages whose path starts with a given prefix, ordered by a date property descending", the first run of the query is very slow and memory use increases significantly.
In one scenario, with a database of ~26,000 pages, the first LINQ query for the top 10 most recently published pages takes ~2,000 milliseconds, and subsequent queries take 94 milliseconds. Running an equivalent SQL query to get the IDs and then looking up the PublishedContent by ID runs locally in ~330 ms and saves about 100 MB of memory for this site and query.
I have another site with around 80,000 pages that I'm trying to make viable in v8. The goal is to reduce the memory used by older content while still keeping that content available.
It would be ideal to load only the properties used in the where/select clauses until ToList/ToArray etc. are called, so that less memory is used. A more memory-efficient approach similar to v7, where you could use an XPathNavigator instead of IPublishedContent / Examine, would help. For example, a second NuCache content database could hold only the properties you have marked as usable in queries; you would use it to find IDs and then look up the IPublishedContent from the main NuCache. Examine might be ideal, but it is a bit hit and miss on Azure, and Azure Search is an additional cost that could be avoided given that CPU usage isn't high for this type of query. This could also avoid having to pay for a larger virtual machine or App Service plan just to get the extra memory needed to hold less frequently used content when CPU usage is low.
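To make the "find the IDs first, then look up the content" idea concrete, here is a minimal sketch. It is not an existing Umbraco API: `QueryIndexEntry`, `TwoPhaseQuery`, `TopIds` and `Hydrate` are hypothetical names, and the index is assumed to be a small in-memory list holding only the fields the query filters or sorts on. The comma-separated path format (for example "-1,1050,1072") is also an assumption about how node paths are stored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lightweight index entry: only the fields the query needs,
// none of the large body properties.
public sealed class QueryIndexEntry
{
    public int Id { get; set; }
    public string Path { get; set; }              // assumed format: "-1,1050,1072"
    public string ContentTypeAlias { get; set; }
    public DateTime SortDate { get; set; }        // publishDate, or CreateDate as fallback
}

public static class TwoPhaseQuery
{
    // Phase 1: select the ids of the newest `take` entries below `parentPath`,
    // touching only the small index entries.
    public static List<int> TopIds(
        IEnumerable<QueryIndexEntry> index, string parentPath, string alias, int take)
    {
        // The trailing comma stops "-1,10" from also matching "-1,100,...".
        string prefix = parentPath + ",";
        return index
            .Where(e => e.ContentTypeAlias == alias
                        && e.Path.StartsWith(prefix, StringComparison.Ordinal))
            .OrderByDescending(e => e.SortDate)
            .Take(take)
            .Select(e => e.Id)
            .ToList();
    }

    // Phase 2: resolve the full items by id, skipping ids that no longer resolve
    // (for example content unpublished since the index was built).
    public static List<T> Hydrate<T>(IEnumerable<int> ids, Func<int, T> getById)
        where T : class
    {
        return ids.Select(getById).Where(item => item != null).ToList();
    }
}
```

In a v8 view, the `getById` delegate could plausibly be `id => Umbraco.Content(id)`, so only the ten result items are fetched as full content. How the index is populated and kept in sync with publishing is left open here.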
Umbraco version
I am seeing this issue on Umbraco version: 8.6.2
Reproduction
Bug summary
The entire set of property data for each content item is loaded from NuCache into memory, even when the query does not filter on those properties and they may never be needed because the documents won't be in the result set.
Specifics
Umbraco v8.6.2, Chrome (latest)
Steps to reproduce
- Set up a database with a large number of pages (10,000+), each with a long HTML body in a property that is not used by the query's where clause.
- Run a LINQ query such as the one below (top 10 most recent articles in a section of the site, ordered by a custom publishDate property, falling back to CreateDate where the custom publish date is unavailable):
    var articles = CurrentPage.Descendants()
        .Where(x => x.ContentType.Alias == "article")
        .OrderByDescending(x => x.Value<DateTime>("publishDate", defaultValue: x.CreateDate))
        .Skip(0)
        .Take(10)
        .ToList();
- The first time the query runs, it can take seconds to complete and appears to load the entire set of descendants into memory.
- Rerun the LINQ query and compare the timings.
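To compare the first and second runs in the steps above, a small timing helper along these lines could be used. This is only a sketch: `QueryProbe` and `Measure` are hypothetical names, and the managed-heap delta from `GC.GetTotalMemory` is a rough indicator that is sensitive to other activity in the process, not a precise measurement of what the cache has loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class QueryProbe
{
    public sealed class Measurement
    {
        public long ElapsedMs { get; set; }
        public long ManagedBytesDelta { get; set; }
        public int ResultCount { get; set; }
    }

    // Runs the query once and records wall-clock time plus the change in
    // managed heap size. The query must materialize its results (ToList)
    // so that the deferred LINQ pipeline actually executes inside the timer.
    public static Measurement Measure<T>(Func<List<T>> query)
    {
        // Force a collection first so earlier garbage does not skew the baseline.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var stopwatch = Stopwatch.StartNew();
        List<T> result = query();
        stopwatch.Stop();

        // No forced collection here: we want to see what the query left allocated.
        long after = GC.GetTotalMemory(forceFullCollection: false);

        return new Measurement
        {
            ElapsedMs = stopwatch.ElapsedMilliseconds,
            ManagedBytesDelta = after - before,
            ResultCount = result.Count
        };
    }
}
```

Calling `Measure` twice with the same lambda wrapping the LINQ query and comparing `ElapsedMs` and `ManagedBytesDelta` between the two calls should show the first-run cost described in this report.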
Expected result
The query runs fast every time (milliseconds rather than seconds) and does not load into memory the properties that were not filtered on, or the properties of documents that are not part of the result.
Actual result
All properties are loaded into memory, and the query is slow the first time it runs.
Issue Analytics
- Created 3 years ago
- Comments: 6 (5 by maintainers)
Thanks for the report and replies. As I read this at the moment, we have wishes to support IQueryable in the future. Since this is currently not something we’re actively working on, I’ve put it on the “idea” list (our wish list for things we want to work on in the future) for us to pick up when we have a chance to do so.
Hi there @nzdev,
Just wanted to let you know that we noticed that this issue got a bit stale and we haven’t been able to get to this idea. We will close this idea for now, as we haven’t been able to prioritize it yet.
Once we get time to work on ideas that are in this category we’ll review and update existing issues like this one to let you know we’re working on it.
Thanks, from your friendly Umbraco GitHub bot 🤖 🙂