AOT query mode: precompiled queries (1st part of the query pipeline)

Remaining sub-tasks

  • ExecuteUpdate/Delete
  • SQL queries
  • Split query
  • GroupBy final operator (materializer work)
  • Shared-type entity types (in various places where we do entity lookups)

Bugs and edge cases


General design

When a LINQ query is first encountered, EF “compiles” it, producing a code-generated shaper, SQL (for relational databases), etc. This process is both fairly slow (increasing startup times) and incompatible with AOT environments (since it relies on code generation at runtime). While several approaches have been discussed in the past to improve this (e.g. #16496), with the advent of source generators we have some new possibilities. I’ve done some work on a proof-of-concept source generator which identifies EF queries and precompiles them; the work is far from complete but indicates that the approach is feasible.
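
To make the AOT concern concrete, here is a minimal sketch using only System.Linq.Expressions, not EF's actual pipeline. It assumes a toy lambda in place of a real shaper; the class and method names (RuntimeCompilationDemo, BuildTree) are made up for illustration. The point is that a tree is just data until Compile() turns it into a delegate at runtime, which is the step precompilation would move to build time.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo type; nothing here is EF Core API.
public static class RuntimeCompilationDemo
{
    // Builds the tree for `x => x * 2 + 1` by hand. At this point the
    // lambda is a data structure, not executable code.
    public static Expression<Func<int, int>> BuildTree()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        BinaryExpression body = Expression.Add(
            Expression.Multiply(x, Expression.Constant(2)),
            Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    public static void Main()
    {
        Expression<Func<int, int>> tree = BuildTree();

        // Compile() produces a delegate from the tree at runtime. This is
        // the kind of runtime step that precompilation aims to avoid.
        Func<int, int> compiled = tree.Compile();

        Console.WriteLine(tree);         // prints the tree's textual form
        Console.WriteLine(compiled(20)); // 20 * 2 + 1
    }
}
```

If the same lambda were emitted as ordinary C# source ahead of time, no Compile() call would be needed when the application starts.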

In a nutshell, we would:

  1. Identify a query in user source code
    • A first implementation would identify invocations of EF’s compiled query API (EF.CompileQuery); this is a trivial and low-risk way to immediately identify EF queries in the user’s code.
    • We could later also attempt to precompile regular queries which don’t use EF.CompileQuery. This would be an additional step in which we identify DbSets (as member accesses on a DbContext-typed identifier), and then walk up the syntax tree, progressively including methods as long as they accept IQueryable. Once we reach a method which doesn’t accept IQueryable (e.g. ToList), we’ve reached the end of the query to be compiled.
    • Dynamically-constructed queries wouldn’t be supported.
  2. Transform the query to a LINQ expression tree
    • Once we have a Roslyn syntax tree representing a query (either from EF.CompileQuery or from a regular query), it needs to be transformed into a LINQ expression tree, which is what EF’s query pipeline requires.
    • Unlike the Roslyn structures, LINQ expression trees refer to actual .NET types, MemberInfos, etc. We would therefore need to load the user’s assembly (from the input compilation given to the source generator), and use reflection to load actual types from it (e.g. entity CLR types). See note on AssemblyLoadContext below.
  3. Compile the query with EF Core
    • Once we have a LINQ expression tree, we need to pass it to EF’s query compiler. To do this:
      • We instantiate the user’s DbContext type, using the parameterless constructor
      • Extract the IQueryCompiler service from it
      • Invoke the compiler, passing it the LINQ expression tree.
    • The output of this compilation is another LINQ expression tree, which instantiates e.g. a SingleQueryingEnumerable given a QueryContext. This output tree must not contain any compiled elements, e.g. the shaper must be present in non-compiled form. This would require some refactoring of the last parts of the query pipeline.
  4. Generate C# out of the compilation output
    • In the normal flow, the output LINQ expression tree is now compiled to produce a lambda (returning e.g. an enumerable given a QueryContext).
    • In the AOT flow, the expression tree would instead be outputted as C# code into a file emitted by the source generator. This generated code would be invoked by EF as part of startup, and would pre-populate its query cache.
    • This would require writing a component to convert a LINQ expression tree to C# code - possibly passing through a Roslyn syntax tree for maximum flexibility etc…
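
As a rough illustration of step 4, the following sketch renders a very small subset of LINQ expression nodes as C# source text. CSharpPrinter is a hypothetical helper, not an EF Core type, and it writes directly to a string rather than going through a Roslyn syntax tree. It only handles lambdas, parameters, instance member access, a few binary operators, and int/bool constants; a real translator would need to cover far more node types.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Hypothetical helper for illustration only; not part of EF Core.
public sealed class CSharpPrinter : ExpressionVisitor
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Entry point: walks the tree and returns the accumulated C# text.
    public static string Print(Expression expression)
    {
        var printer = new CSharpPrinter();
        printer.Visit(expression);
        return printer._sb.ToString();
    }

    protected override Expression VisitLambda<T>(Expression<T> node)
    {
        _sb.Append('(');
        for (int i = 0; i < node.Parameters.Count; i++)
        {
            if (i > 0) _sb.Append(", ");
            _sb.Append(node.Parameters[i].Name);
        }
        _sb.Append(") => ");
        Visit(node.Body);
        return node;
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        // Always parenthesize, so operator precedence never matters.
        _sb.Append('(');
        Visit(node.Left);
        _sb.Append(' ').Append(OperatorFor(node.NodeType)).Append(' ');
        Visit(node.Right);
        _sb.Append(')');
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _sb.Append(node.Name);
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression == null)
            throw new NotSupportedException("Static members are not handled in this sketch.");
        Visit(node.Expression);
        _sb.Append('.').Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        switch (node.Value)
        {
            case int i: _sb.Append(i.ToString(CultureInfo.InvariantCulture)); break;
            case bool b: _sb.Append(b ? "true" : "false"); break;
            default: throw new NotSupportedException("Only int and bool constants are handled.");
        }
        return node;
    }

    private static string OperatorFor(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Add: return "+";
            case ExpressionType.Subtract: return "-";
            case ExpressionType.Multiply: return "*";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.Equal: return "==";
            case ExpressionType.AndAlso: return "&&";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

For example, given `Expression<Func<int, int, bool>> e = (a, b) => a + 1 > b * 2;`, calling `CSharpPrinter.Print(e)` returns `(a, b) => ((a + 1) > (b * 2))`. Emitting text directly keeps the sketch short; building a Roslyn syntax tree instead, as the last bullet suggests, would leave formatting and escaping to Roslyn.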

The final code added by the source generator would look something like the following:

var selectExpression = ...;

var readColumns = ...;

var relationalCommandCache = new RelationalCommandCache(
    memoryCache,
    querySqlGeneratorFactory,
    relationalParameterBasedSqlProcessorFactory,
    selectExpression,
    readColumns,
    useRelationalNulls: false
);

var shaper = ...;

var enumerable = new SingleQueryingEnumerable<Blog>(
    (RelationalQueryContext)QueryCompilationContext.QueryContextParameter,
    relationalCommandCache,
    shaper,
    typeof(Blog),
    standAloneStateManager: false,
    detailedErrorsEnabled: false,
    threadSafetyChecksEnabled: true);

// Pre-populate EF Core's cache with the above enumerable

Additional notes:

  • The above does not cover relational command caching (including SQL), which depends on parameter nullability. This means that some query compilation still remains at runtime (but no code generation).
  • We may be able to reuse previously-precompiled queries if their source file hasn’t changed (e.g. store file hashes). This would make this feature suitable also for speeding up the developer inner loop.
  • Query precompilation isn’t necessarily dependent on using compiled models (#1906), though using that would speed the process up.
  • This could be helpful (thanks @bricelam)
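
The file-hash idea from the notes above could be sketched roughly as follows. SourceHashCache is a hypothetical type, and the in-memory dictionary stands in for whatever persistent store a real tool would use between builds; the issue does not specify either.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not part of EF Core.
public sealed class SourceHashCache
{
    // Maps a source file path to the hash recorded the last time it was seen.
    private readonly Dictionary<string, string> _hashes =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Computes a SHA-256 hash of the file contents, encoded as Base64.
    public static string ComputeHash(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }

    // Returns true when the file is new or its contents changed since the
    // previous call, and records the current hash. Returns false when the
    // previously precompiled output could be reused.
    public bool NeedsRecompile(string path)
    {
        string current = ComputeHash(path);
        if (_hashes.TryGetValue(path, out var previous) && previous == current)
        {
            return false;
        }
        _hashes[path] = current;
        return true;
    }
}
```

A tool would call NeedsRecompile for each source file containing queries and skip precompilation for files that return false. Hashing whole files is coarse: any edit to a file invalidates every query in it, which seems acceptable for a first version.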

EDIT: Following internal discussion it has become clear that doing this as a source generator isn’t practical (see https://github.com/dotnet/efcore/issues/25009#issuecomment-853735001 below). Instead, this would be a design-time CLI command or similar.

  • ~This would most likely be opt-in-only (via a csproj property), and probably makes most sense in Release builds.~
  • ~When loading user assemblies (and their dependents), we probably want to isolate them in their own AssemblyLoadContext. This isn’t trivial - we need to take Roslyn-provided syntax tree and semantic models (default assembly loader), transform them into an expression tree, and pass that into the query pipeline isolated inside the special AssemblyLoadContext. In my prototype, the default AssemblyLoadContext is used to avoid these issues.~

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments:5 (5 by maintainers)

Top GitHub Comments

1 reaction
roji commented, Jun 3, 2021

@AndriySvyryd yeah. To sum up an internal conversation with @jaredpar:

  • Running user code from a source generator (i.e. in the compiler process) could cause severe build perf issues if the user code does something bad (e.g. hang VS). This could be considered a bit less risky if the feature is opt-in, but the potential for trouble is still very big.
  • Loading a user assembly from a source generator would probably not work, since the compiler process is frequently still on .NET Framework (i.e. when running in VS), but the user assembly is .NET Core.
  • Ordering issues could make the EF source generator run before another source generator; if that other source generator is necessary in order to produce a working assembly (e.g. produce some required partial method), then the EF source generator would see a non-compiling Compilation, and could not run any user code in it.

So yeah, we’ll probably go with a design-time tool (e.g. CLI command). The general plan outlined above should still apply to that (and the need for isolating the user assemblies is no longer relevant).

1 reaction
AndriySvyryd commented, Jun 3, 2021

Alternatively, instead of implementing this as a source generator, it could be a design-time tool that uses Roslyn and avoids the issues related to loading user assemblies and resolving types used in queries.
