Enhance split query buffering


File a bug

When enabled, split querying is supposed to buffer all result sets except for the last. This does not seem to happen when only one result set is requested: in that case, the entire query is still buffered.

Include your code

Here is sample code that demonstrates the issue.

using Microsoft.EntityFrameworkCore;
using System;
using System.Text;

namespace SplitQueryBug {
    public static class Program {
        public static void Main() {
            var CS = "Data Source=(local);Initial Catalog=__SPLIT_QUERY_BUG;Integrated Security=SSPI;";
            //Ensure our DB exists
            {
                using var Context = MyContext.FromSqlServerConnectionString(CS);
                Context.Database.EnsureCreated();
            }

            //Create our seed text which is 1MB in size.
            var SeedText = "";
            {
                var SB = new StringBuilder();
                for (int i = 0; i < 1024 * 1024 * 1; i++) {
                    SB.Append("X");
                }
                SeedText = SB.ToString();
            }

            //Store 32*32 = 1024 records (this would be 1GB of data)
            for (int i = 0; i < 32; i++) {
                Console.WriteLine($@"Seeding batch {i}...");
                using var Context = MyContext.FromSqlServerConnectionString(CS);

                for (int j = 0; j < 32; j++) {
                    var Data = new ContextData() {
                        Text = SeedText,
                    };
                    Context.Add(Data);
                }

                Context.SaveChanges();
                
            }
            
            //Run our query.
            //You'll notice that memory usage spikes when this happens and there is a giant
            //delay while all rows are fetched
            {
                using var Context = MyContext.FromSqlServerConnectionString(CS);
                var Query = Context.Data
                    .AsNoTracking()
                    .AsEnumerable()
                    ;

                foreach (var item in Query) {
                    
                }
                

            }




        }
    }

    public class MyContext : DbContext {
        public DbSet<ContextData> Data => Set<ContextData>();

        public MyContext(DbContextOptions<MyContext> Options) : base(Options) {

        }


        public static MyContext FromSqlServerConnectionString(string ConnectionString) {
            var Options = new DbContextOptionsBuilder<MyContext>()
                .UseSqlServer(ConnectionString, x => x.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery))
                .Options
                ;

            var ret = new MyContext(Options);

            return ret;
        }

    }

    public class ContextData {
        public long Id { get; set; }
        public string Text { get; set; } = string.Empty;
    }

}

I have observed that if query splitting is not active, whether it is disabled globally or on the single query, the results are correctly returned in an unbuffered manner. However, if it is enabled at all, the entire query seems to be buffered.

Include stack traces

This is confirmed through two methods:

  1. Memory usage explodes until the table is fully loaded: [image]

  2. If I pause the app when trying to get the first result from the enumerable, I get a stack trace like this, which seems to indicate buffering: [image]

Include provider and version information

  • EF Core version: 6.x
  • Database provider: Microsoft.EntityFrameworkCore.SqlServer
  • Target framework: .NET 6.0 RC1
  • Operating system: Win10
  • IDE: VS 2022

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

roji commented, Sep 22, 2021 (1 reaction)

Aside from that, https://github.com/dotnet/efcore/issues/26129#issuecomment-924867825 seems to repeat the information already given.

  • For EF buffering when no related entities are being loaded, we agree this is a bug (and will fix it at some point) but consider it low-priority; place an AsSingleQuery on the relevant queries to work around the buffering.
  • If you’re seeing unneeded/spurious buffers when related entities are being loaded, please open a new issue with a full code sample - that would likely be higher priority.
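
The workaround in the first bullet can be sketched roughly as follows. This assumes the `MyContext` and `ContextData` types from the sample code in the report; `StreamingWorkaround` and `StreamData` are hypothetical names used only for illustration. `AsSingleQuery()` overrides the context-wide `SplitQuery` setting for this one query, which, going by the report and the comment above, should let the rows stream instead of being buffered.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

namespace SplitQueryBug {
    // Hypothetical helper, not part of the original report.
    public static class StreamingWorkaround {
        // Returns a lazily evaluated sequence; nothing is sent to the
        // database until the caller starts enumerating it.
        public static IEnumerable<ContextData> StreamData(MyContext context) {
            return context.Data
                .AsNoTracking()
                // Opt this query out of split querying even though the
                // context was configured with QuerySplittingBehavior.SplitQuery.
                .AsSingleQuery()
                .AsEnumerable();
        }
    }
}
```

In the sample's `Main`, the final loop would then become `foreach (var item in StreamingWorkaround.StreamData(Context)) { }`. Queries that do load related collections would fall back to a single joined query with this change, so it is probably best applied only where no related entities are included.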

smitpatel commented, May 10, 2022 (0 reactions)

Our IsBuffering flag is not sufficient. For cases like a retrying execution strategy, we need to buffer results no matter what. But in the case of a split query, we can avoid buffering the last reader. So two possible solutions here are adding another flag on QCC to differentiate the reason for buffering, or adding a true buffered data reader.
