Analytic Functions, Sum and GroupBy


Task

I have a table of events, each of which occurred at some point in time. I need to:

  1. Group events per User and Process
  2. Calculate intervals between adjacent events for each group
  3. Filter out all intervals greater than 5 minutes
  4. Sum the remaining intervals so that each User × Process group yields a single scalar value
  5. Plot the result on a bubble chart: Users on the Y axis, Processes on the X axis, and bubble radius for the total sum (overall duration excluding idle intervals > 5 min)
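
For reference, steps 1 to 4 can be sketched in memory with plain LINQ to Objects. This is only an illustration under assumptions: the `Event` class and `IntervalSketch.ActiveMinutes` are hypothetical names, not part of linq2db or the database schema, and the sketch works on events already loaded locally rather than on the SQL side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory event shape; field names are illustrative only.
public sealed class Event
{
    public string User;
    public int Proc;
    public DateTime Time;
}

public static class IntervalSketch
{
    // Total active minutes per (User, Proc), ignoring gaps longer than maxGap.
    public static Dictionary<Tuple<string, int>, double> ActiveMinutes(
        IEnumerable<Event> events, TimeSpan maxGap)
    {
        return events
            .GroupBy(e => Tuple.Create(e.User, e.Proc))                  // step 1: group
            .ToDictionary(
                g => g.Key,
                g =>
                {
                    var times = g.Select(e => e.Time).OrderBy(t => t).ToList();
                    return times
                        .Zip(times.Skip(1), (prev, next) => next - prev) // step 2: adjacent intervals
                        .Where(d => d > TimeSpan.Zero && d <= maxGap)    // step 3: drop idle gaps
                        .Sum(d => d.TotalMinutes);                       // step 4: scalar per group
                });
    }
}

public static class Demo
{
    public static void Main()
    {
        var t0 = new DateTime(2019, 6, 15, 9, 0, 0);
        var events = new[]
        {
            new Event { User = "alice", Proc = 1, Time = t0 },
            new Event { User = "alice", Proc = 1, Time = t0.AddMinutes(2) },
            new Event { User = "alice", Proc = 1, Time = t0.AddMinutes(12) }, // 10-minute gap, dropped
            new Event { User = "alice", Proc = 1, Time = t0.AddMinutes(15) },
        };

        foreach (var kv in IntervalSketch.ActiveMinutes(events, TimeSpan.FromMinutes(5)))
            Console.WriteLine("{0}/{1}: {2} min", kv.Key.Item1, kv.Key.Item2, kv.Value);
    }
}
```

In the sample data the kept intervals are 2 and 3 minutes, so the group total is 5 minutes. Pairing each timestamp with its successor via `Zip` plays the role that `LAG ... OVER (PARTITION BY ... ORDER BY ...)` plays in SQL.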


Code

The only approach that works is shown below. The commented-out parts are described in the following sections. Sorry for the ugly names in the DB; they are not my work, and I suffer from them too.

using LinqToDB;
using LinqToDB.DataProvider.SqlServer;
using System;
using System.Linq;

namespace Tests
{
    public class Process_IndexedEvents
    {
        public int eventIndex;
        public int processID;
        public string eventUser;
        public DateTime eventTime;
    }

    class Program
    {
        const string ConnectionString =
            @"Server=DESKTOP\SQLEXPRESS; Database=Automata; Integrated Security=true;";

        static void Main()
        {
            using (var db = SqlServerTools.CreateDataConnection(ConnectionString))
            {
                // LinqToDB.Common.Configuration.Linq.AllowMultipleQuery = true
                // LinqToDB.Common.Configuration.Linq.PreloadGroups = true

                var query =
                    from o in from x in db.GetTable<Process_IndexedEvents>()
                              select new
                              {
                                  User = x.eventUser,
                                  Proc = x.processID,
                                  Diff = x.eventTime - 
                                         Sql.Ext
                                            .Lag(x.eventTime, Sql.Nulls.None)
                                            .Over()
                                            .PartitionBy(x.eventUser, x.processID)
                                            .OrderBy(x.eventTime)
                                            .ToValue()
                              }
                              //into y
                              //where 0 < y.Diff.TotalMinutes && y.Diff.TotalMinutes <= 5
                              //select y
                    group o by new {o.User, o.Proc}
                    into g
                    select g;

                query.Take(10)
                     .ToList()
                     .Select(x => new
                     {
                         x.Key,
                         Count = x.Count(),
                         Sum   = x.Where(y => 0 < y.Diff.TotalMinutes 
                                               && y.Diff.TotalMinutes <= 5)
                                  .Sum(y => y.Diff.TotalMinutes)
                     })
                     .ToList()
                     .ForEach(Console.WriteLine);
            }
        }
    }
}

Sum

An attempt to calculate the final sum from step 4 on the SQL side rather than locally:

query.Take(10)
  // .ToList()
     .Select(x => new

fails with:

Unhandled Exception: System.ArgumentException: Property 'System.TimeSpan Diff' is not defined for type 'System.Linq.IGrouping`2[<>f__AnonymousType1`2[System.String,System.Int32], <>f__AnonymousType0`3[System.String,System.Int32,System.TimeSpan]]'

   at System.Linq.Expressions.Expression.Property(Expression expression, PropertyInfo property)
   at System.Linq.Expressions.Expression.MakeMemberAccess(Expression expression, MemberInfo member)
   at LinqToDB.Expressions.Extensions.TransformX(MemberExpression e, Func`2 func) in C:\projects\linq2db\Source\LinqToDB\Expressions\Extensions.cs:line 1259
   at LinqToDB.Expressions.Extensions.Transform(Expression expr, Func`2 func) in C:\projects\linq2db\Source\LinqToDB\Expressions\Extensions.cs:line 995
   at LinqToDB.Expressions.Extensions.TransformX(MemberExpression e, Func`2 func) in C:\projects\linq2db\Source\LinqToDB\Expressions\Extensions.cs:line 1259
   at LinqToDB.Expressions.Extensions.Transform(Expression expr, Func`2 func) in C:\projects\linq2db\Source\LinqToDB\Expressions\Extensions.cs:line 995
   at LinqToDB.Linq.Builder.SelectContext.GetExpression(Expression expression, Expression levelExpression, Expression memberExpression) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 1062
   at LinqToDB.Linq.Builder.SelectContext.ProcessScalar[T](Expression expression, Int32 level, Func`4 action, Func`1 defaultAction) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 935
   
   ...

Computing the sum inside the query:

group o by new {o.User, o.Proc}
into g
select new
{
	g.Key,
	// Count = g.Count(),
	Sum = g // .Where(y => 0 < y.Diff.TotalMinutes && y.Diff.TotalMinutes <= 5)
		   .Sum(y => y.Diff.TotalMinutes)
};

fails with:

Unhandled Exception: System.Data.SqlClient.SqlException: Column 'Process_IndexedEvents.eventTime' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Windowed functions cannot be used in the context of another windowed function or aggregate.

   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   
   ...

Filter

An attempt to filter the intervals from step 3 on the SQL side rather than locally:

into y
where 0 < y.Diff.TotalMinutes && y.Diff.TotalMinutes <= 5
select y

fails with:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.

   at LinqToDB.Linq.Builder.SelectContext.<>c__DisplayClass45_0.<ConvertToIndexInternal>b__7(Int32 n, IBuildContext ctx, Expression ex, Int32 l, Expression _) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 576
   at LinqToDB.Linq.Builder.SelectContext.ProcessMemberAccess[T](Expression expression, MemberExpression levelExpression, Int32 level, Func`6 action) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 994
   at LinqToDB.Linq.Builder.SelectContext.ConvertToIndexInternal(Expression expression, Int32 level, ConvertFlags flags) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 572
   at LinqToDB.Linq.Builder.SelectContext.ConvertToIndex(Expression expression, Int32 level, ConvertFlags flags) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SelectContext.cs:line 433
   at LinqToDB.Linq.Builder.SubQueryContext.ConvertToSql(Expression expression, Int32 level, ConvertFlags flags) in C:\projects\linq2db\Source\LinqToDB\Linq\Builder\SubQueryContext.cs:line 39

...

SQL Script

The SQL query I used as the source for the C# code works perfectly in SSMS. I replicated it in C#, including the derived table:

SELECT
  User_Groups    AS [Dept],
  eventUser      AS [User],
  Process_Name   AS [Proc],
  SUM(i) / 60.0  AS [Sum]
FROM (
  SELECT
    eventUser,
    processID,
    DATEDIFF (
      minute,
      LAG(eventTime) OVER(PARTITION BY eventUser, processID ORDER BY eventTime),
      eventTime) AS i
  FROM Process_IndexedEvents) AS o
INNER JOIN Users ON eventUser = [User]
INNER JOIN Processes AS p ON o.processID = p.ProcessID
WHERE 0 < i AND i <= 5
GROUP BY
  User_Groups,
  eventUser,
  Process_Name

Other

Settings: adjusting the following settings didn’t help:

LinqToDB.Common.Configuration.Linq.AllowMultipleQuery = true
LinqToDB.Common.Configuration.Linq.PreloadGroups = true

Analytic functions: the same problems occur with the other analytic functions.

F#: the situation is the same, whether using fluent syntax or the F# query { } syntax.

Environment details

  • linq2db version: 2.7.4
  • Database Server: MS SQL Express
  • Database Provider: MS SQL Server
  • Operating system: Windows 10
  • .NET Framework: 4.5.2

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 23 (23 by maintainers)

Top GitHub Comments

2 reactions
sdanyliv commented, Jun 15, 2019

Here is your query (not tested for business correctness):

var query =
	from x in db.GetTable<Process_IndexedEvents>()
	select new
	{
		User = x.eventUser,
		Proc = x.processID,
		Diff = Sql.DateDiff(Sql.DateParts.Minute,
			Sql.Ext
				.Lag(x.eventTime, Sql.Nulls.None)
				.Over()
				.PartitionBy(x.eventUser, x.processID)
				.OrderBy(x.eventTime)
				.ToValue(), x.eventTime),
	};

query = query.Where(q => q.Diff > 0 && q.Diff <= 5);

var finalQuery = from q in query
	from u in db.GetTable<Users>().InnerJoin(u => u.UserId == q.User)
	from p in db.GetTable<Processes>().InnerJoin(p => p.ProcessId == q.Proc)
	group q by new { q.User, u.UserGroups, p.ProcessName }
	into g
	select new
	{
		g.Key.User,
		g.Key.ProcessName,
		g.Key.UserGroups,
		Sum = g.Sum(e => e.Diff) / 60
	};


finalQuery
	.Take(10)
	.ToList()
	.ForEach(Console.WriteLine);

Generated SQL:

SELECT TOP (10)
	[q].[User_1],
	[p].[ProcessName],
	[u].[UserGroups],
	Sum([q].[Diff])
FROM
	(
		SELECT
			DateDiff(minute, LAG([x].[eventTime]) OVER(PARTITION BY [x].[eventUser], [x].[processID] ORDER BY [x].[eventTime]), [x].[eventTime]) as [Diff],
			[x].[eventUser] as [User_1],
			[x].[processID] as [Proc]
		FROM
			[Process_IndexedEvents] [x]
	) [q]
		INNER JOIN [Users] [u] ON ([u].[UserId] IS NULL AND [q].[User_1] IS NULL OR [u].[UserId] = [q].[User_1])
		INNER JOIN [Processes] [p] ON [p].[ProcessId] = [q].[Proc]
WHERE
	[q].[Diff] > 0 AND [q].[Diff] <= 5
GROUP BY
	[q].[User_1],
	[u].[UserGroups],
	[p].[ProcessName]

Simple tip: if you are rewriting SQL, split it into parts, starting from the subqueries.
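
One detail that may be worth checking when porting the SQL script: it divides by `60.0`, while the LINQ version divides by `60`. Assuming `Diff` is an integer type (the minute-based `DateDiff` and the `q.Diff > 0` comparison suggest so, but this is an assumption), the C# expression is an integer division and drops the fractional part. The sketch below shows only the C# arithmetic; `DivisionSketch` and its methods are hypothetical names, and how linq2db translates the division to SQL is not covered here.

```csharp
using System;

public static class DivisionSketch
{
    // Integer operands: the fractional part is discarded.
    public static int? HoursTruncated(int? totalMinutes)
    {
        return totalMinutes / 60;
    }

    // A double divisor keeps the fraction, mirroring SUM(i) / 60.0 in the SQL script.
    public static double? HoursFractional(int? totalMinutes)
    {
        return totalMinutes / 60.0;
    }

    public static void Main()
    {
        int? total = 125; // hypothetical sum of minute differences

        Console.WriteLine(HoursTruncated(total));  // integer result
        Console.WriteLine(HoursFractional(total)); // fractional result
    }
}
```

For 125 minutes the first method yields 2 and the second a value slightly above 2.08, so writing `/ 60.0` in the projection is the closer match to the original script if fractional values are wanted.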

1 reaction
sdanyliv commented, Jun 15, 2019

5 minutes, rewriting your query
