Query: Improve translation of String's StartsWith, EndsWith and Contains

PROVIDERS BEWARE:

The LINQ translation of the Contains, EndsWith and StartsWith methods in the Relational package uses the LIKE operator, which may return incorrect results if the value parameter (what we are searching for) contains wildcard characters, e.g. ‘%’ or ‘_’.

This issue addresses the SqlServer and Sqlite providers; all other providers will still use the old translation. Each provider that can be affected by this should implement its own MethodCallTranslators for Contains, EndsWith and StartsWith.

Currently in EF7 a LINQ query like this:

var things = Things.Where(t => t.Name.StartsWith("a"));

Gets translated to SQL like this (note that I am simplifying the query and expanding parameter values for clarity):

SELECT * FROM Things WHERE Name LIKE 'a%' ;

However, in order to return correct results, a LINQ query like this:

var underscoreAThings = Things.Where(t => t.Name.StartsWith("_a"));

Should be translated to SQL like this:

SELECT * FROM Things WHERE Name LIKE '~_a%' ESCAPE '~';

The escaping accounts for SQL wildcard characters in the input string which should not be treated as wildcards (we can add a separate Like() method for passing patterns, but that belongs in a separate work item).
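
The escaping step can be sketched as follows. This is a minimal illustration run against an in-memory SQLite database through Python's sqlite3 module, not the EF provider code; the helper name escape_like and the sample rows are assumptions made for the example.

```python
import sqlite3

ESCAPE_CHAR = "~"  # same escape character as in the SQL above


def escape_like(value: str, escape: str = ESCAPE_CHAR) -> str:
    """Escape LIKE wildcards in value (hypothetical helper, not EF code)."""
    # Escape the escape character first so later replacements are not re-escaped.
    value = value.replace(escape, escape + escape)
    return value.replace("%", escape + "%").replace("_", escape + "_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Things (Name TEXT)")
conn.executemany("INSERT INTO Things VALUES (?)",
                 [("_apple",), ("banana",), ("cherry",)])


def starts_with_naive(prefix: str) -> list:
    # Old translation: the prefix is used as a pattern without escaping.
    rows = conn.execute(
        "SELECT Name FROM Things WHERE Name LIKE ? ORDER BY Name",
        (prefix + "%",))
    return [r[0] for r in rows]


def starts_with_escaped(prefix: str) -> list:
    # Proposed translation: wildcards in the prefix are escaped.
    rows = conn.execute(
        "SELECT Name FROM Things WHERE Name LIKE ? ESCAPE '~' ORDER BY Name",
        (escape_like(prefix) + "%",))
    return [r[0] for r in rows]
```

With these rows, starts_with_naive("_a") also returns "banana", because the underscore matches any single character, while starts_with_escaped("_a") returns only "_apple".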

When the input string is store-correlated (e.g. it is another column in the database rather than a parameter or a literal in the query), using LIKE correctly in the translation becomes more difficult, e.g. it would be hard to perform the required escaping in SQL.

In general for cases in which LIKE doesn’t work well we can fall back to alternative translations that don’t rely on LIKE, e.g. for String.StartsWith():

var underscoreAThings = Things.Where(t => t.Name.StartsWith(t.Prefix));

Can be translated to SQL like this:

SELECT * FROM Things WHERE CHARINDEX(Prefix, Name) = 1 OR Prefix = '';

Note that CHARINDEX() won’t match an empty string but String.StartsWith("") always returns true, which is why we add the Prefix = '' condition.
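
A rough way to try this fallback without a SQL Server instance is to run the same predicate on SQLite. In this sketch SQLite's instr(haystack, needle) stands in for CHARINDEX(needle, haystack); the table contents are made up for illustration, and the Prefix = '' clause is kept as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Things (Name TEXT, Prefix TEXT)")
conn.executemany("INSERT INTO Things VALUES (?, ?)", [
    ("_apple", "_a"),   # genuine match
    ("banana", "_a"),   # would match LIKE '_a%', must not match here
    ("cherry", ""),     # empty prefix: StartsWith("") is always true
])

# instr() takes (haystack, needle), i.e. the reverse of CHARINDEX's order.
rows = conn.execute(
    "SELECT Name FROM Things "
    "WHERE instr(Name, Prefix) = 1 OR Prefix = '' "
    "ORDER BY Name")
matches = [r[0] for r in rows]
```

Here matches contains "_apple" and "cherry" but not "banana", since the underscore in the prefix is compared literally.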

The main disadvantage of this translation is that it is not sargable. That can be addressed with a hybrid translation, e.g.:

SELECT * FROM Things WHERE Name LIKE Prefix+'%' AND (CHARINDEX(Prefix, Name) = 1 OR Prefix = '');

This should be quick to evaluate using an index because the LIKE condition should be able to take advantage of the index to produce fairly selective results and the second condition will filter out false positives returned by LIKE.

Notice that this alternative removes the need to fiddle with the input value: we no longer need to escape wildcards because in the worst case they will produce false positive matches which the CHARINDEX() based condition will still be able to filter out.
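
The filtering effect of the hybrid condition can be checked with a small sketch. It again uses SQLite as a stand-in (|| instead of +, instr instead of CHARINDEX) and invented rows, and it only demonstrates correctness of the results, not index usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Things (Name TEXT, Prefix TEXT)")
conn.executemany("INSERT INTO Things VALUES (?, ?)", [
    ("_apple", "_a"),
    ("banana", "_a"),   # false positive for the LIKE-only condition
    ("cherry", "ch"),
    ("date", ""),
])

# LIKE with the unescaped prefix: selective, but may over-match.
LIKE_ONLY = "Name LIKE (Prefix || '%')"
# Hybrid: the second condition removes the false positives.
HYBRID = LIKE_ONLY + " AND (instr(Name, Prefix) = 1 OR Prefix = '')"


def names(where: str) -> list:
    sql = f"SELECT Name FROM Things WHERE {where} ORDER BY Name"
    return [r[0] for r in conn.execute(sql)]
```

With these rows, names(LIKE_ONLY) includes "banana" while names(HYBRID) does not, even though the prefix was never escaped.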

Also notice that based on the current query caching design we wouldn’t need to always produce this more complex translation. Instead, we could sniff into the argument of String.StartsWith() and pivot on it to produce different translations, e.g.:

  1. If the value is opaque (i.e. it comes from the store) or if it contains a wildcard character, then produce the condition based on CHARINDEX()
  2. If the value does not contain a wildcard character in the first position then we can emit the condition based on LIKE
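
One possible reading of this pivot, as a sketch only: the function name starts_with_sql and its parameters are hypothetical, None models a value that cannot be inspected at translation time, and a wildcard anywhere in the value is treated as case 1.

```python
from typing import Optional

# Wildcards checked here are the two named in this issue; a real provider
# would use its own set (SQL Server, for instance, also treats '[' specially).
WILDCARDS = ("%", "_")


def starts_with_sql(column: str, operand: str,
                    known_value: Optional[str]) -> str:
    """Pick a WHERE fragment for column.StartsWith(operand).

    operand is the SQL text of the argument (a parameter name or a column);
    known_value is its value when visible at translation time, else None.
    """
    if known_value is None or any(w in known_value for w in WILDCARDS):
        # Case 1: opaque value or wildcard present, so avoid LIKE.
        return f"(CHARINDEX({operand}, {column}) = 1 OR {operand} = '')"
    # Case 2: no wildcard, a plain LIKE is safe.
    return f"{column} LIKE {operand} + '%'"
```

For example, starts_with_sql("Name", "@p0", "a") yields the LIKE form, while starts_with_sql("Name", "Prefix", None) yields the CHARINDEX form.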

Similar approaches can be used for String.EndsWith() and String.Contains(). However for these methods LIKE does not really contribute to the performance since the beginning of the input value cannot be used to perform index lookups, so it should be ok to produce a translation that doesn’t use LIKE at all.
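
As an illustration of LIKE-free translations for these two methods, the sketch below uses SQLite's instr and substr; these particular predicates are an assumption made for the example, not the translations the providers actually emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Things (Name TEXT)")
conn.executemany("INSERT INTO Things VALUES (?)",
                 [("100%",), ("100 percent",), ("a_b",), ("axb",)])

# Contains: the value occurs somewhere in Name (empty value always matches).
CONTAINS = "instr(Name, :v) > 0 OR :v = ''"
# EndsWith: the last length(value) characters of Name equal the value.
ENDS_WITH = "substr(Name, -length(:v)) = :v OR :v = ''"


def names(where: str, value: str) -> list:
    sql = f"SELECT Name FROM Things WHERE {where} ORDER BY Name"
    return [r[0] for r in conn.execute(sql, {"v": value})]
```

Because no pattern matching is involved, names(CONTAINS, "_") finds only "a_b" and names(ENDS_WITH, "%") finds only "100%".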

Issue Analytics

  • State: closed
  • Created: 9 years ago
  • Reactions: 4
  • Comments: 37 (26 by maintainers)

Top GitHub Comments

roji commented, Feb 8, 2017 (1 reaction)

@jemiller0 uh, happy to have helped although I didn’t really mean to 😃 Please note that my proposal was to use LIKE AND LEFT(LEN()) and not LEFT(LEN()) alone, simply because I assumed that LEFT(LEN()) wouldn’t be index-optimized. This retains the original logic of using LIKE first for speed, then filtering out false positives with something (CHARINDEX or LEFT(LEN())). So I’m not clear if you’re still doing LIKE AND LEFT(LEN()) or have switched to LEFT(LEN()) on its own. I’ll let the EF Core team comment further on the usefulness of LEFT(LEN()) for SqlServer.

Regarding prepared statements, I don’t think there’s any relevance to the statement type (SELECT, INSERT, UPDATE…). All of them greatly benefit from preparation in PostgreSQL, whereas in SqlServer I’m assuming all statement types are implicitly cached without the need for explicit preparation (here’s a doc page on this). It may be worth testing to confirm actual SqlServer behavior. Regardless, it seems like a good idea to implement preparation in EF Core simply to have all providers benefit from it, and EF Core may be in a good position to know what to prepare and what not to prepare. If you want to continue this conversation it may be better to do so in #5459.

Regarding case sensitivity in PostgreSQL, unquoted identifiers are always folded to lowercase, whereas quoted ones maintain case (this is why Npgsql EF Core provider systematically quotes all identifiers). Since this is also off-topic feel free to open an issue in the Npgsql repo to continue discussing.

roji commented, Feb 8, 2017 (1 reaction)

Thanks for the valuable discussion. To summarize, at least in Npgsql I’m going to have the StartsWith() translator:

  1. Check whether the pattern is constant or not.
  2. If constant, escape everything client-side in C# and send a simple LIKE (with backslash being the PostgreSQL default escape character)
  3. Otherwise (parameters, store values), send LIKE AND STRPOS (PG’s CHARINDEX equivalent) just like you guys are doing today.

As a very minor implementation note, wouldn’t it be slightly better to replace CHARINDEX with LEFT(LEN(<pattern>)), similar to how EndsWith() is currently implemented? This would avoid going through the entire string, searching for the pattern.
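
The LEFT(LEN()) variant suggested here can be tried out with SQLite's substr(Name, 1, length(Prefix)) as a stand-in for LEFT(Name, LEN(Prefix)). This is a sketch on invented rows, combined with LIKE as in the hybrid translation discussed in the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Things (Name TEXT, Prefix TEXT)")
conn.executemany("INSERT INTO Things VALUES (?, ?)", [
    ("_apple", "_a"),
    ("banana", "_a"),   # LIKE '_a%' matches, the prefix comparison does not
    ("cherry", "ch"),
])

# Compare only the first length(Prefix) characters instead of searching
# the whole string for the prefix.
rows = conn.execute(
    "SELECT Name FROM Things "
    "WHERE Name LIKE (Prefix || '%') "
    "AND substr(Name, 1, length(Prefix)) = Prefix "
    "ORDER BY Name")
matches = [r[0] for r in rows]
```

On these rows the result is "_apple" and "cherry", the same as with the instr-based filter.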
