
SQL Server UTF8 collations


This does not behave as documented and expected.

Suppose I have an entity whose properties I define with the Fluent API. In my example, the SQL column is varchar(255) with the collation Latin1_General_100_BIN2_UTF8, and it is defined in EF as p.Property(prop => prop.Param).IsUnicode(false).UseCollation("Latin1_General_100_BIN2_UTF8").HasMaxLength(255);

However, Unicode characters still get corrupted, both on Azure SQL and on SQL Server 2019 Express.
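To illustrate the kind of corruption described above, here is a minimal C# sketch (not taken from the issue) that round-trips a string through a single-byte encoding. This is roughly what happens when a value travels as a non-Unicode (varchar / DbType.AnsiString) parameter instead of a Unicode (nvarchar / DbType.String) one. Encoding.Latin1 (.NET 5 or later) is an assumed stand-in for a non-UTF-8 code page; the code page actually involved depends on the collations in play, which this sketch does not model.

```csharp
using System;
using System.Text;

// Round-trips a string through an encoding: string -> bytes -> string.
static string RoundTrip(string value, Encoding encoding) =>
    encoding.GetString(encoding.GetBytes(value));

// Sample value containing characters outside Latin-1 (Greek and CJK).
string original = "Ωmega 日本";

// Assumption: Latin-1 stands in for whatever single-byte code page is used
// when the value is sent as a non-Unicode (AnsiString) parameter.
string viaLatin1 = RoundTrip(original, Encoding.Latin1);

// UTF-8 can represent every character, so this round-trip is lossless.
string viaUtf8 = RoundTrip(original, Encoding.UTF8);

Console.WriteLine($"Latin-1 round-trip: {viaLatin1}");
Console.WriteLine($"UTF-8 round-trip:   {viaUtf8}");
```

Characters the single-byte encoding cannot represent come back as `?` replacement characters, while the UTF-8 round-trip preserves the original string.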

Document Details


ID: b27be469-3758-941c-0c5d-3a54b1545493
Version Independent ID: d89924aa-668e-229c-8170-5731c3fff4c1
Content: Collations and case sensitivity - EF Core
Content Source: entity-framework/core/miscellaneous/collations-and-case-sensitivity.md
Product: entity-framework
Technology: entity-framework-core
GitHub Login: @roji
Microsoft Alias: avickers

Issue analytics: created 2 years ago; 25 comments (13 by maintainers).

Top GitHub Comments

1 reaction

roji commented, Mar 8, 2022

Design proposal:

tl;dr allow users to configure UTF8 by explicitly setting both the column type to char/varchar and IsUnicode to true:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>()
        .Property(b => b.Name)
        .HasColumnType("varchar(max)")
        .IsUnicode(true)
        .UseCollation("LATIN1_GENERAL_100_CI_AS_SC_UTF8");
}

We can add a sugar method which does the above:

modelBuilder.Entity<Blog>()
    .Property(b => b.Name)
    .UseUTF8("LATIN1_GENERAL_100_CI_AS_SC_UTF8");

Notes:

Today, explicitly setting the column type to char/varchar also sets DbType=AnsiString (note that Unicode still remains true in the type mapping, which is not ideal).
Explicitly setting Unicode to true currently has no effect if the store type is set to char/varchar, i.e. DbType is still AnsiString.
We can allow the user to explicitly set varchar(max) and IsUnicode=true, which would opt into UTF8. The column type is then exactly what it should be in migrations (and also in the query pipeline etc.), and IsUnicode tells us to send DbType.String instead of DbType.AnsiString.
Note that this isn't enough: a collation is needed as well (and it has to be explicit). We can add model validation that checks, for all UTF8 properties (char/varchar with Unicode=true), that a UTF8-compatible collation (one ending with _UTF8) is configured.
Scaffolding: look into doing this reliably. The combination of a char/varchar property with a collation ending with _UTF8 (including at the database level) should lead to the correct UTF8 property being scaffolded (i.e. with either UseUTF8 or IsUnicode(true)).
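The model-validation rule proposed in the notes can be sketched as plain C# logic. This is a hypothetical illustration, not EF Core's API: the helper names IsAnsiCharType, IsUtf8Collation and ValidateUtf8Property are invented here, and the sketch checks only a property-level collation, ignoring the database-level default that the scaffolding note mentions.

```csharp
using System;

// Hypothetical helper (not an EF Core API): true when the store type is a
// non-Unicode SQL Server character type such as char, varchar(n) or varchar(max).
static bool IsAnsiCharType(string storeType)
{
    string baseType = storeType.Split('(')[0].Trim().ToLowerInvariant();
    return baseType is "char" or "varchar";
}

// Hypothetical helper: treats a collation as UTF-8 when its name ends with _UTF8
// (compared case-insensitively here).
static bool IsUtf8Collation(string? collation) =>
    collation is not null && collation.EndsWith("_UTF8", StringComparison.OrdinalIgnoreCase);

// Sketch of the proposed rule: char/varchar combined with IsUnicode(true) opts into
// UTF-8, so such a property must also carry an explicit UTF-8 collation.
// Returns null when the property is valid, otherwise an error message.
static string? ValidateUtf8Property(string propertyName, string storeType, bool isUnicode, string? collation)
{
    if (!IsAnsiCharType(storeType) || !isUnicode)
        return null; // Not a UTF-8 property under the proposed convention.

    return IsUtf8Collation(collation)
        ? null
        : $"Property '{propertyName}' is mapped to {storeType} with IsUnicode(true) but has no _UTF8 collation.";
}

Console.WriteLine(ValidateUtf8Property("Name", "varchar(max)", true, "LATIN1_GENERAL_100_CI_AS_SC_UTF8") ?? "OK");
Console.WriteLine(ValidateUtf8Property("Name", "varchar(max)", true, "Latin1_General_100_CI_AS") ?? "OK");
Console.WriteLine(ValidateUtf8Property("Title", "nvarchar(255)", true, null) ?? "OK");
```

Keying the check on the combination of store type and the Unicode flag mirrors the notes: plain varchar properties and ordinary nvarchar properties are left alone, and only the new UTF-8 opt-in requires a _UTF8 collation.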

Global model configuration

The default database collation can already be set via modelBuilder.UseCollation():

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.UseCollation("LATIN1_GENERAL_100_CI_AS_SC_UTF8");
}

All string properties can be configured to be UTF8 by default via pre-convention model configuration:

protected override void ConfigureConventions(ModelConfigurationBuilder configurationBuilder)
{
    configurationBuilder.DefaultTypeMapping<string>(b => b.HasColumnType("varchar(max)").IsUnicode(true));
}

We could also add a ConfigureUTF8() extension method to do the above.

1 reaction

roji commented, Mar 7, 2022

@clement911

So I believe ideally we would want to pass a varchar parameter and also indicate that the collation of the parameter is Latin1_General_100_BIN2_UTF8 (or whatever other actual UTF8 collation was used). The problem I see is that neither Microsoft.Data.SqlClient.SqlParameter nor sp_executesql allows passing the collation name of a given parameter.

The collation isn’t something that gets specified on a parameter, but rather on the column (or at the database level for all columns); see our docs for more info on this.

Aside from that, as @egbertn wrote above, a workaround exists, but it requires editing the migration to change the type to varchar (doing something better is what this issue tracks). There's no reason to avoid editing the migration file: it's perfectly fine (and frequently recommended) to customize migration code after generating it; see our docs. I definitely wouldn't avoid UTF8 just because it requires a one-time edit to migration code.
