SQL Server UTF8 collations

This does not behave as documented and expected.

I have an entity whose properties I configure with the Fluent API. The SQL column (in my example) is varchar(255) with collation Latin1_General_100_BIN2_UTF8, defined in EF as:

p.Property(prop => prop.Param).IsUnicode(false).UseCollation("Latin1_General_100_BIN2_UTF8").HasMaxLength(255);

However, Unicode characters still get corrupted in SQL Server, both on Azure SQL and on SQL Server 2019 Express.
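One way to picture the symptom, as a minimal sketch and not a description of what SqlClient does internally: a property configured with IsUnicode(false) is sent as an ANSI parameter (DbType.AnsiString, per the maintainer notes further down). If such a value passes through a single-byte code page before it reaches the UTF8 column, characters outside that code page are lost. The sketch uses Encoding.Latin1 (.NET 5+) purely as a stand-in for such a code page; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Encodes the text to bytes and decodes it again, imitating a value
    // that travels to the server in the given encoding.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));

    // True when every character survives the round trip unchanged.
    public static bool Survives(string text, Encoding encoding) =>
        RoundTrip(text, encoding) == text;
}
```

For example, EncodingRoundTrip.Survives("Łódź", Encoding.Latin1) is false because Ł and ź have no Latin1 representation, while the same call with Encoding.UTF8 is true.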


Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:25 (13 by maintainers)

Top GitHub Comments

roji commented, Mar 8, 2022 (1 reaction)

Design proposal:

tl;dr: allow users to opt into UTF8 by explicitly setting the column type to char/varchar and IsUnicode to true:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>()
        .Property(b => b.Name)
        .HasColumnType("varchar(max)")
        .IsUnicode(true)
        .UseCollation("LATIN1_GENERAL_100_CI_AS_SC_UTF8");
}

We can add a sugar method which does the above:

modelBuilder.Entity<Blog>()
    .Property(b => b.Name)
    .UseUTF8("LATIN1_GENERAL_100_CI_AS_SC_UTF8");
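The sugar method could be sketched as an extension over PropertyBuilder<string>. UseUTF8 is the hypothetical name from this proposal, not an existing EF Core API, and the class name is made up; the three chained calls are the ones from the first snippet. Note that, per the notes below, on versions where the proposal is not implemented the IsUnicode(true) call has no effect on a varchar column.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

public static class Utf8PropertyBuilderExtensions
{
    // Hypothetical sugar method from the proposal; not an existing EF Core API.
    public static PropertyBuilder<string> UseUTF8(
        this PropertyBuilder<string> propertyBuilder,
        string collation)
        => propertyBuilder
            .HasColumnType("varchar(max)")  // store type stays char/varchar
            .IsUnicode(true)                // proposed signal to send DbType.String
            .UseCollation(collation);       // expected to be a UTF8 collation
}
```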

Notes:

  • Today, explicitly setting the column type to char/varchar also sets DbType=AnsiString (note that Unicode still remains true in the type mapping, which is not ideal).
  • Explicitly setting IsUnicode to true currently has no effect if the store type is set to char/varchar, i.e. DbType is still AnsiString.
  • We can allow the user to explicitly set varchar(max) and IsUnicode=true, which would opt into UTF8. The column type is then exactly what it should be in migrations (and also in the query pipeline etc.), and IsUnicode tells us to send DbType.String instead of DbType.AnsiString.
  • Note that this isn’t enough: a collation is needed as well (and it has to be explicit). We can add model validation that checks, for all UTF8 properties (char/varchar with Unicode=true), that a UTF8-compatible collation (one ending with _UTF8) is configured.
  • Scaffolding: look into doing this reliably. The combination of a char/varchar property with a collation ending with UTF8 (including at the database level) should lead to the correct UTF8 property being scaffolded (i.e. with either UseUTF8 or IsUnicode(true)).
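The rules in these notes can be sketched as plain functions, independent of EF Core. This is only an illustration of the proposed behavior, not the provider's implementation; Utf8ModelRules and its members are hypothetical names, and the case-insensitive suffix match is an assumption.

```csharp
using System;
using System.Data;

public static class Utf8ModelRules
{
    // A collation counts as UTF8-compatible when its name ends with _UTF8
    // (compared case-insensitively here, which is an assumption).
    public static bool IsUtf8Collation(string? collation) =>
        collation is not null &&
        collation.EndsWith("_UTF8", StringComparison.OrdinalIgnoreCase);

    // True when the store type is char/varchar, with or without a length.
    public static bool IsAnsiStoreType(string storeType)
    {
        var name = storeType.Split('(')[0].Trim();
        return name.Equals("char", StringComparison.OrdinalIgnoreCase)
            || name.Equals("varchar", StringComparison.OrdinalIgnoreCase);
    }

    // Proposed rule: char/varchar with IsUnicode=true is a UTF8 property and is
    // sent as DbType.String; char/varchar without it stays DbType.AnsiString.
    public static DbType SelectDbType(string storeType, bool isUnicode) =>
        IsAnsiStoreType(storeType) && !isUnicode ? DbType.AnsiString : DbType.String;

    // Proposed validation: a UTF8 property needs an explicit UTF8 collation.
    public static void Validate(string storeType, bool isUnicode, string? collation)
    {
        if (IsAnsiStoreType(storeType) && isUnicode && !IsUtf8Collation(collation))
            throw new InvalidOperationException(
                $"Column type '{storeType}' with IsUnicode(true) requires a collation ending with _UTF8.");
    }
}
```

With these rules, SelectDbType("varchar(max)", true) yields DbType.String, and Validate("varchar(max)", true, "Latin1_General_100_BIN2") throws because the collation is not a UTF8 one.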

Global model configuration

The default database collation can already be set via modelBuilder.UseCollation():

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.UseCollation("LATIN1_GENERAL_100_CI_AS_SC_UTF8");
}

All string properties can be configured to be UTF8 by default via pre-convention model configuration:

protected override void ConfigureConventions(ModelConfigurationBuilder configurationBuilder)
{
    configurationBuilder.DefaultTypeMapping<string>(b => b.HasColumnType("varchar(max)").IsUnicode(true));
}

We could also add a ConfigureUTF8() extension method to do the above.
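Such an extension might look like the sketch below. ConfigureUTF8 is the hypothetical name from this proposal, not an existing EF Core API, and the class name is made up; the body simply wraps the DefaultTypeMapping call shown above.

```csharp
using Microsoft.EntityFrameworkCore;

public static class Utf8ConventionExtensions
{
    // Hypothetical extension from the proposal; not an existing EF Core API.
    public static ModelConfigurationBuilder ConfigureUTF8(
        this ModelConfigurationBuilder configurationBuilder)
    {
        // Same configuration as the snippet above, wrapped for reuse.
        configurationBuilder.DefaultTypeMapping<string>(
            b => b.HasColumnType("varchar(max)").IsUnicode(true));
        return configurationBuilder;
    }
}
```

A context would then call configurationBuilder.ConfigureUTF8() from its ConfigureConventions override.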

roji commented, Mar 7, 2022 (1 reaction)

@clement911

So I believe ideally we would want to pass a varchar parameter and also indicate that the collation of the parameter is Latin1_General_100_BIN2_UTF8 (or whatever other actual UTF8 collation was used). The problem I see is that neither Microsoft.Data.SqlClient.SqlParameter nor sp_executesql allows passing the collation name of a given parameter.

The collation isn’t something that gets specified on a parameter, but rather on the column (or at the database level for all columns); see our docs for more info on this.

Aside from that, as @egbertn wrote above, a workaround exists, but it requires editing the migration to change the column type to varchar (doing something better is what this issue tracks). There’s no reason to avoid editing the migration file: it’s perfectly fine (and frequently recommended) to customize migration code after generating it; see our docs. I definitely wouldn’t avoid UTF8 just because it requires a one-time edit to migration code.
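As a rough sketch of what that one-time edit could look like, assuming the property is left as Unicode in the model so that the generated migration asks for nvarchar(max): the migration class name, the Blogs table, and the Name column are illustrative assumptions, not taken from the thread.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddBlogName : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<string>(
            name: "Name",
            table: "Blogs",
            // Hand-edited: the generated code is assumed to say "nvarchar(max)".
            type: "varchar(max)",
            nullable: false,
            defaultValue: "",
            // A UTF8 collation on the column is what makes varchar store UTF-8.
            collation: "LATIN1_GENERAL_100_CI_AS_SC_UTF8");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropColumn(name: "Name", table: "Blogs");
    }
}
```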

