UTF-8 non-ASCII string values cause strings to be truncated

Describe the bug

UTF-8 non-ASCII string values cause strings to be truncated. In the reproduction case, a shapefile contains the value Befæstet, which is read back as Befæste. Most likely, the string's character length is being used as its byte length somewhere.
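The character-versus-byte hypothesis can be checked with simple arithmetic. The Python sketch below only illustrates that hypothesis and is not GDAL's or gdal.netcore's actual code: "æ" (U+00E6) takes two bytes in UTF-8, so Befæstet is 8 characters but 9 bytes, and copying just 8 bytes (the character count) yields exactly Befæste.

```python
# Illustration only: a UTF-8 value whose character count differs from its byte count.
value = "Befæstet"
encoded = value.encode("utf-8")

char_len = len(value)     # 8 characters
byte_len = len(encoded)   # 9 bytes, because "æ" (U+00E6) is encoded as 2 bytes

# Assumed buggy pattern: treating the character count as the number of bytes to copy.
truncated = encoded[:char_len].decode("utf-8")

# Correct pattern: copying the real byte count before decoding.
intact = encoded[:byte_len].decode("utf-8")

print(char_len, byte_len)  # 8 9
print(truncated)           # Befæste
print(intact)              # Befæstet
```

With other values the cut can land in the middle of a multi-byte sequence, in which case strict decoding raises UnicodeDecodeError instead of silently dropping characters.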

To Reproduce

Clone https://github.com/bjornharrtell/gdal.netcore.utf8issuerepro, then run dotnet build followed by dotnet run.

Expected behavior

Strings should not be corrupted: a value such as Befæstet should be read back unchanged.
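A minimal sketch of reading a string across a native/managed boundary without corrupting it, assuming (hypothetically) that the native side hands back a NUL-terminated UTF-8 buffer. Python's ctypes stands in for the interop layer here; the helper names read_utf8_with_length and read_utf8_nul_terminated are hypothetical and do not correspond to the real gdal.netcore or SWIG marshaling code.

```python
import ctypes

# Simulate a native, NUL-terminated UTF-8 buffer, as a C library might return.
native = ctypes.create_string_buffer("Befæstet".encode("utf-8"))
addr = ctypes.addressof(native)


def read_utf8_with_length(address: int, n_bytes: int) -> str:
    """Copy exactly n_bytes from native memory and decode them as UTF-8."""
    return ctypes.string_at(address, n_bytes).decode("utf-8")


def read_utf8_nul_terminated(address: int) -> str:
    """Copy bytes up to the NUL terminator and decode them as UTF-8."""
    return ctypes.string_at(address).decode("utf-8")


# Hypothetical buggy caller: length taken from a character count (8), not a byte count (9).
char_count = 8
print(read_utf8_with_length(addr, char_count))  # Befæste

# Correct: let the terminator (or a true byte count) decide how much to copy.
print(read_utf8_nul_terminated(addr))           # Befæstet
```

The design point is that the length used for the copy must be measured in bytes of the encoded buffer; any count derived from characters only matches for pure ASCII input.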

Environment information:

  • OS (version): Ubuntu 20.04
  • Package version (core): [e.g. 3.0.1.25]
  • Package version (runtime): [e.g. 3.0.1.2]

Additional context

I’ve also reproduced this with other formats (e.g. GML), with my own builds of gdal.netcore, and when running on Debian 10, so the problem seems to sit deep somewhere.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
MaxRev-Dev commented, Nov 9, 2020

@bjornharrtell Looks like this bug was fixed in GDAL v3.2.0RC1. Changes will apply in corresponding versions of packages - v3.2.0.x milestone

1 reaction
bjornharrtell commented, Jun 4, 2020

@MaxRev-Dev fully agreed. GDAL PR done with https://github.com/OSGeo/gdal/pull/2649.

Top Results From Across the Web

Strings containing non-ASCII characters are truncated by ...
I don't see any point in using Unicode on the backend and a code page in the frontend of a multilingual application.
rfc6532
In this document, non-ASCII strings are UTF-8 strings if they are in header field values that contain at least one <UTF8-non-ascii> (see Section...
COPY INTO error caused by non-ASCII characters
Hello, I get an error message when executing the COPY INTO statement. The error message is caused by some non-ASCII characters in my...
Character string functions
Returns the numeric ASCII value of the first character in the specified string. For the NCHAR version of this function, see unicode( s...
UTF-8 Support
This release provides a UTF-8 aware behavior for Impala STRING type to get consistent behavior with Hive on UTF-8 strings using a query...