Unicode conversion issue

Environment

Pythonnet version: 3.0.0-preview2021-06-04
Python version: 3.8.8
Operating System: ubuntu 21.04
.NET Runtime: 5.0.300

Details

Unicode characters get lost / mangled during conversion:

scope.Exec("testStr = 'Nom 🍗';");
scope.Exec("""print("python:", testStr);""")
Console.WriteLine($"""dotnet: {scope.Get("testStr").ToString()}""");

writes

python: Nom 🍗
dotnet: Nom �

Top GitHub Comments

pkese commented, Jun 8, 2021

It looks like this problem goes in both directions:

- When converting from dotnet, Python expects the string "foo🐼" to be 4 characters long, but we somehow create a Python object with length 5.
- When converting from Python to dotnet, we truncate the string as described above. In my test #1467, it appears that there is some sort of size mismatch and a null pointer is returned somewhere in the call stack.
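
The two lengths in the first point can be reproduced in plain Python: "foo🐼" has 4 code points but 5 UTF-16 code units, because 🐼 lies outside the Basic Multilingual Plane and needs a surrogate pair. The second half of the sketch shows one way a count mismatch could produce the `Nom �` output from the report. That mechanism is an assumption for illustration only and is not confirmed anywhere in this thread; `utf16_units` is a hypothetical helper.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, which is what .NET reports as string.Length."""
    return len(s.encode("utf-16-le")) // 2


panda = "foo🐼"
# Python counts code points, .NET counts UTF-16 code units.
print(len(panda), utf16_units(panda))

# Assumed failure mode (not taken from the pythonnet source): the code point
# count is used as the number of UTF-16 code units to copy, so the low half
# of the surrogate pair is cut off and a lone high surrogate remains.
nom = "Nom 🍗"
raw = nom.encode("utf-16-le")
cut = raw[: len(nom) * 2]  # copies 5 code units, but the string has 6
mangled = cut.decode("utf-16-le", errors="replace")
print(ascii(mangled))
```

If the assumption holds, any fix has to convert between the two counts explicitly rather than reuse one as the other.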

I found some hints on how Python handles Unicode conversion internally: https://stackoverflow.com/questions/36098984/python-3-3-c-api-and-utf-8-strings

Using PyUnicode_DecodeUTF16 or PyUnicode_DecodeUTF16Stateful instead of PyUnicode_FromKindAndData might alleviate this problem. But importing these functions from the .dlls and getting the delegates in place is probably beyond my capacity.
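
The difference between the two C API calls can be observed from Python itself through `ctypes.pythonapi`. This is a minimal sketch that assumes a CPython interpreter and a little-endian UTF-16 buffer such as the one backing a .NET string. It is not pythonnet's binding code, and the value 2 for `PyUnicode_2BYTE_KIND` is copied from CPython's headers.

```python
import ctypes

PyUnicode_2BYTE_KIND = 2  # value of the enum member in CPython's C headers

text = "foo🐼"
utf16 = text.encode("utf-16-le")  # stand-in for the buffer behind a .NET string
n_units = len(utf16) // 2         # 5 UTF-16 code units

api = ctypes.pythonapi

# PyObject* PyUnicode_FromKindAndData(int kind, const void* buffer, Py_ssize_t size)
api.PyUnicode_FromKindAndData.argtypes = [
    ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
api.PyUnicode_FromKindAndData.restype = ctypes.py_object

# PyObject* PyUnicode_DecodeUTF16(const char* s, Py_ssize_t size,
#                                 const char* errors, int* byteorder)
api.PyUnicode_DecodeUTF16.argtypes = [
    ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_int)]
api.PyUnicode_DecodeUTF16.restype = ctypes.py_object

# Treats every 16-bit unit as its own character; size is given in units.
# The surrogate pair therefore stays as two separate code points.
as_units = api.PyUnicode_FromKindAndData(PyUnicode_2BYTE_KIND, utf16, n_units)

# Decodes UTF-16 and combines surrogate pairs; size is given in bytes.
byteorder = ctypes.c_int(-1)  # -1 selects little-endian
decoded = api.PyUnicode_DecodeUTF16(
    utf16, len(utf16), None, ctypes.byref(byteorder))

print(len(as_units), ascii(as_units))
print(len(decoded), ascii(decoded))
```

The first call yields a 5-character string ending in two lone surrogates, which matches the length mismatch described above; the second yields the expected 4-character string.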

pkese commented, Jun 11, 2021

And thank you, @filmor, for fixing all the rest.