
Unicode conversion issue


Environment

  • Pythonnet version: 3.0.0-preview2021-06-04
  • Python version: 3.8.8
  • Operating System: ubuntu 21.04
  • .NET Runtime: 5.0.300

Details

Unicode characters get lost / mangled during conversion:

scope.Exec("testStr = 'Nom 🍗';");
scope.Exec("""print("python:", testStr);""")
Console.WriteLine($"""dotnet: {scope.Get("testStr").ToString()}""");

writes

python: Nom 🍗
dotnet: Nom �

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

pkese commented, Jun 8, 2021 (1 reaction)

It looks like this problem goes in both directions:

  • when converting from dotnet, Python expects the string "foo🐼" to be 4 characters long, but we somehow create a Python object with Length 5.
  • when converting from Python to dotnet, we truncate the string as described above. In my test #1467, it appears that there is some sort of size mismatch and a null pointer is returned somewhere in the call stack.
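The two lengths in the list above can be reproduced in plain Python. The following is a minimal sketch, not pythonnet code: it counts the same string in code points (what Python's `len` reports) and in UTF-16 code units (what a .NET `string.Length` reports). The truncation step is only an assumption about how a size mismatch could lead to the reported output; the issue does not confirm the exact mechanism.

```python
s = "foo🐼"

# Python counts code points; the panda is a single code point.
code_points = len(s)

# .NET strings are UTF-16; the panda needs a surrogate pair (2 code units).
utf16 = s.encode("utf-16-le")
utf16_units = len(utf16) // 2

print(code_points, utf16_units)  # 4 code points vs 5 UTF-16 code units

# Assumption for illustration: if a converter copies only `code_points`
# UTF-16 units, the surrogate pair is cut in half and the last character
# can no longer be decoded.
truncated = utf16[: code_points * 2].decode("utf-16-le", errors="replace")
print(truncated)
```

Under that assumption the string ends in a replacement character, which resembles the `Nom �` output shown in the report.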

I found some hints on how Python handles Unicode conversion internally: https://stackoverflow.com/questions/36098984/python-3-3-c-api-and-utf-8-strings

Using PyUnicode_DecodeUTF16 or PyUnicode_DecodeUTF16Stateful instead of PyUnicode_FromKindAndData might alleviate this problem. But getting these functions imported from the .dlls and getting the delegates in place is probably beyond my capacity.
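The difference between the two C API functions can be observed from Python itself through `ctypes.pythonapi`. This is a sketch that assumes a CPython interpreter; it is not how pythonnet binds these functions, and the variable names are illustrative. It passes the same UTF-16 buffer to both functions: `PyUnicode_FromKindAndData` with the 2-byte kind treats every code unit as a character, while `PyUnicode_DecodeUTF16` combines the surrogate pair.

```python
import ctypes
import sys

api = ctypes.pythonapi  # CPython only

# Signatures as documented in the CPython C API.
api.PyUnicode_FromKindAndData.argtypes = [
    ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
api.PyUnicode_FromKindAndData.restype = ctypes.py_object
api.PyUnicode_DecodeUTF16.argtypes = [
    ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_int)]
api.PyUnicode_DecodeUTF16.restype = ctypes.py_object

PyUnicode_2BYTE_KIND = 2  # value from CPython's unicodeobject.h

# Encode in the machine's native byte order, as a .NET string would be.
little = sys.byteorder == "little"
utf16 = "foo🐼".encode("utf-16-le" if little else "utf-16-be")
n_units = len(utf16) // 2

# Size is given in characters (code units): yields two lone surrogates.
as_ucs2 = api.PyUnicode_FromKindAndData(PyUnicode_2BYTE_KIND, utf16, n_units)

# Size is given in bytes; byteorder -1 is little endian, 1 is big endian.
byteorder = ctypes.c_int(-1 if little else 1)
decoded = api.PyUnicode_DecodeUTF16(
    utf16, len(utf16), None, ctypes.byref(byteorder))

print(len(as_ucs2), len(decoded))  # 5 4
```

The length of 5 from the first call matches the "Python object with Length 5" observation above, which is consistent with the suggestion that decoding as UTF-16 is the better fit for data coming from .NET.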

pkese commented, Jun 11, 2021 (0 reactions)

And thank you, @filmor, for fixing all the rest.
