Incorrect encoding on Windows
When using pyserini under Windows, the encoding of strings seems to break when they are passed to the JNI via the pyjnius package.
It happens when a string is encoded as UTF-8 like this: JString(my_str.encode('utf-8')) (e.g., https://github.com/castorini/pyserini/blob/master/pyserini/search/_searcher.py#L114). It only occurs under Windows, presumably because it collides with the default Windows encoding, CP-1252.
I discussed this issue with the maintainers of pyjnius, and it seems that, to make it work independently of the platform, the .encode('utf-8') could simply be dropped.
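To illustrate the suspected failure mode, here is a minimal sketch in plain Python, with no pyjnius or JVM involved. It assumes that the UTF-8 bytes end up being decoded with the platform default encoding somewhere on the way to Java; that is a guess based on the symptoms, not confirmed pyjnius behaviour. The helper simulate_platform_decode and the sample query are hypothetical and only serve the illustration.

```python
def simulate_platform_decode(text: str, platform_encoding: str) -> str:
    """Encode text as UTF-8, then decode the bytes with another encoding.

    Hypothetical helper: it mimics what would happen if manually encoded
    UTF-8 bytes were later interpreted using the platform default encoding.
    """
    utf8_bytes = text.encode("utf-8")
    # errors="replace" keeps the sketch from raising on bytes that the
    # target code page leaves undefined.
    return utf8_bytes.decode(platform_encoding, errors="replace")


query = "café"  # sample non-ASCII query string (assumption)

# Where the default encoding is UTF-8, the round trip is lossless.
on_utf8_platform = simulate_platform_decode(query, "utf-8")

# With CP-1252 as the default, each UTF-8 byte becomes its own character.
on_windows = simulate_platform_decode(query, "cp1252")

print(on_utf8_platform)
print(on_windows)
```

If the assumption holds, passing the Python str directly, without the manual .encode('utf-8'), would leave the conversion to pyjnius and avoid the second, platform-dependent decoding step.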
Was there a reason why this manual encoding was used in pyserini?
I created a branch with the changes; I could open a PR if you wish.
Issue Analytics
- State:
- Created 2 years ago
- Comments:13 (13 by maintainers)
Top GitHub Comments
@stekiri I like the solution of just killing all the JString wrapping… if it doesn’t break any of our regression tests (beyond just what the test case exposes)… Do you have the cycles to send a PR along those lines? If not, we can ask @yuki617.
@stekiri as I discussed with @yuki617 - she’s going to create a minimal test case that manifests the issue, and we can go from there…