Incorrect encoding on Windows

When using pyserini under Windows, string encoding appears to break when strings are passed to the JNI via the pyjnius package.

This happens when a string is manually encoded as UTF-8 before being wrapped, as in JString(my_str.encode('utf-8')) (e.g., https://github.com/castorini/pyserini/blob/master/pyserini/search/_searcher.py#L114). It seems to occur only under Windows, presumably because the UTF-8 bytes collide with the default Windows encoding, CP-1252.
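
To make the suspected collision concrete, here is a minimal pure-Python sketch of what happens when UTF-8 bytes are read back as CP-1252. It does not call pyjnius or a JVM; that the pyjnius/JNI boundary on Windows decodes the bytes this way is an assumption based on the description above, and the helper name misread_utf8_as_cp1252 is hypothetical.

```python
# Simulates the suspected failure mode in pure Python; no JVM or pyjnius needed.
# Assumption: the bytes produced by .encode('utf-8') end up being interpreted
# with the Windows default code page (CP-1252) somewhere along the way.

def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as CP-1252."""
    utf8_bytes = text.encode("utf-8")
    # CP-1252 leaves a few byte values undefined; replace them instead of raising.
    return utf8_bytes.decode("cp1252", errors="replace")


query = "caf\u00e9"  # "café"
garbled = misread_utf8_as_cp1252(query)
print(repr(query), "->", repr(garbled))

# ASCII-only text is unaffected, because UTF-8 and CP-1252 agree on ASCII.
print(repr(misread_utf8_as_cp1252("hello")))
```

If this is indeed the mechanism, it would explain why ASCII-only queries work while queries containing accented or other non-ASCII characters come out garbled.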

I discussed this issue with the maintainers of pyjnius, and it seems that simply dropping the .encode('utf-8') call would make it work independently of the platform.

Was there a reason why this manual encoding was used in pyserini?

I created a branch with the changes; I could open a PR if you wish.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

1 reaction
lintool commented, Nov 3, 2021

@stekiri I like the solution of just killing all the JString wrapping… if it doesn’t break any of our regression tests (beyond just what the test case exposes)…

Do you have the cycles to send a PR along those lines? If not, we can ask @yuki617 .

1 reaction
lintool commented, Oct 28, 2021

@stekiri as I discussed with @yuki617, she’s going to create a minimal test case that manifests the issue, and we can go from there…
