Allow to configure a list of encodings to use when guessing
The files.autoGuessEncoding=true setting doesn't work well in some circumstances.
I think it would be good to add a feature like files.forceEncoding="encode1:encode2,encode3:encode4", so that it can force ‘encode1’ to ‘encode2’ (and ‘encode3’ to ‘encode4’). I think that would solve the problem of wrong encoding detection.
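A minimal sketch of how the proposed files.forceEncoding value could be interpreted, assuming each from:to pair means "if detection reports from, open the file as to". The setting is only a proposal here; parseForceEncoding and applyForceEncoding are hypothetical names, and the encoding names are examples.

```typescript
// Hypothetical sketch of the proposed files.forceEncoding semantics.
// parseForceEncoding and applyForceEncoding are illustrative names, not VS Code APIs.

/** Parse "from1:to1,from2:to2" into a lookup table keyed by lowercased "from" names. */
function parseForceEncoding(spec: string): Record<string, string> {
  // A prototype-less object, so keys like "constructor" cannot collide.
  const overrides: Record<string, string> = Object.create(null);
  for (const pair of spec.split(",")) {
    const [from, to] = pair.split(":").map((s) => s.trim());
    // Skip malformed entries such as "", "utf8" (no target) or "x:" (empty target).
    if (from && to) {
      overrides[from.toLowerCase()] = to;
    }
  }
  return overrides;
}

/** Replace a detected encoding with its forced counterpart, if one is configured. */
function applyForceEncoding(detected: string, overrides: Record<string, string>): string {
  const forced = overrides[detected.toLowerCase()];
  return forced !== undefined ? forced : detected;
}

const overrides = parseForceEncoding("maccyrillic:windows1251,windows1252:windows1251");
console.log(applyForceEncoding("maccyrillic", overrides)); // windows1251
console.log(applyForceEncoding("utf8", overrides)); // utf8
```

Lookups are case-insensitive on the detected name, so "MacCyrillic" and "maccyrillic" hit the same override; encodings with no mapping pass through unchanged.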
Issue Analytics
- Created 6 years ago
- Reactions: 52
- Comments: 54 (11 by maintainers)
Top GitHub Comments
I agree. In my environment we have files in two encodings, UTF-8 and Windows-1251 (the most popular text file encoding in Russia), so I need encoding detection. However, it sometimes detects Windows-1251-encoded files as “maccyrillic”, “Windows1252” or some other encoding I’ve never seen in my life 😄 I definitely need a setting like
files.detectEncodings=["utf8","windows1251"]
So instead of just “true”, you could specify which encodings you want it to choose from. As far as I know, encoding detection works on probabilities (you can’t say with 100% certainty which file is in which encoding, so the software has to pick the most probable answer), so I think it is possible to implement: just filter the list of possible encodings down to the ones the user selected.
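The filtering idea can be sketched as follows, assuming a detector that returns candidate encodings with confidence scores. The Candidate shape, the scores, and pickAllowed are all hypothetical, not VS Code's detector API. Candidates outside the user's list are discarded and the most confident remaining one wins; names are normalized so "Windows-1251" and "windows1251" match.

```typescript
// Hypothetical sketch: a detector reports candidate encodings with confidences,
// and a user-supplied allow-list (like the proposed files.detectEncodings) filters them.

interface Candidate {
  encoding: string;
  confidence: number; // assumed to be in [0, 1]; higher means more likely
}

/** Normalize names so "Windows-1251", "windows1251" and "WINDOWS1251" compare equal. */
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Return the most probable candidate whose encoding the user allowed, or undefined. */
function pickAllowed(candidates: Candidate[], allowed: string[]): string | undefined {
  const allowedSet: Record<string, true> = Object.create(null);
  for (const name of allowed) {
    allowedSet[normalize(name)] = true;
  }
  let best: Candidate | undefined;
  for (const c of candidates) {
    // Keep only allowed encodings, and among those the highest confidence.
    if (allowedSet[normalize(c.encoding)] && (best === undefined || c.confidence > best.confidence)) {
      best = c;
    }
  }
  return best === undefined ? undefined : best.encoding;
}

// Made-up detector output for a Windows-1251 file that the detector ranks second.
const candidates: Candidate[] = [
  { encoding: "maccyrillic", confidence: 0.61 },
  { encoding: "windows1251", confidence: 0.58 },
  { encoding: "windows1252", confidence: 0.2 },
];
console.log(pickAllowed(candidates, ["utf8", "windows1251"])); // windows1251
```

If no candidate survives the filter, the function returns undefined, leaving the caller to fall back to a default encoding.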
Verification: There is now a files.guessableEncodings setting where you can fill in encodings to support when guessing. From the explanation: “If provided, will restrict the list of encodings that can be used when guessing. If the guessed file encoding is not in the list, the default encoding will be used.”
Update: I decided to rename the setting to files.guessableEncodings
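A minimal sketch of the rule quoted in the setting's description, assuming case-insensitive name matching and treating an empty list the same as an unset one. resolveEncoding is a hypothetical name and this is not VS Code's actual implementation; defaultEncoding stands in for whatever default applies, e.g. the files.encoding setting.

```typescript
// Hypothetical sketch of the fallback rule quoted above; resolveEncoding is an
// illustrative name, not VS Code's actual implementation.

function resolveEncoding(
  guessed: string | undefined, // result of auto-guessing; undefined if nothing was guessed
  guessable: string[] | undefined, // the setting's value; undefined or [] when not provided (assumption)
  defaultEncoding: string, // e.g. the value of files.encoding
): string {
  if (guessed === undefined) {
    return defaultEncoding;
  }
  // "If provided, will restrict the list of encodings that can be used when guessing."
  if (guessable === undefined || guessable.length === 0) {
    return guessed;
  }
  const wanted = guessed.toLowerCase();
  const allowed = guessable.some((e) => e.toLowerCase() === wanted);
  // "If the guessed file encoding is not in the list, the default encoding will be used."
  return allowed ? guessed : defaultEncoding;
}

console.log(resolveEncoding("windows1252", ["utf8", "windows1251"], "utf8")); // utf8
console.log(resolveEncoding("windows1251", ["utf8", "windows1251"], "utf8")); // windows1251
console.log(resolveEncoding("windows1252", undefined, "utf8")); // windows1252
```

Read literally, the description makes this a check on the single guessed result rather than a re-ranking: a file guessed as windows1252 falls back to the default instead of to the next most likely allowed encoding.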