Problem with the encoding of Cyrillic characters

There are problems with the encoding of Cyrillic characters in some files. I uploaded a sample file to Google Drive: file xls. The file opens correctly in Excel 16.0. Below is the result of converting it to CSV on the page http://oss.sheetjs.com/js-xlsx/:

Îñòàòêè ÒÌÖ íà ñêëàäàõ,,,,,,,,,,,,,,,,,,,,,,,,,,
Íà äàòó: 04.12.17,,,,,,,,,,,,,,,,,,,,,,,,,,
"Ïî íîìåíêëàòóðíûì ïîçèöèÿì èç ñïèñêà (""Àâòîøèíû"").",,,,,,,,,,,,,,,,,,,,,,,,,,
Íîìåíêëàòóðà,,,Åä.,Ñêëàä ã. ×åëÿáèíñê,,Ñëîáîäñêîé ïåð. 45,,,,,,,,,,,,,,,,,,,,
,,,,öåíà À,Ñâîáîäíûé,öåíà À,Ñâîáîäíûé,,,,,,,,,,,,,,,,,,,
51344,-----,16.5/70-18 TT ÂØÇ ÊÔ-97 íñ10 ñ îá.ëåíòîé,êîìïë, ,0,"14,008.00",>12,,,,,,,,,,,,,,,,,,,

Can it be fixed with some manipulation like this?

const iconv = require('iconv-lite');
console.log(iconv.decode(iconv.encode('Îñòàòêè ÒÌÖ íà ñêëàäàõ,,,,,,,,,,,,,,,,,,,,,,,,,,', 'cp1252'), 'cp1251'));
//display "Остатки ТМЦ на складах,,,,,,,,,,,,,,,,,,,,,,,,,,"
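
For reference, the same round trip can be sketched with Node.js built-ins only. This is a minimal sketch, not a SheetJS or iconv-lite API: the helper name fixCyrillicMojibake is hypothetical, and it assumes a Node.js build with full ICU so that TextDecoder knows the 'windows-1251' label. It also uses 'latin1' in place of cp1252, which is an approximation.

```javascript
// Hypothetical helper (assumption: not part of SheetJS or iconv-lite).
// Re-interprets text that was wrongly decoded as Latin-1/cp1252 as cp1251.
function fixCyrillicMojibake(garbled) {
  // 'latin1' maps U+0000..U+00FF straight back to single bytes.
  // cp1252 differs from Latin-1 only in 0x80..0x9F, so characters that
  // came from that byte range are not recovered by this shortcut.
  const bytes = Buffer.from(garbled, 'latin1');
  // Decode the recovered bytes with the intended Cyrillic codepage.
  return new TextDecoder('windows-1251').decode(bytes);
}

console.log(fixCyrillicMojibake('Îñòàòêè ÒÌÖ íà ñêëàäàõ'));
// Остатки ТМЦ на складах
```

This only repairs text after the fact. It works here because the main Cyrillic alphabet occupies bytes 0xC0..0xFF in cp1251, all of which survive the Latin-1 round trip.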

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
SheetJSDev commented, Dec 4, 2017

@makcbrain thanks for sharing! This is a BIFF5 XLS (Excel 5.0/95) file with no CodePage record, so there’s no way to inspect the file and determine the correct encoding. To see that this is a file ambiguity, try opening this in Excel 2016 for Mac and you’ll see different content corresponding to the default Mac Roman codepage 10000:

That string does correspond to the original set of bytes, as you can verify manually:

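var cptable = require('codepage');
// Re-encode the cp1252 rendering to raw bytes, then decode those bytes as cp1251
cptable.utils.decode(1251, cptable.utils.encode(1252, "Îñòàòêè ÒÌÖ íà ñêëàäàõ"));
// 'Остатки ТМЦ на складах'
// The Mac Roman (codepage 10000) rendering maps back to the same bytes
cptable.utils.decode(1251, cptable.utils.encode(10000, "ŒÒÚ‡ÚÍË “Ã÷ Ì‡ ÒÍÎ‡‰‡ı"));
// 'Остатки ТМЦ на складах'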

As discussed in #907, the final solution will involve adding a default codepage option to the read functions (e.g. XLSX.readFile("file.xls", {codepage:1251})).

0 reactions
SheetJSDev commented, Oct 22, 2020

Pass the codepage option to read or readFile: https://github.com/SheetJS/sheetjs/#parsing-options
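
A minimal sketch of that option in use, assuming the xlsx package is installed and that the file really was written with codepage 1251. The helper name xlsToCsv is hypothetical. The library object is passed in as a parameter only so the sketch has no hard dependency; in a real script you would normally require it at the top.

```javascript
// Hypothetical helper (assumption: not part of SheetJS).
// Reads a legacy XLS file that has no CodePage record and converts its
// first sheet to CSV, telling the parser which codepage to assume.
function xlsToCsv(XLSX, path, codepage = 1251) {
  // The codepage parsing option is the fallback for files that do not
  // declare their own encoding.
  const workbook = XLSX.readFile(path, { codepage });
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_csv(firstSheet);
}

// Usage (requires `npm install xlsx`):
//   const XLSX = require('xlsx');
//   console.log(xlsToCsv(XLSX, 'file.xls'));
```

Because the file itself carries no encoding information, the codepage has to come from knowledge about where the file was produced. 1251 is only correct for files like the sample above.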


