Unicode BOM messes up first property name
First off, thanks for the great library! Its API is a work of art. 👍
I ran into an issue when parsing CSV files fetched from the GitHub API with headers: true. The first column didn’t have the property name I expected due to an invisible character, which turned out to be the UTF-8 BOM (U+FEFF). I fixed it in my code via:
response = response.replace(/^\ufeff/, "");
However, I thought you guys might want to fix it in the library at some point (and include other BOMs too, like the UTF-16 BOM etc). It’s debatable whether this should be fixed in the library, and I could argue it both ways, but thought I should bring it to your attention and let you decide. 😃
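For reference, here is a rough sketch of what handling "other BOMs" could look like at the byte level, before the data is decoded to a string. `detectBom` is a hypothetical helper and not part of the library; the byte signatures are the standard ones for UTF-8 and UTF-16. Once text has been decoded correctly, each of these BOMs ends up as the single character U+FEFF, which is what the `replace` call above removes.

```javascript
// Hypothetical helper (not a library API): look for a known BOM at the
// start of a byte array and report the encoding and BOM length in bytes.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", length: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    // Caveat: a UTF-32LE BOM (FF FE 00 00) also starts with these two bytes.
    return { encoding: "utf-16le", length: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", length: 2 };
  }
  return { encoding: null, length: 0 };
}
```

A caller could skip the first `length` bytes and decode the rest with the reported encoding.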
This also happens when using Papa with Node.js if the CSV files contain a BOM. It is not that trivial to remove the BOM when using a read stream. Would it be possible to remove the BOM automatically when parsing the header?
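As a possible workaround on the Node.js side, a small Transform stream can drop a leading UTF-8 BOM before the data reaches the parser. This is only a sketch under some assumptions: `createBomStripper` is a made-up name, it handles the UTF-8 BOM only, and it buffers until three bytes have arrived so that a BOM split across chunks is still caught.

```javascript
const { Transform } = require("stream");

// Hypothetical helper (not part of Papa Parse): returns a Transform stream
// that removes a leading UTF-8 BOM (EF BB BF) and passes everything else through.
function createBomStripper() {
  let pending = Buffer.alloc(0); // bytes held back until the BOM check is possible
  let checked = false;           // true once the start of the stream was inspected

  return new Transform({
    transform(chunk, _encoding, callback) {
      if (checked) return callback(null, chunk);

      pending = Buffer.concat([pending, chunk]);
      if (pending.length < 3) return callback(); // not enough bytes yet

      checked = true;
      const hasBom =
        pending[0] === 0xef && pending[1] === 0xbb && pending[2] === 0xbf;
      const out = hasBom ? pending.subarray(3) : pending;
      pending = Buffer.alloc(0);
      // Avoid pushing an empty chunk when the input was only a BOM so far.
      return out.length > 0 ? callback(null, out) : callback();
    },
    flush(callback) {
      // Input shorter than three bytes cannot hold a BOM; emit it unchanged.
      if (!checked && pending.length > 0) this.push(pending);
      callback();
    },
  });
}
```

Something like `fs.createReadStream(path).pipe(createBomStripper())` would then yield a stream without the BOM, which can be handed to whatever consumes the CSV.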
@LeaVerou I was running into this problem too, and I fixed it a different way. I am using JavaScript’s FileReader API and was using readAsBinaryString on a CSV file. Switching to readAsText for CSV files got rid of these BOMs. Not sure if this would fix your issue, but figured it was worth sharing.
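A likely explanation for the difference, sketched without FileReader so it runs outside the browser: a binary string keeps every byte as its own character, so the three BOM bytes survive, whereas decoding the same bytes as UTF-8 text drops the BOM. The sample bytes are made up for illustration, and `TextDecoder` is used here as a stand-in for the text decoding step, on the assumption that readAsText decodes in a comparable way.

```javascript
// Sample data (made up): UTF-8 BOM followed by the header text "id".
const bytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x69, 0x64]);

// Binary-string view: one character per byte, so the BOM bytes remain
// as three stray characters in front of "id".
const asBinaryString = Array.from(bytes, (b) => String.fromCharCode(b)).join("");

// Text view: TextDecoder removes a leading BOM by default (ignoreBOM is false).
const asText = new TextDecoder("utf-8").decode(bytes);

console.log(asBinaryString.length); // 5
console.log(asText);                // id
```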