Unicode BOM messes up first property name

First off, thanks for the great library! Its API is a work of art. 👍

I ran into an issue when parsing CSV files fetched from the GitHub API with header: true. The first column didn’t have the property name I expected because of an invisible character, which turned out to be the UTF-8 BOM (U+FEFF). I fixed it in my code via:

response = response.replace(/^\ufeff/, "");

However, I thought you guys might want to fix it in the library at some point (and include other BOMs too, like the UTF-16 BOM etc). It’s debatable whether this should be fixed in the library, and I could argue it both ways, but thought I should bring it to your attention and let you decide. 😃
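
A minimal sketch of the workaround above, wrapped in a reusable helper. The names `stripBom` and `firstHeader` are hypothetical and not part of any library, and the header split is deliberately naive; it only illustrates how the BOM ends up attached to the first property name. It assumes the text has already been decoded into a JavaScript string, in which case a leading BOM shows up as the single character U+FEFF.

```javascript
// Hypothetical helper (not a library API): drop a leading U+FEFF if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Naive stand-in for header parsing: first field of the first line.
function firstHeader(csvText) {
  return csvText.split("\n")[0].split(",")[0];
}

const csv = "\ufeffname,age\nAda,36";

console.log(firstHeader(csv).length);           // 5: invisible BOM + "name"
console.log(firstHeader(stripBom(csv)).length); // 4: just "name"
console.log(firstHeader(stripBom(csv)) === "name"); // true
```

Calling `stripBom` on text that has no BOM returns it unchanged, so it is safe to apply unconditionally before parsing.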

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 8
  • Comments: 12

Top GitHub Comments

4 reactions
kaligrafy commented, Feb 17, 2019

This also happens when using Papa with Node.js if the CSV files contain a BOM. Removing the BOM is not trivial when using a read stream. Would it be possible to remove the BOM automatically when parsing the header?
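
One way to handle the read-stream case is sketched below, assuming UTF-8 input. `createBomStripper` is a hypothetical helper, not something the library provides. It is a Node.js `Transform` stream that holds back the first few bytes until it can check them against the UTF-8 BOM (EF BB BF), which covers the case where the BOM is split across chunks.

```javascript
const { Transform } = require("node:stream");

const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: returns a Transform that drops a leading UTF-8 BOM.
function createBomStripper() {
  let head = Buffer.alloc(0); // bytes held back until the BOM check is possible
  let checked = false;

  return new Transform({
    transform(chunk, _encoding, callback) {
      if (checked) return callback(null, chunk); // pass everything else through

      head = Buffer.concat([head, chunk]);
      if (head.length < UTF8_BOM.length) return callback(); // need more bytes

      checked = true;
      const hasBom = head.subarray(0, UTF8_BOM.length).equals(UTF8_BOM);
      callback(null, hasBom ? head.subarray(UTF8_BOM.length) : head);
    },
    flush(callback) {
      // Input shorter than three bytes cannot hold a UTF-8 BOM; emit it as is.
      if (!checked && head.length > 0) this.push(head);
      callback();
    },
  });
}

// Usage sketch (the file name is an assumption):
// const fs = require("node:fs");
// const clean = fs.createReadStream("data.csv").pipe(createBomStripper());
```

The resulting stream can then be handed to whatever consumes the CSV. This sketch only looks for the UTF-8 BOM; UTF-16 input would need a different byte check.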

3 reactions
cweezy commented, Mar 28, 2017

@LeaVerou I was running into this problem too, and I fixed it a different way. I am using JavaScript’s FileReader API and was calling readAsBinaryString on a CSV file. Switching to readAsText for CSV files got rid of these BOMs.

Not sure if this would fix your issue, but figured it was worth sharing.
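
The difference described above can be approximated outside the browser. This is a rough sketch, not the FileReader implementation: `String.fromCharCode` over the raw bytes stands in for a binary string (one character per byte, no decoding), and `TextDecoder` stands in for reading the file as UTF-8 text. The sample bytes are made up for illustration.

```javascript
// Bytes of a UTF-8 file that starts with a BOM (EF BB BF), followed by "id,name".
const bytes = new Uint8Array([
  0xef, 0xbb, 0xbf,
  ...new TextEncoder().encode("id,name"),
]);

// Binary-string style: every byte becomes one character, BOM bytes included.
const asBinaryString = String.fromCharCode(...bytes);

// Text style: TextDecoder drops a leading BOM unless { ignoreBOM: true } is passed.
const asText = new TextDecoder("utf-8").decode(bytes);

console.log(asBinaryString.length); // 10
console.log(asText.length);         // 7
console.log(asText);                // id,name
```

Note that in the binary-string form the BOM appears as three separate characters (U+00EF, U+00BB, U+00BF) rather than U+FEFF, so a `/^\ufeff/` replacement would not match it.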
