Some CSV files include a BOM in the first three bytes. Proposal: skip BOMs, or give the option to skip BOMs?

For some (probably historical) reason, some tools require UTF-8 encoded CSV files to start with a BOM in their first three bytes (Excel 👀 ?). As a result, many tools produce CSV files that start with the three bytes 0xefbbbf.
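
For reference, Python's standard library exposes these three bytes as `codecs.BOM_UTF8`, and its `utf-8-sig` codec writes them automatically when encoding. The sketch below only illustrates the byte layout; it is not taken from jc, and the sample text simply mirrors the example file in this issue.

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded as UTF-8: the bytes 0xEF 0xBB 0xBF.
print(codecs.BOM_UTF8.hex())  # efbbbf

# The "utf-8-sig" codec prepends that BOM when encoding, much like
# CSV exporters that add one for Excel's benefit.
data = "a;b\n0;1\n".encode("utf-8-sig")
print(data[:3] == codecs.BOM_UTF8)  # True
print(data[:4].hex())               # efbbbf61 (BOM, then "a")
```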

These three bytes are not printable and do not show up in the output, but they are still there.

For example, parsing the following BOM-plagued file (BOM not shown):

a;b
0;1

produces what looks like [{"a":"0","b":"1"}], but in reality the first key is not a (byte 0x61). It is the byte sequence 0xefbbbf61, which prints just like a.

So if you pipe the output of jc --csv for this file into the jq query jq '.[].a', the result is null instead of "0". And it is hard to see why without knowing about BOMs.
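
The symptom is easy to reproduce with Python's standard `csv` and `json` modules, used here only as a stand-in for jc's CSV parser (an assumption: jc's internals may differ). Decoding with plain `utf-8` keeps the BOM as the invisible character U+FEFF, which ends up glued to the first header:

```python
import csv
import io
import json

raw = b"\xef\xbb\xbfa;b\n0;1\n"  # BOM followed by the sample file

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))

first_key = list(rows[0])[0]
print(first_key == "a")                 # False
print(first_key.encode("utf-8").hex())  # efbbbf61

# Rough equivalent of jq '.[].a': the lookup for "a" finds nothing.
doc = json.loads(json.dumps(rows))
print([row.get("a") for row in doc])    # [None]
```

The last line plays the role of the jq query: the key is really "\ufeffa", so a lookup for "a" comes back empty even though the output looks correct.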

Proposal: ignore the first three bytes if they match 0xefbbbf in CSV files (and maybe other formats?), or add an option to ignore them if present.
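
One way to implement the first option is to drop a leading U+FEFF before the text reaches the CSV parser; when starting from raw bytes, Python's `utf-8-sig` codec does the same thing during decoding. A minimal sketch, assuming the parser receives an already-decoded string. `strip_bom` and `parse_csv` are hypothetical helpers for illustration, not jc's actual fix:

```python
import csv
import io
from typing import Dict, List

BOM = "\ufeff"  # U+FEFF, which UTF-8 encodes as 0xEF 0xBB 0xBF


def strip_bom(text: str) -> str:
    """Hypothetical helper: remove one leading BOM, if present."""
    return text[len(BOM):] if text.startswith(BOM) else text


def parse_csv(text: str, delimiter: str = ";") -> List[Dict[str, str]]:
    """Hypothetical helper: parse CSV text into dicts, ignoring a leading BOM."""
    reader = csv.DictReader(io.StringIO(strip_bom(text)), delimiter=delimiter)
    return [dict(row) for row in reader]


# With and without a BOM, the keys now come out the same.
print(parse_csv("\ufeffa;b\n0;1\n"))  # [{'a': '0', 'b': '1'}]
print(parse_csv("a;b\n0;1\n"))        # [{'a': '0', 'b': '1'}]

# Starting from bytes instead: "utf-8-sig" drops the BOM when present.
print(b"\xef\xbb\xbfa;b\n".decode("utf-8-sig") == "a;b\n")  # True
```

Only a BOM at the very start is removed: U+FEFF elsewhere in the text is a zero-width no-break space and could be intentional data.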

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions
kellyjonbrazil commented, Oct 25, 2022

Nice! No worries - I’ll close the issue once I have released jc v1.22.2 with the fix. Thanks again!

1 reaction
kellyjonbrazil commented, Nov 8, 2022

Released in jc v1.22.2.
