
Some CSV files include a BOM in their first three bytes. Proposal: skip BOMs, or provide an option to skip them?


For some (probably historical) reason, some tools require UTF-8 encoded CSV files to start with a BOM in their first three bytes (Excel 👀 ?). As a result, many tools produce CSV files that start with the three bytes 0xEF 0xBB 0xBF.
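These three bytes are simply the UTF-8 encoding of the Unicode character U+FEFF. A short standard-library Python check (Python is used only because jc is written in it; this is not jc code) confirms that, and shows that Python's "utf-8-sig" codec is one way a BOM-prefixed CSV file can get written:

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE, used as a byte order mark) encoded as UTF-8
bom = "\ufeff".encode("utf-8")

# The standard library exposes the same three bytes as a constant
assert bom == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Encoding with "utf-8-sig" prepends the BOM to the output
data = "a;b\n0;1\n".encode("utf-8-sig")
print(data[:4])  # b'\xef\xbb\xbfa'
```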

These three bytes are not printable, so they do not show up in the output, but they are still there.

For example, the following BOM-plagued file (BOM not shown):

a;b
0;1

parses into what looks like [{"a":"0","b":"1"}]. In reality, the first key is not a (0x61) but the bytes 0xEF 0xBB 0xBF 0x61, which print exactly like a.

So if you pipe the output of jc --csv for this file into jq '.[].a', the result is null instead of "0". Without knowing about BOMs, it is hard to see why.
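The same symptom can be reproduced with Python's standard csv module. This is a sketch, not jc itself: it assumes the input is decoded as plain "utf-8" (which keeps the BOM as the character U+FEFF) and hard-codes the semicolon delimiter of the sample file. It also shows two ways to make the hidden character visible: repr() of the keys, and json.dumps with its default ensure_ascii=True.

```python
import csv
import io
import json

# The sample file from above, with a UTF-8 BOM in front
raw = b"\xef\xbb\xbfa;b\n0;1\n"

# Decoding as plain "utf-8" keeps the BOM as the character U+FEFF
text = raw.decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
first = rows[0]

print(json.dumps(rows, ensure_ascii=False))  # looks like [{"a": "0", "b": "1"}]
print(first.get("a"))                        # None, the counterpart of jq's null
print(list(first))                           # ['\ufeffa', 'b']: repr exposes the BOM
print(json.dumps(rows))                      # ensure_ascii=True escapes it as \ufeff
```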

Proposal: skip the first three bytes of CSV input if they are 0xEF 0xBB 0xBF (and perhaps do the same for other formats), or add an option to skip them when present.
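A minimal sketch of the proposal in Python follows. It is not jc's actual fix (the thread only says the fix shipped in jc v1.22.2); strip_bom and parse_csv are hypothetical helper names, and the semicolon delimiter is an assumption taken from the sample file. Since a parser that receives decoded text sees the three bytes as the single character U+FEFF, "skip the first three bytes" becomes "drop one leading U+FEFF". When raw bytes are available, the "utf-8-sig" codec does the same while decoding.

```python
import csv
import io
from typing import Dict, List

BOM_CHAR = "\ufeff"  # what the bytes EF BB BF decode to


def strip_bom(text: str) -> str:
    """Remove one leading BOM character, if present (hypothetical helper)."""
    return text[1:] if text.startswith(BOM_CHAR) else text


def parse_csv(text: str, delimiter: str = ";") -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts after stripping a leading BOM."""
    return list(csv.DictReader(io.StringIO(strip_bom(text)), delimiter=delimiter))


raw = b"\xef\xbb\xbfa;b\n0;1\n"

# Option 1: strip the BOM from already-decoded text
rows = parse_csv(raw.decode("utf-8"))

# Option 2: let "utf-8-sig" drop the BOM while decoding; BOM-less input
# decodes unchanged, so it is safe to use unconditionally
rows_sig = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig")), delimiter=";"))

print(rows)  # [{'a': '0', 'b': '1'}]
assert rows == rows_sig
```

Stripping exactly one character, rather than calling lstrip("\ufeff"), leaves any further U+FEFF characters alone, since only a leading one acts as a BOM.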

Issue Analytics

State:
Created a year ago
Reactions: 1
Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions

kellyjonbrazil commented, Oct 25, 2022

Nice! No worries - I’ll close the issue once I have released jc v1.22.2 with the fix. Thanks again!

1 reaction

kellyjonbrazil commented, Nov 8, 2022

Released in jc v1.22.2.

Top Results From Across the Web

"csv" files with UTF-8 with BOM encoding are returning the first ...

Just to make sure: you mean that CSV content that has UTF-8 BOM (3 bytes) will cause first header name to be reported...

Byte order mark screws up file reading in Java - Stack Overflow

When present, the byte order gets read along with the rest of the first line, thus causing problems with string compares. Is there...

The byte-order mark (BOM) in HTML - W3C

In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 encodings, there is no alternative sequence of...

Should UTF-8 CSV files contain a BOM (byte order mark)?

The one question left open is: Should we add a BOM at the start or not? I have read multiple opinions and pros/cons...

Import BOM from a CSV file - MRPeasy

If importing some data failed, a skip file will be generated and downloaded to your computer, which indicates all errors line-by-line. Importing new...