Some CSV files include a BOM in the first three bytes. Proposal: skip BOMs, or give the option to skip BOMs?
For some (probably historical) reason, some tools require that UTF-8 encoded CSV files start with a BOM in their first three bytes (Excel 👀 ?). As a result, many tools produce CSV files that start with the three bytes `0xefbbbf`.
These three bytes encode a single non-printing character (U+FEFF), so they are not seen in the output, but they are still present.
As a result, parsing the following BOM-plagued file (BOM not shown):
a;b
0;1
produces what looks like `[{"a":"0","b":"1"}]`, but in reality the first key is not `a`, which would be the single byte `0x61`. It is `0xefbbbf61`, which prints just like `a`.
So if you pipe the output of `jc --csv` for this file to the jq query `jq '.[].a'`, it results in `null` (it should result in `"0"`). And it is hard to see why without knowing about BOMs.
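The effect can be reproduced outside of jc. The following is a minimal sketch that uses Python's standard `csv` module rather than jc itself (an assumption made for illustration; it is not jc's parser code). It builds the example file with the BOM made explicit and shows that the first key only looks like `a`.

```python
import csv
import io
import json

# The example file from the issue, with the UTF-8 BOM written out explicitly.
raw = b"\xef\xbb\xbfa;b\n0;1\n"

# Decoding as plain "utf-8" keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))

first_key = next(iter(rows[0]))

# In most terminals this looks like a normal "a" key, because U+FEFF is invisible.
print(json.dumps(rows, ensure_ascii=False))

print(first_key == "a")                 # False
print(first_key.encode("utf-8").hex())  # efbbbf61
print(rows[0].get("a"))                 # None, the analogue of jq returning null
```

Looking up the key `a` fails for the same reason `jq '.[].a'` returns `null`: the real key is U+FEFF followed by `a`.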
Proposal: ignore the first three bytes if they match `0xefbbbf` in CSV files (maybe other formats?), or add an option to ignore them if present?
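A minimal sketch of what the proposal could look like, again using Python's standard library. The helper names `strip_utf8_bom` and `parse_csv` are hypothetical and are not taken from jc; this is one possible approach, not the fix that was actually shipped.

```python
import codecs
import csv
import io


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_csv(data: bytes, delimiter: str = ";") -> list:
    """Hypothetical parser wrapper: strip the BOM, decode, then read rows as dicts."""
    text = strip_utf8_bom(data).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


with_bom = b"\xef\xbb\xbfa;b\n0;1\n"
without_bom = b"a;b\n0;1\n"

# Both inputs now yield the same rows, and the first key is a plain "a".
print(parse_csv(with_bom) == parse_csv(without_bom))  # True
print(parse_csv(with_bom)[0]["a"])                    # 0
```

Python's built-in `utf-8-sig` codec gives the same result at decode time: `data.decode("utf-8-sig")` drops a leading BOM if there is one and otherwise behaves like `utf-8`.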
Top GitHub Comments
Nice! No worries - I’ll close the issue once I have released `jc` v1.22.2 with the fix. Thanks again!

Released in `jc` v1.22.2.