UTF-8-BOM string parsing - header first name incorrectly enclosed in a double quote

When a file is encoded as UTF-8-BOM, PapaParse's CSV-to-JSON conversion returns records whose first object key appears enclosed in single quotes. The field called name (example below) can then no longer be referenced: record.name doesn’t exist. The field shows up as record.'name', which is not easily accessible in JavaScript using record.name, record['name'], etc. You can only see it by printing the record to the console or by using a for-in loop.

The subsequent object keys are correct without quotes.

Change the file encoding to UTF-8 and the keys are normal, without a quote.
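
The quotes in the console output are most likely a display artifact rather than part of the key. A UTF-8-BOM file starts with the byte order mark U+FEFF, and if that character stays attached to the first header, the key is no longer a plain identifier, so Node.js prints it quoted. A minimal sketch of how to confirm this, using a hand-built `record` object (an assumption standing in for real PapaParse output):

```javascript
// Hypothetical record, built by hand to mimic what a parser would return
// when the BOM (U+FEFF) is left on the first header field.
const record = { "\ufeffname": "De Akker Guest House", phone: "0514442010" };

const firstKey = Object.keys(record)[0];

// The key looks like "name" when printed, but it is one character longer.
const hasBom = firstKey.charCodeAt(0) === 0xfeff;
const keyLength = firstKey.length; // 5, not 4

// Plain property access misses the field; the BOM-prefixed key finds it.
const plainLookup = record.name;
const bomLookup = record["\ufeffname"];

console.log({ hasBom, keyLength, plainLookup, bomLookup });
```

If `hasBom` is true for your own data, the key is intact apart from the invisible leading character, and removing that character makes `record.name` work again.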

"PapaConfig": {
    "quotes": true,
    "quoteChar": "\"",
    "escapeChar": "\"",
    "delimiter": ",",
    "header": true,
    "skipEmptyLines": true,
    "columns": null
}

Papa.parse(csvData, PapaConfig)

csvData (subset):

name,phone
De Akker Guest House,0514442010

UTF-8-BOM encoding:

[
  {
    'name': 'De Akker Guest House',
    phone: '0514442010',

UTF-8 encoding:

[
  {
    name: 'De Akker Guest House',
    phone: '0514442010',

Excel exports CSV files as UTF-8-BOM, possibly because that encoding is supposedly faster and more reliable. Can PapaParse be changed to handle UTF-8-BOM correctly?
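
One workaround that does not depend on the parser is to remove the BOM from the input string before parsing. The sketch below assumes `csvData` is already a JavaScript string. `stripLeadingBom` is a hypothetical helper name, and the header check uses a plain `split` only to keep the example free of dependencies.

```javascript
// Hypothetical helper: drop a leading U+FEFF if present.
function stripLeadingBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// The sample data from above, with the BOM in front as it appears
// after a UTF-8-BOM file has been decoded into a string.
const csvData = "\ufeffname,phone\nDe Akker Guest House,0514442010\n";
const cleaned = stripLeadingBom(csvData);

// Dependency-free look at the first header field, before and after.
const firstHeaderBefore = csvData.split("\n")[0].split(",")[0];
const firstHeaderAfter = cleaned.split("\n")[0].split(",")[0];

console.log(firstHeaderBefore.length, firstHeaderAfter.length); // 5 4

// With the BOM gone, Papa.parse(cleaned, PapaConfig) should yield a plain "name" key.
```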

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 6

Top GitHub Comments

3 reactions
duhmojo commented, Jan 20, 2022

I went with this approach, not the most efficient:

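// Drop a leading byte order mark (U+FEFF) from a string, if present
const stripBom = function (str) {
    if (str.charCodeAt(0) === 0xfeff) {
        return str.slice(1)
    }
    return str
}

papaparse.parse(csvFile, {
    step: function (row, parser) {
        ...
        // Rebuild the row object with the BOM stripped from every key
        const data = Object.fromEntries(
            Object.entries(row.data).map(([k, v]) => [stripBom(k), v])
        )
    }
})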

Since csvFile is a read stream, not a pre-read file, I just tossed it in there for each step. I could do it only for the 1st step and skip it for anything but the 1st row.
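
Remapping the keys on every row works, but the BOM only ever sits on the first header field. A lighter alternative may be PapaParse's `transformHeader` option, which is documented as a function applied to each header when `header` is true. The sketch below only builds the config and calls the function directly. The `papaparse.parse` call is left as a comment because it needs the library and a real stream, and `config` is a name chosen for illustration.

```javascript
// Config sketch: clean header names once instead of rewriting each row.
const config = {
  header: true,
  // Remove a leading U+FEFF from a header name; other headers pass through.
  transformHeader: (header) => header.replace(/^\uFEFF/, ""),
  step: (row) => {
    // row.data keys would already be BOM-free here
  },
};

// Calling the function directly, the way the parser would per header:
const headers = ["\ufeffCID", "Section"].map((h) => config.transformHeader(h));
console.log(headers); // [ 'CID', 'Section' ]

// papaparse.parse(csvFile, config);
```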

0 reactions
duhmojo commented, Jan 19, 2022

I’m having the same issue in 2022. I was given an external CSV file, probably edited or written on Windows, and I’m processing it on Linux with papaparse. I was unable to access the first property defined by the header. When I console.log(row.data), I see the property key quoted:

{
  'CID': '164.306(a)',
  Section: 'Ensure Confidentiality, Integrity and Availability',
}

I edited the original CSV and simply retyped the first character in the header, then reran:

{
  CID: '164.306(a)',
  Section: 'Ensure Confidentiality, Integrity and Availability',
}

I’m using const csvFile = fs.createReadStream(csvFilename); and I tried switching to const csvFile = fs.readFileSync(csvFilename, { encoding: 'utf-8'}); without luck. I had read that readFileSync was supposed to strip the BOM, but it doesn’t, at least not for me: https://github.com/nodejs/node-v0.x-archive/issues/1918
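
This matches how Node.js decodes strings: converting bytes to 'utf-8' through a Buffer keeps the BOM as a U+FEFF character in the result. The WHATWG `TextDecoder`, available as a global in current Node.js versions, drops it by default. A small sketch comparing the two on hand-written bytes (the `bytes` value is an assumption standing in for what `fs.readFileSync(csvFilename)` returns when no encoding is given):

```javascript
// Bytes of a UTF-8-BOM file: EF BB BF, followed by "CID,Section".
const bytes = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from("CID,Section\n", "utf8"),
]);

// Buffer decoding keeps the BOM as the first character of the string.
const viaBuffer = bytes.toString("utf8");

// TextDecoder strips a leading BOM unless ignoreBOM is set to true.
const viaTextDecoder = new TextDecoder("utf-8").decode(bytes);

console.log(viaBuffer.charCodeAt(0) === 0xfeff); // true
console.log(viaTextDecoder.startsWith("CID"));   // true
```

For a file that fits in memory, decoding the raw Buffer this way before handing the string to the parser is one option. For a read stream, the BOM still has to be removed from the first chunk or from the header name.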
