UTF-8-BOM string parsing - header first name incorrectly enclosed in a double quote
When a file is encoded as UTF-8-BOM, PapaParse's CSV-to-JSON conversion returns records whose first object key is enclosed in single quotes. The field called name can then no longer be referenced (example below): record.name doesn't exist. The field is record.'name', which is not accessible in JavaScript via record.name, record['name'], etc. You can only see it by printing the record to the console or by using a for-in loop.
The subsequent object keys are correct without quotes.
Change the file encoding to UTF-8 and the keys are normal, without a quote.
"PapaConfig": {
"quotes": true,
"quoteChar": "\"",
"escapeChar": "\"",
"delimiter": ",",
"header": true,
"skipEmptyLines": true,
"columns": null
}
Papa.parse(csvData, PapaConfig)
csvData (subset):
name,phone
De Akker Guest House,0514442010
UTF-8-BOM encoding:
[
{
'name': 'De Akker Guest House',
phone: '0514442010',
UTF-8 encoding:
[
{
name: 'De Akker Guest House',
phone: '0514442010',
Excel exports CSV files as UTF-8-BOM, possibly because that encoding is supposedly faster and more reliable. Can PapaParse be changed to handle UTF-8-BOM correctly?
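The quoted key is most likely the byte order mark itself. A UTF-8-BOM file starts with the character U+FEFF, and if nothing removes it, it becomes part of the first header name, so the key is '\uFEFFname' rather than 'name'. Node prints such a key in quotes because it is not a plain identifier. Below is a minimal sketch of stripping it before parsing; stripBom is a hypothetical helper, not part of PapaParse.

```javascript
// Hypothetical helper: remove a leading U+FEFF (byte order mark) from decoded text.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// The sample data from the issue, with a BOM in front as a UTF-8-BOM file would have.
const csvData = "\uFEFFname,phone\nDe Akker Guest House,0514442010";

// Without stripping, the BOM is part of the first header name.
const firstHeader = csvData.split("\n")[0].split(",")[0];
console.log(firstHeader.length); // 5, not 4

// After stripping, the first header is the plain string "name".
console.log(stripBom(csvData).split(",")[0]); // name
```

The cleaned string could then be handed to the parser, for example Papa.parse(stripBom(csvData), PapaConfig).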
I went with this approach, not the most efficient:
Since csvFile is a read stream, not a pre-read file, I just tossed it in there for each step. I could do it only for the 1st step and skip it for anything but the 1st row.
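One way to do per-row cleanup of this kind (not necessarily what the commenter wrote) is to rebuild each row object with the BOM removed from its keys. This is only a sketch: stripBomFromKeys is a hypothetical helper, and it assumes the rows arrive as plain objects, as they do with header: true.

```javascript
// Hypothetical helper: return a copy of a parsed row whose keys have
// any leading U+FEFF (byte order mark) removed.
function stripBomFromKeys(row) {
  const cleaned = {};
  for (const [key, value] of Object.entries(row)) {
    cleaned[key.replace(/^\uFEFF/, "")] = value;
  }
  return cleaned;
}

// A row as it might come out of the parser for a UTF-8-BOM file.
const row = { "\uFEFFname": "De Akker Guest House", phone: "0514442010" };
console.log(stripBomFromKeys(row).name); // De Akker Guest House
```

Called from a step callback, this would run for every row, which matches the "not the most efficient" caveat above. PapaParse 5 also documents a transformHeader option that is applied to each header once; whether it fits depends on the version in use.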
I’m having the same issue in 2022. I was given an external CSV file, probably edited or written on Windows. Processing it on Linux with papaparse, I was unable to access the first row property defined by the header. When I
console.log(row.data)
I would see the property key quoted:
I edited the original CSV and simply retyped the first character in the header, then reran:
I’m using
const csvFile = fs.createReadStream(csvFilename);
and I tried switching to
const csvFile = fs.readFileSync(csvFilename, { encoding: 'utf-8' });
without luck. I read that readFileSync was supposed to strip the BOM, but it doesn’t, at least for me: https://github.com/nodejs/node-v0.x-archive/issues/1918
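Node leaves the BOM in place when decoding UTF-8, so it has to be removed explicitly in both the readFileSync and the createReadStream case. A rough sketch for the stream case follows. stripUtf8BomBytes and createBomStripper are hypothetical helpers, and the sketch assumes the first chunk is at least three bytes long, which holds for ordinary file streams.

```javascript
const { Transform } = require("stream");

// Hypothetical helper: drop the UTF-8 BOM bytes (EF BB BF) from the start of a Buffer.
function stripUtf8BomBytes(buffer) {
  const hasBom =
    buffer.length >= 3 &&
    buffer[0] === 0xef &&
    buffer[1] === 0xbb &&
    buffer[2] === 0xbf;
  return hasBom ? buffer.subarray(3) : buffer;
}

// Hypothetical helper: a Transform stream that strips the BOM from the first chunk only
// and passes every later chunk through unchanged.
function createBomStripper() {
  let isFirstChunk = true;
  return new Transform({
    transform(chunk, _encoding, callback) {
      if (isFirstChunk) {
        isFirstChunk = false;
        chunk = stripUtf8BomBytes(chunk);
      }
      callback(null, chunk);
    },
  });
}

// Possible usage (assumes fs, Papa and csvFilename exist in the surrounding code):
// const csvFile = fs.createReadStream(csvFilename).pipe(createBomStripper());
// Papa.parse(csvFile, { header: true, step: (row) => console.log(row.data) });
```

For the readFileSync variant, the same idea applies to the decoded string: check whether charCodeAt(0) is 0xFEFF and, if so, drop the first character before parsing.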