CSV validation: is it RFC 4180 compliant?
Hi, if I run
frictionless validate "https://gist.githubusercontent.com/aborruso/3b675b529da264be83c7e75e26d2ee26/raw/e18c3cbbb5217acc430c950c3f5ae622eb52e661/tmp.csv"
I get a “valid” result.
But RFC 4180 says: “Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.”
The input contains the cell
LISTA CIVICA | "SAINT-OYEN - TRAVAILLI EUNSEMBLO"
which is not a valid RFC 4180 cell. It should be
"LISTA CIVICA | ""SAINT-OYEN - TRAVAILLI EUNSEMBLO"""
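For illustration, here is a minimal sketch using Python's standard csv module. This is an assumption on my side: the comments below suggest it is the parser involved, and the one-row input and variable names are made up for this example. It suggests that the reader accepts the unquoted cell as-is, while the writer emits the doubled-quote form.

```python
import csv
import io

# Hypothetical one-row input containing the cell from the issue, unquoted.
raw = 'a,LISTA CIVICA | "SAINT-OYEN - TRAVAILLI EUNSEMBLO",b\r\n'

# The reader is lenient: quotes inside an unquoted field are kept literally.
rows = list(csv.reader(io.StringIO(raw)))

# Writing the same row back applies minimal quoting and doubles the quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(rows[0])
rewritten = buffer.getvalue()

print(rows[0])
print(rewritten)
```

No exception is raised at any point, which would be consistent with the “valid” result above.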
Then the question is: should frictionless validation be RFC 4180 compliant? If yes, this seems to me a bug.
Thank you
Top GitHub Comments
OK, I'll close it. Please consider whether to mention this validation exception in the documentation.
And thank you very much.
I agree with what @shashigharti said above: it is better to wait and see whether a strict mode is implemented in the standard csv library, or to look for another library.
There is the library https://github.com/alan-turing-institute/CleverCSV, which is a drop-in substitute for the standard csv library and can convert a CSV file to an RFC 4180 compliant one, but it also doesn't raise an exception when a row is not compliant with the RFC. The library says that it can detect the dialect of messy CSV files better.
The only way I can see to detect that a CSV file is not compliant with RFC 4180 would be to change the standard csv library or clevercsv (in that case it would need to be done in the C parser, https://github.com/alan-turing-institute/CleverCSV/blob/master/src/cparser.c). Another idea would be to convert the CSV file to a standard-compliant one using clevercsv and compare the two files, but this would not be recommended for big CSV files.
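The convert-and-compare idea could look roughly like the sketch below. Note the assumptions: it uses the standard csv module rather than clevercsv for the conversion, and `normalize` and `changes_on_roundtrip` are hypothetical helper names. It reads the whole file into memory, so the caveat about big files applies.

```python
import csv
import io


def normalize(text: str) -> str:
    """Parse CSV text and write it back with the csv module's default quoting."""
    rows = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\r\n").writerows(rows)
    return out.getvalue()


def changes_on_roundtrip(text: str) -> bool:
    """Return True when re-serializing the CSV text changes it."""
    # Use CRLF on both sides so that only quoting differences remain.
    original = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return normalize(text) != original


compliant = 'a,"x ""y""",b\r\n'
suspicious = 'a,x "y",b\r\n'
print(changes_on_roundtrip(compliant))
print(changes_on_roundtrip(suspicious))
```

A difference is only a hint, not proof of non-compliance: optional quotes around plain fields, which the RFC allows, would also change on the round trip.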