CSV validation: is it RFC 4180 compliant?

Hi, if I run

frictionless validate "https://gist.githubusercontent.com/aborruso/3b675b529da264be83c7e75e26d2ee26/raw/e18c3cbbb5217acc430c950c3f5ae622eb52e661/tmp.csv"

I get a “valid” result.

But RFC 4180 states: “Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.”

The input contains the cell LISTA CIVICA | "SAINT-OYEN - TRAVAILLI EUNSEMBLO", which is not a valid RFC 4180 field. It should be "LISTA CIVICA | ""SAINT-OYEN - TRAVAILLI EUNSEMBLO""".
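As a rough illustration of the difference, the sketch below uses Python's standard csv module (not frictionless itself) to read the cell as it appears in the input and then write it back out. The second column value `1` is made up for the example. The reader accepts the embedded quotes without complaint, while the writer produces the quoted and doubled form that RFC 4180 expects.

```python
import csv
import io

# The cell from the issue, followed by a hypothetical second column.
raw = 'LISTA CIVICA | "SAINT-OYEN - TRAVAILLI EUNSEMBLO",1\r\n'

# csv.reader treats quotes inside an unquoted field as literal characters,
# so this line parses without any error.
parsed = next(csv.reader(io.StringIO(raw, newline="")))

# csv.writer (default QUOTE_MINIMAL, doublequote=True) encloses the field in
# double quotes and doubles each embedded quote.
out = io.StringIO(newline="")
csv.writer(out).writerow(parsed)
rewritten = out.getvalue()

print(parsed[0])
print(rewritten)
```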

So the question is: should frictionless validation be RFC 4180 compliant? If so, this looks like a bug to me.

Thank you

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
aborruso commented, Aug 30, 2022

OK, I'll close it. I leave it to you to decide whether to mention this validation exception in the documentation.

And thank you very much.

1 reaction
aivuk commented, Aug 30, 2022

I agree with what @shashigharti said above: it is better to wait and see whether a strict mode is implemented in the standard csv library, or to look for another library.

There is the library https://github.com/alan-turing-institute/CleverCSV, which is a drop-in substitute for the standard csv library and can convert a CSV file to an RFC 4180 compliant one, but it also doesn't raise an exception when a row is not compliant with the RFC. The library says it detects the dialect of messy CSV files better:

With our method we achieve 97% accuracy for dialect detection, with a 21% improvement on non-standard (messy) CSV files compared to the Python standard library.

The only way I can see to detect that a CSV file is not compliant with RFC 4180 would be to change the standard csv library or clevercsv (in the latter case the change would need to be made in the C parser, https://github.com/alan-turing-institute/CleverCSV/blob/master/src/cparser.c). Another idea would be to convert the CSV file to a standard-compliant one using clevercsv and compare the two files, but this would not be recommended for big CSV files.
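A minimal sketch of the convert-and-compare idea, using Python's standard csv module in place of clevercsv (a substitution made here only to keep the example dependency-free). The helper names `normalize_csv` and `differs_after_normalization` are hypothetical. This is a heuristic, not a validator: a compliant file that quotes fields unnecessarily, or that omits the final line break, would also be reported as different.

```python
import csv
import io


def normalize_csv(text: str) -> str:
    """Parse CSV text and write it back with csv.writer's default quoting."""
    rows = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)  # CRLF line endings, minimal quoting
    return out.getvalue()


def differs_after_normalization(text: str) -> bool:
    """Return True if rewriting the CSV changes it, hinting at non-compliance."""
    return normalize_csv(text) != text


# Unquoted field containing double quotes: rewritten, so it is flagged.
messy = 'a,b\r\nx "y" z,1\r\n'
# Same data with the field quoted and the quotes doubled: left unchanged.
clean = 'a,b\r\n"x ""y"" z",1\r\n'

print(differs_after_normalization(messy))
print(differs_after_normalization(clean))
```

Because the whole file is held in memory twice, this approach has the same drawback for large files that the comment above points out.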
