datapackage validate: validate files, tabular schema, etc.
Overview
Right now I see that if I do datapackage validate datapackage.json
it validates (I think, from a quick test) that: the JSON is correct, the required fields are present, the fields have the correct types / pass the regular expressions, etc.
We expected some more validations:
- If a resource has a local path specified: validate that the file is there.
- If a resource has a local path plus bytes and hash: validate that the file has the correct byte count and hash.
- If a resource has a remote URL: download it and validate bytes and hash if possible.
- If a resource is tabular data: try to "read" it to validate the columns, missing values, and other tabular checks.
All of this is easy enough to implement (we did). But we expected the validate command to do it (or to have flags to do it). Some users of Frictionless Data might not be keen on implementing the checks themselves and might just want to use the Python CLI to validate.
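The first two checks above (local path exists, byte count and hash match) can be done with the standard library alone. This is a minimal sketch under assumed conventions (a hash declared as "md5:<hexdigest>", as in Data Package descriptors); the function names are hypothetical, not part of the datapackage library.

```python
# Sketch of the requested file-level validations, using only the stdlib.
# check_resource / validate_files are illustrative names, not library APIs.
import hashlib
import json
import os

def check_resource(descriptor_dir, resource):
    """Check a single resource's local path, declared byte count, and md5 hash."""
    errors = []
    path = resource.get("path")
    if path is None or path.startswith("http"):
        return errors  # remote or inline resources would need separate handling
    full_path = os.path.join(descriptor_dir, path)
    if not os.path.isfile(full_path):
        return ["%s: file not found" % path]
    with open(full_path, "rb") as f:
        data = f.read()
    if "bytes" in resource and resource["bytes"] != len(data):
        errors.append("%s: expected %d bytes, got %d"
                      % (path, resource["bytes"], len(data)))
    declared = resource.get("hash", "")
    if declared.startswith("md5:"):  # assumes the "md5:<hex>" hash convention
        if declared != "md5:" + hashlib.md5(data).hexdigest():
            errors.append("%s: hash mismatch" % path)
    return errors

def validate_files(descriptor_path):
    """Run the file checks for every resource in a datapackage.json."""
    with open(descriptor_path) as f:
        package = json.load(f)
    base = os.path.dirname(os.path.abspath(descriptor_path))
    errors = []
    for resource in package.get("resources", []):
        errors.extend(check_resource(base, resource))
    return errors
```

An empty return value means the file-level checks passed; each string describes one failed check.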
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Hi @cpina,
I'm currently working on a new version of goodtables which will support md5 and other hash algorithms. Regarding duplicated field names, it's OK by the specs - https://specs.frictionlessdata.io/table-schema/ - which is the reason why datapackage doesn't complain. On the other hand, goodtables also tries to enforce best practices, e.g. not having such field names. This check can be skipped with:

goodtables data/invalid.csv --skip-checks duplicate-header
I’ll merge it into - https://github.com/frictionlessdata/goodtables-py/issues/341
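To make the duplicate-header point concrete: since the Table Schema spec does not forbid repeated field names, a best-practice checker has to look for them explicitly. This is a minimal stdlib illustration of that idea, not goodtables' actual implementation; the function name is hypothetical.

```python
# Illustrative duplicate-header detection over a CSV header row.
# Not goodtables code - just a sketch of what such a check does.
import csv
import io

def duplicate_headers(csv_text):
    """Return the header names that appear more than once in the first row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    seen, dupes = set(), []
    for name in header:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes
```

A non-empty result corresponds to the condition that triggers goodtables' duplicate-header warning.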