
datapackage validate: validate files, tabular schema, etc.


Overview

Right now, if I run datapackage validate datapackage.json, it validates (I think; I only did a quick test) that the JSON is correct, the required fields are present, the fields have the correct type / pass the regular expressions, etc.
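For reference, a minimal sketch of that descriptor-level validation done programmatically with datapackage-py (assuming the Package class and its valid/errors properties; the filename is just an example):

```python
from datapackage import Package

# Load and validate the descriptor - roughly what the CLI call above does
package = Package('datapackage.json')

if package.valid:
    print('datapackage.json is valid against the Data Package profile')
else:
    # Each error describes a profile/schema violation in the descriptor
    for error in package.errors:
        print('Descriptor error:', error)
```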

We expected some more validations:
- If there is a resource and the resource has a local path specified: validate that the file is there
- If the resource has a local path with bytes and hash: validate that the file has the correct bytes/hash
- If the resource has a remote URL: download it and validate bytes and hash if possible
- If the resource is tabular data: try to “read” it to validate the columns, missing values and other tabular checks

All of this is easy to do with the library (we did it ourselves), but we expected validate to do it (or to have flags to enable it). Some users of Frictionless Data might not be so keen on implementing the checks themselves and might just want to use the Python CLI to validate.
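Below is a rough sketch of how these checks could be implemented on top of datapackage-py. The Resource attributes used here (local, tabular, name, raw_read, read) are taken from the datapackage-py API as I understand it, and the handling of the hash prefix (md5: / sha256:) is an assumption based on the Data Package specs, so treat this as illustrative rather than a drop-in validator:

```python
import hashlib
import os

from datapackage import Package

package = Package('datapackage.json')

for resource in package.resources:
    descriptor = resource.descriptor
    path = descriptor.get('path')

    # 1. Local path: the referenced file should actually exist
    if resource.local and path and not os.path.exists(path):
        print('Missing local file:', path)
        continue

    # 2. bytes/hash: compare declared values against the actual content
    #    (for a remote resource, raw_read() downloads the file)
    data = resource.raw_read()
    declared_bytes = descriptor.get('bytes')
    if declared_bytes is not None and declared_bytes != len(data):
        print('Byte count mismatch for resource:', resource.name)

    declared_hash = descriptor.get('hash')
    if declared_hash:
        # The specs allow "<algorithm>:<digest>"; a bare digest is
        # treated as md5 here (assumption)
        algorithm, _, digest = declared_hash.rpartition(':')
        algorithm = algorithm or 'md5'
        if hashlib.new(algorithm, data).hexdigest() != digest:
            print('Hash mismatch for resource:', resource.name)

    # 3. Tabular resources: try to read them so that schema problems,
    #    bad values, etc. surface as exceptions
    if resource.tabular:
        try:
            resource.read()
        except Exception as exception:
            print('Tabular read failed for', resource.name, '-', exception)
```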


Please preserve this line to notify @roll (lead of this repository)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
roll commented, May 25, 2020

Hi @cpina,

I’m currently working on a new version of goodtables which will support md5 and other hash algorithms.

Regarding duplicated field names, it’s OK by the specs - https://specs.frictionlessdata.io/table-schema/ - which is why datapackage doesn’t complain. On the other hand, goodtables also tries to enforce best practices, e.g. not having such field names. This check can be skipped with goodtables data/invalid.csv --skip-checks duplicate-header
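For anyone driving goodtables from Python instead of the CLI, a small sketch of the same thing, assuming goodtables-py’s validate() function accepts a skip_checks option mirroring the --skip-checks flag:

```python
from goodtables import validate

# Validate the file while skipping the duplicate-header check
# ('skip_checks' is assumed to mirror the CLI's --skip-checks flag)
report = validate('data/invalid.csv', skip_checks=['duplicate-header'])

print('valid:', report['valid'])
for table in report['tables']:
    for error in table['errors']:
        print(error['code'], '-', error['message'])
```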

0 reactions
roll commented, May 25, 2020