
datapackage validate: validate files, tabular schema, etc.


Overview

Right now, if I run datapackage validate datapackage.json, it validates (I think; I only did a quick test) that the JSON is correct, the required fields are present, the fields have the correct type / pass the regular expressions, etc.
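For reference, a minimal sketch of that descriptor-level validation done programmatically with datapackage-py (assuming the Package class and its valid/errors properties; the filename is just an example):

```python
from datapackage import Package

# Load and validate the descriptor - roughly what the CLI call above does
package = Package('datapackage.json')

if package.valid:
    print('datapackage.json is valid against the Data Package profile')
else:
    # Each error describes a profile/schema violation in the descriptor
    for error in package.errors:
        print('Descriptor error:', error)
```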

We expected some more validations:
- If there is a resource and the resource has a local path specified: validate that the file is there
- If the resource has a local path with bytes and hash: validate that the file has the correct bytes/hash
- If the resource has a remote URL: download it and validate bytes and hash if possible
- If the resource is tabular data: try to “read” it to validate the columns, missing values and other tabular checks

All of this is easy to do with the library (we did it ourselves), but we expected validate to do it (or to have flags to enable it). Some users of Frictionless Data might not be so keen on implementing the checks themselves and might just want to use the Python CLI to validate.
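Below is a rough sketch of how these checks could be implemented on top of datapackage-py. The Resource attributes used here (local, tabular, name, raw_read, read) are taken from the datapackage-py API as I understand it, and the handling of the hash prefix (md5: / sha256:) is an assumption based on the Data Package specs, so treat this as illustrative rather than a drop-in validator:

```python
import hashlib
import os

from datapackage import Package

package = Package('datapackage.json')

for resource in package.resources:
    descriptor = resource.descriptor
    path = descriptor.get('path')

    # 1. Local path: the referenced file should actually exist
    if resource.local and path and not os.path.exists(path):
        print('Missing local file:', path)
        continue

    # 2. bytes/hash: compare declared values against the actual content
    #    (for a remote resource, raw_read() downloads the file)
    data = resource.raw_read()
    declared_bytes = descriptor.get('bytes')
    if declared_bytes is not None and declared_bytes != len(data):
        print('Byte count mismatch for resource:', resource.name)

    declared_hash = descriptor.get('hash')
    if declared_hash:
        # The specs allow "<algorithm>:<digest>"; a bare digest is
        # treated as md5 here (assumption)
        algorithm, _, digest = declared_hash.rpartition(':')
        algorithm = algorithm or 'md5'
        if hashlib.new(algorithm, data).hexdigest() != digest:
            print('Hash mismatch for resource:', resource.name)

    # 3. Tabular resources: try to read them so that schema problems,
    #    bad values, etc. surface as exceptions
    if resource.tabular:
        try:
            resource.read()
        except Exception as exception:
            print('Tabular read failed for', resource.name, '-', exception)
```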


Please preserve this line to notify @roll (lead of this repository)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
roll commented, May 25, 2020

Hi @cpina,

I’m currently working on a new version of goodtables which will support md5 and other hash algorithms.

Regarding duplicated field names, it’s OK by the specs - https://specs.frictionlessdata.io/table-schema/ - which is why datapackage doesn’t complain. On the other hand, goodtables also tries to enforce best practices, e.g. not having such field names. This check can be skipped with goodtables data/invalid.csv --skip-checks duplicate-header
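For anyone driving goodtables from Python instead of the CLI, a small sketch of the same thing, assuming goodtables-py’s validate() function accepts a skip_checks option mirroring the --skip-checks flag:

```python
from goodtables import validate

# Validate the file while skipping the duplicate-header check
# ('skip_checks' is assumed to mirror the CLI's --skip-checks flag)
report = validate('data/invalid.csv', skip_checks=['duplicate-header'])

print('valid:', report['valid'])
for table in report['tables']:
    for error in table['errors']:
        print(error['code'], '-', error['message'])
```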

0 reactions
roll commented, May 25, 2020