Hash is not recalculated when writing data package metadata
Overview
When you write the data package metadata with to_yaml or to_json, the resource hashes are not recalculated. This leads to validation errors down the line.
How to reproduce
On a clean environment, install Frictionless 3.14.0:
(env) $ pip install ipython frictionless==3.14.0
Open a Python (or IPython) terminal, then:
In [1]: from frictionless import describe_package, validate
In [2]: from pprint import pprint
In [3]: csv = 'a,b\n0,1'
In [4]: with open('test.csv', 'w') as f:
   ...:     f.write(csv)
   ...:
In [5]: package = describe_package('test.csv')
In [6]: resource = package.get_resource('test')
In [7]: resource.hashing = 'sha256'
In [8]: package.to_json('test.json')
In [9]: report = validate('test.json', source_type='package')
In [10]: pprint(report)
{'errors': [],
'stats': {'errors': 1, 'tables': 1},
'tables': [{'compression': 'no',
'compressionPath': '',
'dialect': {},
'encoding': 'utf-8',
'errors': [{'code': 'checksum-error',
'description': 'This error can happen if the data is '
'corrupted.',
'message': 'The data source does not match the '
'expected checksum: expected hash in '
'sha256 is '
'"a316a7ac2a0f3a69719cb532b31a6788" and '
'actual is '
'"14d6e4164bb209ee74f10b8182da85f913a636c233690ebd80cc8aa4cbc53491"',
'name': 'Checksum Error',
'note': 'expected hash in sha256 is '
'"a316a7ac2a0f3a69719cb532b31a6788" and '
'actual is '
'"14d6e4164bb209ee74f10b8182da85f913a636c233690ebd80cc8aa4cbc53491"',
'tags': ['#table', '#checksum']}],
'format': 'csv',
'hashing': 'sha256',
'header': ['a', 'b'],
'partial': False,
'path': 'test.csv',
'query': {},
'schema': {'fields': [{'name': 'a', 'type': 'integer'},
{'name': 'b', 'type': 'integer'}]},
'scheme': 'file',
'scope': ['dialect-error',
'schema-error',
'field-error',
'extra-header',
'missing-header',
'blank-header',
'duplicate-header',
'non-matching-header',
'extra-cell',
'missing-cell',
'blank-row',
'type-error',
'constraint-error',
'unique-error',
'primary-key-error',
'foreign-key-error',
'checksum-error'],
'stats': {'bytes': 7,
'errors': 1,
'fields': 2,
'hash': '14d6e4164bb209ee74f10b8182da85f913a636c233690ebd80cc8aa4cbc53491',
'rows': 1},
'time': 0.005,
'valid': False}],
'time': 0.019,
'valid': False,
'version': '3.14.0'}
Expected behavior
I believe I should be able to choose the hashing type when describing a data package. Changing the hashing type should mean the hash gets recalculated with the new hashing algorithm.
The generated data package should then pass validation.
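As a rough illustration of what "recalculated" would mean here, the sketch below recomputes the digest for a resource descriptor held as a plain dict. The helper name recompute_hash is hypothetical and is not part of Frictionless. It assumes the descriptor stores the algorithm under "hashing" and the digest under "stats" -> "hash", which matches the keys visible in the report above but is not confirmed for the written package file.

```python
import hashlib
from pathlib import Path


def recompute_hash(resource: dict, algorithm: str, base_dir: str = ".") -> dict:
    """Return a copy of a resource descriptor with its hash recomputed.

    Hypothetical helper, not a Frictionless API. Assumes the descriptor
    keeps the algorithm under "hashing" and the digest under
    "stats" -> "hash".
    """
    # Read the raw bytes of the data file the resource points to.
    data = (Path(base_dir) / resource["path"]).read_bytes()
    # hashlib.new accepts an algorithm name such as "md5" or "sha256".
    digest = hashlib.new(algorithm, data).hexdigest()

    updated = dict(resource)
    updated["hashing"] = algorithm
    # Keep any other stats (rows, fields) and overwrite hash and bytes.
    updated["stats"] = {**resource.get("stats", {}), "hash": digest, "bytes": len(data)}
    return updated
```

Applying something like this to each resource before writing the metadata would keep the stored hash and the declared hashing algorithm in agreement.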
Actual behavior
We get a validation error because of the differing checksum.
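The mismatch can be seen with the standard library alone. The expected hash in the report is 32 hex characters long, which is the length of an MD5 digest, while the actual hash is 64 characters, the length of a SHA-256 digest. That the stale value came from MD5 at describe time is an assumption; the issue does not name the algorithm. A minimal sketch:

```python
import hashlib

# The same bytes the reproduction writes to test.csv.
data = b"a,b\n0,1"

md5_digest = hashlib.md5(data).hexdigest()
sha256_digest = hashlib.sha256(data).hexdigest()

# Digests from different algorithms differ in length, so they can never match.
print(len(md5_digest), len(sha256_digest))  # 32 64
print(md5_digest == sha256_digest)          # False
```

If the stored hash is left over from one algorithm while hashing names another, the comparison fails even though the file itself is intact.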
Please preserve this line to notify @roll (lead of this repository)
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (6 by maintainers)
Awesome! Looking forward to trying it out soon!
Hi @augusto-herrmann,
It’s a great idea. I’m releasing frictionless@3.18 with this argument available for describe/describe_package (it was implemented only for describe_resource).
No, it will not change existing properties except for recalculation of resource.stats.