Should "datapackage-py" change the descriptors it receives?
Consider the following example:
```python
from datapackage import Package

descriptor = {
    'name': 'my-datapackage',
    'resources': [
        {
            'name': 'main',
            'profile': 'tabular-data-resource',
            'data': [
                {'value': 10},
                {'value': 20},
                {'value': 30},
            ],
        },
    ],
}

dp = Package(descriptor)
assert dp.descriptor == descriptor
```
As a user, I’d expect the assertion to pass. However, it doesn’t, because datapackage-py modifies the descriptor it received. In this case, it adds descriptor['profile'] = 'data-package', but there were a few other cases in my tests. For example, it automatically adds 'missingValues': [''] to the schema, 'encoding': 'utf-8' to the resources, and 'format': 'default' to the fields of each resource's schema.
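To make the behaviour concrete, here is a minimal, library-free sketch of the pattern described above. It assumes the implementation copies the input and then fills in the defaults listed in this issue; `apply_defaults` is a hypothetical stand-in, not datapackage-py's actual code.

```python
import copy


def apply_defaults(descriptor):
    """Hypothetical stand-in: return a copy of `descriptor` with spec defaults filled in."""
    result = copy.deepcopy(descriptor)  # the caller's dict itself is left untouched
    result.setdefault('profile', 'data-package')
    for resource in result.get('resources', []):
        resource.setdefault('encoding', 'utf-8')
        schema = resource.get('schema')
        if schema is not None:
            schema.setdefault('missingValues', [''])
            for field in schema.get('fields', []):
                field.setdefault('format', 'default')
    return result


descriptor = {
    'name': 'my-datapackage',
    'resources': [{
        'name': 'main',
        'profile': 'tabular-data-resource',
        'schema': {'fields': [{'name': 'value', 'type': 'integer'}]},
    }],
}

normalized = apply_defaults(descriptor)
print(normalized == descriptor)  # False
print(sorted(set(normalized) - set(descriptor)))  # ['profile']
```

Even though the caller's dict is never touched, the descriptor the library holds no longer equals the one the user passed in, which is why an equality check like the one in the issue fails.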
This makes it harder to test that the descriptor is what you expect it to be. You have to account not only for what you added yourself, but also for what datapackage-py adds, and that might change and break your tests.
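As an illustration of that burden, here is a minimal sketch of the kind of workaround a test ends up needing. `KNOWN_DEFAULTS` and `strip_defaults` are hypothetical helpers, and the defaults table is hard-coded from the cases listed in this issue. If the library changes its defaults, the table silently goes stale, which is the fragility described above.

```python
# Hypothetical table of defaults the library is known to add (taken from this issue).
KNOWN_DEFAULTS = {
    'profile': 'data-package',
    'encoding': 'utf-8',
    'missingValues': [''],
    'format': 'default',
}


def strip_defaults(obj):
    """Recursively drop any key whose value equals a known default."""
    if isinstance(obj, dict):
        return {
            key: strip_defaults(value)
            for key, value in obj.items()
            if not (key in KNOWN_DEFAULTS and value == KNOWN_DEFAULTS[key])
        }
    if isinstance(obj, list):
        return [strip_defaults(item) for item in obj]
    return obj


expected = {
    'name': 'my-datapackage',
    'resources': [{
        'name': 'main',
        'profile': 'tabular-data-resource',
        'schema': {'fields': [{'name': 'value', 'type': 'integer'}]},
    }],
}

# Illustrative only: a descriptor after defaults have been added to it.
received = {
    'name': 'my-datapackage',
    'profile': 'data-package',
    'resources': [{
        'name': 'main',
        'profile': 'tabular-data-resource',
        'encoding': 'utf-8',
        'schema': {
            'missingValues': [''],
            'fields': [{'name': 'value', 'type': 'integer', 'format': 'default'}],
        },
    }],
}

assert received != expected
assert strip_defaults(received) == expected
```

This workaround also drops values the user set explicitly, for example a hand-written 'encoding': 'utf-8'. The test therefore can no longer distinguish "set by me" from "added by the library".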
As a general rule, I think it’s better not to change the user’s data unless necessary. Most (if not all) of the examples I found were caused by datapackage-py explicitly adding default values to the descriptors. I understand why this would make the datapackage-py code simpler, but it makes my tests (as a library user) more complicated.
WDYT?
Issue Analytics
- Created 6 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
A brief history of this one: after we started the specs-v1 update, there was a spec requirement to apply defaults from JSON schemas (https://github.com/frictionlessdata/implementations/issues/4). We never discussed at which level this requirement applies (for example, whether the spec requires implementations to update the descriptor, or something else). So at the implementation level, we just added descriptor mutation.
So I’m +1 on not mutating the descriptor and applying defaults on demand internally. It’s now clear to me that descriptor mutation can’t be a spec requirement (it’s too low-level, so I simply got it wrong). For implementations, we’ve already ruled out normalizing things like PK/FK (string vs. array form) at the descriptor level because it could break client code. So not mutating the descriptor at all makes sense to me too.
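Here is a minimal sketch of what "applying defaults on demand internally" could look like. It assumes a wrapper that stores the user's descriptor untouched and consults a defaults table only when a value is read. The `Resource` class, its `get` method and `RESOURCE_DEFAULTS` are hypothetical, not datapackage-py's API.

```python
import copy

# Hypothetical defaults table; 'utf-8' is the resource encoding default mentioned in this issue.
RESOURCE_DEFAULTS = {'encoding': 'utf-8'}


class Resource:
    """Hypothetical wrapper that never writes defaults into the descriptor."""

    def __init__(self, descriptor):
        # Private copy so later changes by the caller don't leak in; no keys are ever added.
        self._descriptor = copy.deepcopy(descriptor)

    @property
    def descriptor(self):
        # Exactly what the user supplied, returned as a copy so it can't be mutated through us.
        return copy.deepcopy(self._descriptor)

    def get(self, key):
        # Resolve a default lazily, at the point of use, instead of storing it.
        if key in self._descriptor:
            return self._descriptor[key]
        return RESOURCE_DEFAULTS.get(key)


user_descriptor = {'name': 'main', 'path': 'data.csv'}
resource = Resource(user_descriptor)
print(resource.descriptor == user_descriptor)  # True
print(resource.get('encoding'))  # utf-8
```

The trade-off is the one raised in the issue. Internal code must read through something like `get()` instead of indexing the descriptor directly, which is the simplicity that mutating the descriptor provided.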
Frictionless Framework is designed so that metadata is only ever changed explicitly by the user. This problem is resolved there.