
Should "datapackage-py" change the descriptors it received?

Consider the following example:

from datapackage import Package

descriptor = {
    'name': 'my-datapackage',
    'resources': [
        {
            'name': 'main',
            'profile': 'tabular-data-resource',
            'data': [
                {'value': 10},
                {'value': 20},
                {'value': 30},
            ]
        }
    ]
}
dp = Package(descriptor)
# Fails: dp.descriptor also contains defaults that datapackage-py added
assert dp.descriptor == descriptor

As a user, I’d expect the assertion to pass. However, it doesn’t, because datapackage-py modifies the descriptor it received. In this case it adds descriptor['profile'] = 'data-package', and there were a few other cases in my tests: for example, it automatically adds 'missingValues': [''] to the schema, 'encoding': 'utf-8' to the resources, and 'format': 'default' to the fields of each resource schema.

This complicates testing that the descriptor is what you expect it to be, as you have to account not only for what you added yourself, but also for what datapackage-py adds (which might change, causing your tests to break).
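Until the library stops adding defaults, one workaround in a test suite is to strip the known injected values before comparing. The sketch below is hypothetical: `strip_known_defaults` is not part of datapackage-py, and the default list only covers the cases reported above, so it is neither exhaustive nor guaranteed to match any particular library version.

```python
import copy

# Hypothetical: only the defaults reported in this issue. Not exhaustive,
# and the real set injected by datapackage-py may differ between versions.
PACKAGE_DEFAULTS = {"profile": "data-package"}
RESOURCE_DEFAULTS = {"encoding": "utf-8"}
SCHEMA_DEFAULTS = {"missingValues": [""]}
FIELD_DEFAULTS = {"format": "default"}


def _drop_defaults(mapping, defaults):
    """Delete keys from `mapping` whose value equals the known default."""
    for key, value in defaults.items():
        if key in mapping and mapping[key] == value:
            del mapping[key]


def strip_known_defaults(descriptor):
    """Return a deep copy of `descriptor` without the known injected defaults.

    The input dict is never modified.
    """
    result = copy.deepcopy(descriptor)
    _drop_defaults(result, PACKAGE_DEFAULTS)
    for resource in result.get("resources", []):
        _drop_defaults(resource, RESOURCE_DEFAULTS)
        schema = resource.get("schema")
        if isinstance(schema, dict):
            _drop_defaults(schema, SCHEMA_DEFAULTS)
            for field in schema.get("fields", []):
                _drop_defaults(field, FIELD_DEFAULTS)
    return result


# Usage with a hand-written "expanded" descriptor (simulated, not real library output):
original = {"name": "p", "resources": [{"name": "main",
            "schema": {"fields": [{"name": "value", "type": "integer"}]}}]}
expanded = {"name": "p", "profile": "data-package", "resources": [{
    "name": "main", "encoding": "utf-8",
    "schema": {"missingValues": [""],
               "fields": [{"name": "value", "type": "integer", "format": "default"}]}}]}
assert strip_known_defaults(expanded) == original
```

Note that the helper cannot tell a default the user wrote explicitly from one the library injected: a descriptor that deliberately sets 'encoding': 'utf-8' would have it stripped too, and the comparison would fail. That ambiguity is part of why mutating the descriptor is awkward for library users in the first place.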

As a general rule, I think it’s better not to change the user’s data unless necessary. Most (if not all) of the cases I found were caused by datapackage-py explicitly adding default values to the descriptors. I understand that this makes the datapackage-py code simpler, but it makes my tests (as a library user) more complicated.

WDYT?

Issue Analytics

Created 6 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

roll commented, Sep 18, 2017

A brief history of this one: after we started the specs-v1 update, a spec requirement appeared to apply defaults from JSON schemas (https://github.com/frictionlessdata/implementations/issues/4). We never discussed at which level this requirement applies (e.g., whether the spec requires implementations to update the descriptor, or something else). So at the implementation level we simply added descriptor mutation.

So I’m +1 on not mutating the descriptor and applying defaults on demand internally. It’s now clear to me that descriptor mutation can’t be a spec requirement (it’s too low-level a detail; I simply got it wrong). We’ve already ruled out implementations normalizing things like primary/foreign keys (string vs. array) at the descriptor level because it could break client code, so not mutating the descriptor at all makes sense to me too.
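Leaving the descriptor alone and applying defaults on demand might look roughly like the following. `PackageSketch`, its methods, and the default table are hypothetical illustrations, not the API of datapackage-py or Frictionless Framework; the defaults are simply the ones mentioned earlier in this issue.

```python
import copy
from typing import Any, Dict, Mapping

# Hypothetical defaults, taken from the examples reported in this issue.
PACKAGE_DEFAULTS: Dict[str, Any] = {"profile": "data-package"}
RESOURCE_DEFAULTS: Dict[str, Any] = {"encoding": "utf-8"}


class PackageSketch:
    """Hypothetical wrapper that never writes defaults into the descriptor."""

    def __init__(self, descriptor: Mapping[str, Any]) -> None:
        # Private deep copy: later edits by the caller don't leak in, and the
        # stored descriptor stays exactly what the user supplied.
        self._descriptor = copy.deepcopy(dict(descriptor))

    @property
    def descriptor(self) -> Dict[str, Any]:
        # Hand out a copy so callers cannot mutate internal state either.
        return copy.deepcopy(self._descriptor)

    @property
    def profile(self) -> str:
        # Default resolved at read time; nothing is stored back.
        return self._descriptor.get("profile", PACKAGE_DEFAULTS["profile"])

    def resource_encoding(self, index: int) -> str:
        # Same on-demand lookup for a per-resource default.
        resource = self._descriptor["resources"][index]
        return resource.get("encoding", RESOURCE_DEFAULTS["encoding"])
```

With this design the round-trip check from the original report holds, while code that needs the effective value asks for it explicitly (`profile`, `resource_encoding(...)`) instead of reading it out of the descriptor.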

roll commented, Sep 26, 2020

Frictionless Framework is designed so that metadata is only changed when the user explicitly changes it. This problem is resolved there.