Should "datapackage-py" change the descriptors it received?

Consider the following example:

from datapackage import Package

descriptor = {
    'name': 'my-datapackage',
    'resources': [
        {
            'name': 'main',
            'profile': 'tabular-data-resource',
            'data': [
                {'value': 10},
                {'value': 20},
                {'value': 30},
            ]
        }
    ]
}
dp = Package(descriptor)
# Fails: datapackage-py adds defaults such as 'profile': 'data-package'
assert dp.descriptor == descriptor

As a user, I’d expect the assertion to pass. However, it doesn’t, because datapackage-py modifies the descriptor it receives. In this case, it adds descriptor['profile'] = 'data-package', but there were a few other cases in my tests. For example, it automatically adds 'missingValues': [''] to the schema, 'encoding': 'utf-8' to the resources, and 'format': 'default' to the fields of each resource's schema.
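To make the failure mode concrete, here is a minimal sketch of what default-filling does to a descriptor. The expand_descriptor function is hypothetical: it only mimics the defaults listed above and is not datapackage-py's actual code. It works on a deep copy, so the caller's dict stays as written while the expanded copy no longer compares equal, which is the same symptom as the assertion above.

```python
import copy


def expand_descriptor(descriptor):
    """Return a deep copy of `descriptor` with defaults filled in.

    Hypothetical helper mimicking the defaults listed in the issue;
    NOT datapackage-py's actual implementation.
    """
    expanded = copy.deepcopy(descriptor)
    expanded.setdefault('profile', 'data-package')
    for resource in expanded.get('resources', []):
        resource.setdefault('encoding', 'utf-8')
        schema = resource.get('schema')
        if schema is not None:
            schema.setdefault('missingValues', [''])
            for field in schema.get('fields', []):
                field.setdefault('format', 'default')
    return expanded


descriptor = {
    'name': 'my-datapackage',
    'resources': [
        {
            'name': 'main',
            'profile': 'tabular-data-resource',
            'data': [{'value': 10}, {'value': 20}, {'value': 30}],
        }
    ],
}

expanded = expand_descriptor(descriptor)
print(expanded == descriptor)  # False: 'profile' and 'encoding' were added
print(expanded['profile'])     # data-package
```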

This complicates testing that the descriptor is what you expect it to be, as you have to take into account not only what you added yourself, but also what datapackage-py adds (which might change, causing your tests to break).
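Until the library stops adding defaults, one possible workaround in tests is to check that everything you wrote is still present while ignoring extra keys. The is_subset helper and the two sample descriptors below are hypothetical illustrations, not part of datapackage-py:

```python
def is_subset(expected, actual):
    """Return True if everything in `expected` also appears in `actual`.

    Dicts are compared recursively and may contain extra keys in `actual`;
    lists must have the same length and are compared element by element;
    all other values are compared with ==.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(is_subset(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


user_descriptor = {'name': 'my-datapackage', 'resources': [{'name': 'main'}]}
library_descriptor = {
    'name': 'my-datapackage',
    'profile': 'data-package',  # default added by the library
    'resources': [{'name': 'main', 'encoding': 'utf-8'}],
}

assert is_subset(user_descriptor, library_descriptor)
assert not is_subset(library_descriptor, user_descriptor)
```

With the example above this would be assert is_subset(descriptor, dp.descriptor). The trade-off is that extra keys are ignored, so the test no longer catches unexpected additions.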

As a general rule, I think it’s better not to change the user’s data unless necessary. Most (if not all) of the examples I found were caused by datapackage-py explicitly adding default values to the descriptors. I understand why this makes the datapackage-py code simpler, but it makes my tests (as a library user) more complicated.
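The alternative suggested here, leaving the descriptor alone and applying defaults only when a value is read, could look roughly like this. PackageView, the defaults tables, and the accessor names are all hypothetical and only illustrate the idea; they are not datapackage-py's API.

```python
import copy

# Hypothetical defaults tables, mirroring some of the defaults listed above.
PACKAGE_DEFAULTS = {'profile': 'data-package'}
RESOURCE_DEFAULTS = {'encoding': 'utf-8'}


class PackageView:
    """Keep the user's descriptor untouched and resolve defaults on read.

    Hypothetical class for illustration; not part of datapackage-py.
    """

    def __init__(self, descriptor):
        # Copy once so later changes by the caller can't leak in,
        # and never write defaults into this copy.
        self._descriptor = copy.deepcopy(descriptor)

    @property
    def descriptor(self):
        # A copy of exactly what the user provided, with no defaults added.
        return copy.deepcopy(self._descriptor)

    @property
    def profile(self):
        # Fall back to the default only when the user didn't set a value.
        return self._descriptor.get('profile', PACKAGE_DEFAULTS['profile'])

    def resource_encoding(self, index):
        resource = self._descriptor['resources'][index]
        return resource.get('encoding', RESOURCE_DEFAULTS['encoding'])


descriptor = {'name': 'my-datapackage', 'resources': [{'name': 'main'}]}
package = PackageView(descriptor)
assert package.descriptor == descriptor   # the issue's assertion now holds
assert package.profile == 'data-package'  # default resolved on demand
```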

WDYT?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

roll commented, Sep 18, 2017 (1 reaction)

A brief history of this one: after we started the specs-v1 update, there was a specs requirement to apply defaults from JSON schemas (https://github.com/frictionlessdata/implementations/issues/4). We never discussed at which level this requirement applies (for example, whether the specs require implementations to update the descriptor, or something else). So at the implementation level we just added descriptor mutation.

So I’m +1 on not mutating the descriptor and using defaults on demand internally. It’s now clear to me that descriptor mutation can’t be a spec requirement (it’s too low-level a thing, so I just got it wrong). We’ve already ruled out implementations normalizing things like PK/FK (string/array) at the descriptor level because it could break client code. So not mutating the descriptor at all makes sense to me as well.
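The same on-demand approach covers the PK/FK case: Table Schema lets primaryKey be either a single field name or a list of names, and an implementation can normalize that when it reads the value instead of rewriting the descriptor. primary_key_fields is a hypothetical helper, not a datapackage-py function:

```python
def primary_key_fields(schema):
    """Return the schema's primary key as a list of field names.

    Table Schema allows `primaryKey` to be a single field name (string)
    or a list of names. Normalizing here, at read time, leaves the
    descriptor exactly as the user wrote it. Hypothetical helper.
    """
    key = schema.get('primaryKey', [])
    if isinstance(key, str):
        return [key]
    return list(key)


schema = {'fields': [{'name': 'id'}], 'primaryKey': 'id'}
assert primary_key_fields(schema) == ['id']
assert schema['primaryKey'] == 'id'  # descriptor left as written
```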

roll commented, Sep 26, 2020 (0 reactions)

Frictionless Framework is designed so that a user can only change metadata explicitly. This problem is resolved there.
