Make resources[].name and fields[].name unique
Overview
Currently the two specs are fairly symmetric:
- datapackage.resources[].name (required)
- tableschema.fields[].name (required)
I propose making this even stricter by allowing only unique names. This would simplify implementations in some respects (especially SQL and similar import/export, where a resource name maps to a unique table name). It would also enable simple key-based access to resources and fields, and it should make the specs feel simpler overall for publishers.
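A minimal sketch of what the constraint buys an implementation, assuming descriptors are loaded as plain Python dicts. The helpers `find_duplicate_names` and `index_by_name` and the sample descriptor are hypothetical and not part of any Frictionless library: with unique names, a validator can reject duplicates up front and tools can build name-keyed lookups for resources and fields without losing entries.

```python
from collections import Counter


def find_duplicate_names(items):
    """Return names that occur more than once, in first-seen order."""
    counts = Counter(item["name"] for item in items)
    return [name for name, count in counts.items() if count > 1]


def index_by_name(items):
    """Build a name -> item mapping; refuse rather than silently drop duplicates."""
    duplicates = find_duplicate_names(items)
    if duplicates:
        raise ValueError(f"duplicate names: {duplicates}")
    return {item["name"]: item for item in items}


# Hypothetical descriptor: one resource whose schema has two fields.
descriptor = {
    "resources": [
        {
            "name": "prices",
            "path": "prices.csv",
            "schema": {
                "fields": [
                    {"name": "item", "type": "string"},
                    {"name": "price", "type": "number"},
                ]
            },
        }
    ]
}

resources = index_by_name(descriptor["resources"])
fields = index_by_name(resources["prices"]["schema"]["fields"])
print(fields["price"]["type"])  # number
```

Without the uniqueness rule, the dict comprehension in `index_by_name` would keep only the last item with a given name, which is the same ambiguity a one-table-per-resource SQL export would otherwise have to resolve on its own.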
Also, I’ve updated my opinion on the problem mentioned in #280, namely that there are a lot of CSVs with duplicated headers, so we should support duplicate field names. Making field names unique would not break anything, because the column-to-field mapping is based on column/field order, not names (or is it? This should be clarified in the spec). So if you have a CSV with two price headers, just use the field name price for the first and additional_price for the second. Of course, goodtables will mark this as an error (as it does for any duplicate header), but a CSV with duplicated headers is invalid by definition anyway.
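A small illustration of order-based mapping, assuming (as the paragraph above does, pending clarification in the spec) that columns map to fields by position. The CSV text, `FIELD_NAMES` and `read_rows` are hypothetical; only the standard `csv` module is used.

```python
import csv
import io

# Hypothetical CSV with a duplicated "price" header.
CSV_TEXT = "item,price,price\napple,1.50,1.75\npear,2.00,2.10\n"

# Unique field names chosen by the publisher, in column order.
FIELD_NAMES = ["item", "price", "additional_price"]


def read_rows(text, field_names):
    """Map each CSV column to a field by position, ignoring header text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) != len(field_names):
        raise ValueError("column count does not match field count")
    # zip pairs the i-th field name with the i-th cell, so duplicate headers never collide.
    return [dict(zip(field_names, row)) for row in reader]


rows = read_rows(CSV_TEXT, FIELD_NAMES)
print(rows[0])  # {'item': 'apple', 'price': '1.50', 'additional_price': '1.75'}
```

By contrast, `csv.DictReader` keys each row by header text, so in this file the second price value would overwrite the first in every row dict.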
Issue Analytics
- Created 7 years ago
- Reactions: 1
- Comments: 7 (5 by maintainers)
Top GitHub Comments
+1 to @roll’s point that order will always matter, so we don’t want to move from arrays to objects for fields (or for resources).
I’m somewhat agnostic about the change. I understand the benefits for tooling (because you can use names as keys), but I fear it may constrain some publication scenarios (I’ve seen a lot of bad CSVs in my time). Overall I’m +1 on field.name being unique, because you can handle duplicate column names in other ways: just add an incremental index to the column name when deriving the field name (and I think we already constrain field names to have certain characteristics). I’d also be +1 on unique names for resources for similar reasons …
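One way to apply the "incremental index" suggestion, sketched under assumptions: the function name `unique_field_names` and the `_2`, `_3` suffix format are hypothetical, since the comment does not specify either. The loop also guards against a suffixed name clashing with a header that already exists.

```python
def unique_field_names(headers):
    """Derive unique field names from raw headers by suffixing repeats with an index."""
    used = set()
    names = []
    for header in headers:
        candidate = header
        index = 1
        # Keep bumping the index until the name is free (handles headers like "price_2").
        while candidate in used:
            index += 1
            candidate = f"{header}_{index}"
        used.add(candidate)
        names.append(candidate)
    return names


print(unique_field_names(["item", "price", "price"]))  # ['item', 'price', 'price_2']
```

The first occurrence keeps its original name, so schemas for CSVs without duplicate headers come out unchanged.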
@CharlesNepote, your case of the same file split by year is better addressed by the “chunks” concept; see #228.
@pwalsh I don’t think we could drop ordered structures, especially for fields. Tables (most importantly CSV tables) have ordered columns by nature, so I think it was a good choice to use an ordered, non-keyed structure to describe them. And for tables without headers, ordered fields are the only way to map columns to fields. Now, after the v1 release, I don’t think this change makes sense at all: too few pros versus another round of changing too much (for example, the current ordered nature is vital for goodtables etc.). I suppose we shouldn’t even consider it (list -> dict) now.

So my focus here is only on an additional constraint on the existing ordered fields and resources structures, which could be useful without changing the data structures. As I said, the main beneficiary of unique resource names would be SQL import/export, and for fields we could, for example, simplify keyed access at the implementation level (while preserving the ordered list as the data structure for fields).

@CharlesNepote Interesting cases. I don’t have enough experience in data publishing, so I’m just curious: is this an intended way of using the specs, or more of a temporary solution while there is no proper mechanism, for example, to have ONE resource with mirrored data sources (CSV and Excel)? cc @pwalsh @rufuspollock
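A sketch of "keyed access on implementation level" that keeps the ordered list as the underlying structure. `Field` and `FieldList` are hypothetical classes, not an existing library API: the list remains the source of truth for column order, and a name-to-position index is added on top, which is only sound because names are unique.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    type: str = "string"


class FieldList:
    """Ordered list of fields plus an auxiliary name -> position index."""

    def __init__(self, fields):
        self._fields = list(fields)
        self._positions = {}
        for position, fld in enumerate(self._fields):
            # The index would be ambiguous if two fields shared a name.
            if fld.name in self._positions:
                raise ValueError(f"duplicate field name: {fld.name!r}")
            self._positions[fld.name] = position

    def __len__(self):
        return len(self._fields)

    def __iter__(self):
        return iter(self._fields)

    def __getitem__(self, key):
        # Integers select by column position, strings by field name.
        if isinstance(key, int):
            return self._fields[key]
        return self._fields[self._positions[key]]

    def position(self, name):
        """Column index of the field called `name`."""
        return self._positions[name]


schema = FieldList([
    Field("item"),
    Field("price", "number"),
    Field("additional_price", "number"),
])
print(schema[1].name)                       # price
print(schema["additional_price"].type)      # number
print(schema.position("additional_price"))  # 2
print([fld.name for fld in schema])         # ['item', 'price', 'additional_price']
```

Iteration still follows column order, so positional tasks such as mapping a headerless table keep working, while name lookups avoid a linear scan.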