
Encapsulated extensibility


The data package spec says:

"A Data Package author MAY add any number of additional fields beyond those listed in the specification here."

The specification then gives the temporal entry as an example.

I’m a bit worried that this much freedom in extensibility might prove counterproductive later. Why?

  1. Namespace is limited
  2. Many authors (of packages) – many meanings
  3. Many tool creators – many varying expectations

Namespace: The authors of the specification are the namespace owners. They have the power over what goes in and what stays out, because they are the standard’s creators. If the standard is to stay “lean”, the namespace should contain only those entries that are really necessary.

Authors’ keys and tools’ expectations: since there are many package authors, each might put their own metadata under whatever keys they choose, regardless of whether the same key carries a different meaning in other packages. The same can be said about tool creators (visualisation, ETL, mining, …): they might expect a certain type of value or structure under a key, but receive invalid values because some author decided to use that key for something else.

The temporal example is not a very good illustration of customisation. That piece of metadata might be genuinely useful for ETL tools, but it is not part of the standard! The value can be anything. Or the value might be exactly as stated in the example, but stored under a different key, for example time_range.
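To make the collision risk concrete, consider two hypothetical packages (the key shapes below are invented for illustration; neither is defined by the spec) that both use a temporal key:

```json
{
  "name": "package-a",
  "temporal": { "start": "2000-01-01", "end": "2010-12-31" }
}
```

```json
{
  "name": "package-b",
  "temporal": "monthly"
}
```

A tool written against package-a’s object form will break, or silently misbehave, when it encounters package-b’s string form, and neither author has done anything the spec forbids.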

Proposal

This is especially important for metadata that will be consumed mostly by tools, for automation or application processing.

  1. Strictly guard the namespace and make the known keys/values part of the specification.
  2. Have a separate structure for custom metadata, for example custom = { ... }.

Don’t allow authors to put arbitrary keys at the top level. Discourage them from adding keys under any other known/specified object.

Recommend that authors put custom keys under that encapsulated customisable structure. You guarantee nothing about the contents of that structure – neither the keys nor their values.

Package authors and tool creators can agree among themselves to use certain keys in that structure. When you see that a key spreads well, you can promote it into the specification at the top level with well-defined contents (for example, the most common usage).

Larger custom metadata can even be put in a separate .json file. As for smaller structures (mostly objects in lists), such as sources or fields, extension keys need not be forbidden, but they should at least be discouraged.
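A minimal sketch of what the proposed encapsulation could look like in a datapackage.json (the custom key name, the vendor-style sub-keys, and their contents are all assumptions for illustration, not part of any spec):

```json
{
  "name": "gdp-quarterly",
  "resources": [
    { "path": "data/gdp.csv" }
  ],
  "custom": {
    "acme-etl": { "time_range": ["2000-Q1", "2015-Q4"] },
    "acme-viz": { "colour": "#3366cc" }
  }
}
```

Everything under custom can be stripped or ignored without affecting conformance, and namespacing the sub-keys by tool or organisation keeps extension authors out of each other’s way.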

Alternative

If you would like to allow custom top-level keys, then I would suggest having a wiki (GitHub wiki?) page where package writers and tool creators document their keys and the expected values.

Use-case

Here is a use case from another project:

In Cubes we try to minimise the number of keys in the model metadata. Every model object (cube, dimension, attribute, …) has an info dictionary for custom keys and values. Visualisation tools sometimes require extra metadata or hints to display the data properly or nicely. The tool writer then adds a soft requirement for a custom key in the info dictionary. It might be cosmetic (colour, formatting, image, …), data related (time range) or metadata related (calendar unit of an attribute).

The cubes-viewer is a visualisation application that connects to the Cubes server. It fetches the metadata and provides a user interface based on it (labels, relationships, concept hierarchies, …). The app has a special case for handling time series, and it recommended that model creators add cv- keys to the info dictionary. For example, to denote that a field represents a year:

                    "info": { "cv-datefilter-field": "year" }

Cubes had no explicit notion of date/time before. However, as the concept of time is important for data analysis, roles of dimensions and their levels were introduced. The attribute.info.cv-datefilter-field is now attribute.role (and it might relate to non-time dimensions as well). It went from single-purpose and non-standard to multi-purpose and standard. The attribute role is now generated by the server automatically if the model author does not specify it explicitly.
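The promotion described above might be sketched as the following before/after attribute fragments (illustrative only; the exact Cubes model syntax and role values may differ):

```json
{ "name": "year", "info": { "cv-datefilter-field": "year" } }
```

```json
{ "name": "year", "role": "year" }
```

The custom key incubated in the info dictionary until its usefulness was proven, then moved into the standard model vocabulary – exactly the guarded evolution proposed here for data packages.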

Conclusion

Why do we need metadata in JSON? To be machine processable. Moreover, to make the data machine processable based on that machine-processable metadata. If that is not the purpose, then we would be fine with just a plain README.md.

It might seem to work against the evolution of the package format, but it does not. It is just guarded evolution that prevents a future incompatibility mess. The separate metadata is simply incubated until it is proven stable and standardised enough to be included in the specification.

Issue Analytics

  • State:closed
  • Created 10 years ago
  • Comments:5 (5 by maintainers)

Top GitHub Comments

1 reaction
Stiivi commented, Aug 13, 2016

Best practice is to keep the standard structure strictly unpolluted and to have a designated place/property for storing all extensions without any restrictions. Let it evolve, observe it, and potentially include it in the standard. If others fight over a namespace in the “extensions” space, that is their problem, not the standard’s; they should be aware of it and resolve it themselves. That space is a wild west, and that is OK.

The use cases for the extension metadata proposed here are:

  1. data packages I produce within my organisation, for consumption by tools within the same organisation, to carry internal metadata
  2. metadata that my organisation and some third-party organisation have agreed to share

Extensions should be strippable from the data package without affecting its usefulness in the outside world. They might eventually become standard once accepted by a significant number of data package producers and consumers.

Re Profiles: I am not sure I understand the proposal or its usefulness. The name profile is confusing to me – that sounds like something I would keep on my end, separate from the data package, with many per-user specifications of views of the data in the referenced data package.

The standard is here for a reason – tools can rely on the guaranteed existence of properties and their semantics. Since people don’t read standards, and given that the top-level dictionary can include custom keys, the line between what is standard and what is not becomes blurred. As a tool builder, I should not have to worry about foreign extensions, only my own. As things stand, I don’t know what I should worry about and what I should just ignore.

0 reactions
rufuspollock commented, Apr 30, 2020

DUPLICATE. I’m closing this in favour of the newer issue that covers the same core ground #663.
