
Encapsulated extensibility


The data package spec says:

"A Data Package author MAY add any number of additional fields beyond those listed in the specification here."

The specification then gives the temporal entry as an example.

I’m a bit worried that this much freedom in extensibility might prove counterproductive later. Why?

  1. Namespace is limited
  2. Many authors (of packages) – many meanings
  3. Many tool creators – many varying expectations

Namespace: The authors of the specification are the namespace owners. They have the power over what goes in and what stays out, because they are the standard’s creators. If the standard is to stay “lean”, the namespace should contain only those entries that are really necessary.

Authors’ keys and tools’ expectations: since there are many package authors, each might put their own metadata under whatever keys they choose, regardless of whether the same key carries a different meaning in other packages. The same can be said about tool creators (visualisation, ETL, mining, …): they might expect a certain type of value or structure under a key, but receive invalid values because some author decided to use that key for something else.

The temporal example is not a very good illustration of customisation. That piece of metadata might be genuinely useful for ETL tools, but it is not part of the standard! The value can be anything. Or the value might be exactly as stated in the example, but stored under a different key, for example time_range.
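To make the collision risk concrete, consider two hypothetical packages (the key shapes below are invented for illustration; neither is defined by the spec) that both use a temporal key:

```json
{
  "name": "package-a",
  "temporal": { "start": "2000-01-01", "end": "2010-12-31" }
}
```

```json
{
  "name": "package-b",
  "temporal": "monthly"
}
```

A tool written against package-a’s object form will break, or silently misbehave, when it encounters package-b’s string form, and neither author has done anything the spec forbids.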

Proposal

This is especially important for metadata that will be consumed mostly by tools, for automation or application processing.

  1. Strictly guard the namespace and make the known keys/values part of the specification.
  2. Have a separate structure for custom metadata, for example custom = { ... }.

Don’t allow authors to put arbitrary keys at the top level. Discourage them from adding keys under any other known/specified object.

Recommend that authors put custom keys under that encapsulated customisable structure. You guarantee nothing about the contents of that structure – neither the keys nor their values.

Package authors and tool creators can agree among themselves to use certain keys in that structure. When you see that a key spreads well, you can promote it into the specification at the top level with well-defined contents (for example, the most common usage).

Larger custom metadata can even be put in a separate .json file. As for smaller structures (mostly objects in lists), such as sources or fields, extension keys need not be forbidden, but they should at least be discouraged.
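A minimal sketch of what the proposed encapsulation could look like in a datapackage.json (the custom key name, the vendor-style sub-keys, and their contents are all assumptions for illustration, not part of any spec):

```json
{
  "name": "gdp-quarterly",
  "resources": [
    { "path": "data/gdp.csv" }
  ],
  "custom": {
    "acme-etl": { "time_range": ["2000-Q1", "2015-Q4"] },
    "acme-viz": { "colour": "#3366cc" }
  }
}
```

Everything under custom can be stripped or ignored without affecting conformance, and namespacing the sub-keys by tool or organisation keeps extension authors out of each other’s way.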

Alternative

If you would like to allow custom top-level keys, then I would suggest having a wiki (GitHub wiki?) page where package writers and tool creators document their keys and the expected values.

Use-case

Here is a use case from another project:

In Cubes we try to minimise the number of keys in the model metadata. Every model object (cube, dimension, attribute, …) has an info dictionary for custom keys and values. Visualisation tools sometimes require extra metadata or hints to display the data properly or nicely. The tool writer then adds a soft requirement for a custom key in the info dictionary. It might be cosmetic (colour, formatting, image, …), data related (time range) or metadata related (calendar unit of an attribute).

The cubes-viewer is a visualisation application that connects to the Cubes server. It fetches the metadata and provides a user interface based on it (labels, relationships, concept hierarchies, …). The app has a special case for handling time series, and it recommended that model creators add cv- keys to the info dictionary. For example, to denote that a field represents a year:

                    "info": { "cv-datefilter-field": "year" }

Cubes had no explicit notion of date/time before. However, as the concept of time is important for data analysis, roles of dimensions and their levels were introduced. The attribute.info.cv-datefilter-field is now attribute.role (and it might relate to non-time dimensions as well). It went from single-purpose and non-standard to multi-purpose and standard. The attribute role is now generated by the server automatically if the model author does not specify it explicitly.
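The promotion described above might be sketched as the following before/after attribute fragments (illustrative only; the exact Cubes model syntax and role values may differ):

```json
{ "name": "year", "info": { "cv-datefilter-field": "year" } }
```

```json
{ "name": "year", "role": "year" }
```

The custom key incubated in the info dictionary until its usefulness was proven, then moved into the standard model vocabulary – exactly the guarded evolution proposed here for data packages.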

Conclusion

Why do we need metadata in JSON? To be machine processable. Moreover, to make the data machine processable based on that machine-processable metadata. If that is not the purpose, then we would be fine with just a plain README.md.

It might seem to work against the evolution of the package format, but it does not. It is just guarded evolution that prevents a future incompatibility mess. The separate metadata is simply incubated until it is proven stable and standardised enough to be included in the specification.

Issue Analytics

  • State:closed
  • Created 10 years ago
  • Comments:5 (5 by maintainers)

Top GitHub Comments

1 reaction
Stiivi commented, Aug 13, 2016

Best practice is to keep the standard structure strictly unpolluted and to have a designated place/property for storing all extensions without any restrictions. Let it evolve, observe it, and potentially include it in the standard. If others fight over a namespace in the “extensions” space, that is their problem, not the standard’s; they should be aware of it and resolve it themselves. That space is a wild west, and that is OK.

The use cases for the extension metadata proposed here are:

  1. data packages I produce within my organisation, for consumption by tools within the same organisation, to carry internal metadata
  2. metadata that my organisation and some third-party organisation have agreed to share

Extensions should be strippable from the data package without affecting its usefulness in the outside world. They might eventually become standard once accepted by a significant number of data package producers and consumers.

Re Profiles: I am not sure I understand the proposal or its usefulness. The name profile is confusing to me – that sounds like something I would keep on my end, separate from the data package, with many per-user specifications of views of the data in the referenced data package.

The standard is here for a reason – tools can rely on the guaranteed existence of properties and their semantics. Since people don’t read standards, and given that the top-level dictionary can include custom keys, the line between what is standard and what is not becomes blurred. As a tool builder, I should not have to worry about foreign extensions, only my own. As things stand, I don’t know what I should worry about and what I should just ignore.

0 reactions
rufuspollock commented, Apr 30, 2020

DUPLICATE. I’m closing this in favour of the newer issue that covers the same core ground #663.
