Merging path into data

See original GitHub issue

We’ve already merged url into path: path is now a URI that points either to a file on a filesystem or to a resource over HTTP.

In data packages, we have a general pattern where a thing is either inlined or referenced. This pattern exists for schema, licenses and other properties: the payload for the property can be inlined directly, or pointed to with a file path or an HTTP address.

The only property that deviates from this pattern across all the specifications is the one that handles the very thing we are packaging: path is reserved for references to data, and the additional data property is reserved for the inlining of data.

Implementations need to handle this reference to/inlining of data anyway, so what valid reasons do we have to maintain the path/data distinction?
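
To make the current situation concrete, here is a minimal sketch of the branching an implementation has to do today. The function name `locate_data` and its return labels are hypothetical, and rejecting a resource that carries both properties is an assumption made for this sketch, not behaviour defined by the spec.

```python
from typing import Any, Tuple


def locate_data(resource: dict) -> Tuple[str, Any]:
    """Return ("reference", path) or ("inline", data) for a resource descriptor.

    Hypothetical helper: the name and the return labels are illustrative only.
    """
    has_path = "path" in resource
    has_data = "data" in resource
    if has_path and has_data:
        # Assumption: this sketch rejects the ambiguous case outright.
        raise ValueError("resource has both 'path' and 'data'")
    if has_path:
        return ("reference", resource["path"])
    if has_data:
        return ("inline", resource["data"])
    raise ValueError("resource has neither 'path' nor 'data'")


# A referenced resource and an inlined one take different branches.
print(locate_data({"name": "a", "path": "data/a.csv"}))
print(locate_data({"name": "b", "data": [["id", "value"], [1, 2]]}))
```

The two-property design forces every implementation to write this kind of guard, including a decision about what to do when both properties appear.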

I propose that we either merge path into data, or do a better job of being explicit about why we need two properties and what the specific behaviour is with them, especially if both appear in a package.

Pros

  • Consistent with the inlining/referencing pattern of other properties
  • We don’t have two properties for the same concept, and therefore need no awkward wording in the spec (or handling in code) for “path or data, but not both”, which one takes precedence, etc.

Cons

  • Some types of data will be strings. So, because we don’t have a scheme for our file paths when path is a file path, it could be hard to implement code that handles both file paths and inlined data (but honestly, not that hard).
  • It gets complicated when path is an array: how does an implementation distinguish between an array of URIs and an array that is inlined data?

Solution

  • Use package:// as the scheme for file paths. We don’t want to use file://, because we explicitly declare that data sources must be nested below the location of the descriptor; the semantics are that package:// refers to that location as the root. This gives implementations an explicit hint about the type of URI, and about whether the value of data is a URI at all.
    • Has the added benefit of more clarity on our forbidding of absolute and relative parent paths.
    • This works fine for simple scenarios. It does not work well for the case of path as array. An implementor can’t inspect the string to check the type of data. A workaround would be to check the first string in the array, but it feels like a great big hack by then.
      • Still, if we are serious about this wording: “all files in the array MUST be similar in terms of structure, format etc. Implementors SHOULD be able simply concatenate the files together and treat the result as one large file.” then implementors will have to do some inspection on each string in the array anyway.
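
The proposal above could be sketched roughly as follows, assuming a merged data property and the proposed package:// scheme. The names `REFERENCE_SCHEMES`, `is_reference` and `classify_data` are hypothetical, and treating http:// and https:// as the only other reference schemes is an assumption of this sketch.

```python
from typing import Any

# Assumption: these are the only prefixes that mark a string as a reference.
REFERENCE_SCHEMES = ("package://", "http://", "https://")


def is_reference(value: Any) -> bool:
    """True if value is a string that starts with a known reference scheme."""
    return isinstance(value, str) and value.startswith(REFERENCE_SCHEMES)


def classify_data(value: Any) -> str:
    """Classify a merged `data` value as a reference, references, or inline."""
    if is_reference(value):
        return "reference"
    # For arrays, inspect every element rather than only the first one.
    if isinstance(value, list) and value and all(is_reference(v) for v in value):
        return "references"
    return "inline"


print(classify_data("package://data/a.csv"))
print(classify_data(["package://a.csv", "package://b.csv"]))
print(classify_data("id,value\n1,2"))
print(classify_data([["id", "value"], [1, 2]]))
```

Checking every element avoids the "first string" workaround at the cost of a full pass over the array. One ambiguity remains in this sketch: inlined data that is itself a list of strings all beginning with one of these schemes would be classified as references.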

Closing

This all may be just too complex for implementations, but I think it is clearly better for publishers.

I still want to open this issue to record the info for future reference if needed, even if we probably will not try to “solve” this.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
roll commented, Dec 13, 2016

Goodtables, jsontableschema.Table, tabulator etc. use source (aka data) and various schemes to distinguish between data types. Splitting into path and data at the spec level complicates things for implementations, not the other way around.

0 reactions
pwalsh commented, Feb 9, 2017

Fixed in #337
