Release curated data on the Web


#269 made it apparent that restricting the consistency guarantees provided by the NPM packages to NPM only has two limitations:

  • it forces consumers to go through NPM, which is at best awkward for non-NPM-based consumers
  • it limits the impact of the curation to whatever is packaged (e.g. the raw IDL files for the WebIDL package) - there is no easy way to access the generated JSON files that derive from it

An idea @tidoust and I discussed was to use w3c.github.io/webref to publish the curated view of the data.

This would require:

  • moving the anomaly report out of webref (which we have discussed doing for quite some time in any case)
  • adapting the release workflows to make publication on the gh-pages branch another of their outcomes
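
For a consumer that does not use NPM, reading the published data could be as simple as one HTTP request. The sketch below is only an illustration: the base URL comes from this issue, but the relative path passed in (for example `idlparsed/example.json`) is a hypothetical layout, not something the issue specifies.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Base URL taken from the issue text.
BASE_URL = "https://w3c.github.io/webref/"


def curated_url(relative_path: str) -> str:
    """Return the URL of one curated file.

    The layout under BASE_URL is an assumption made for this sketch.
    """
    return urljoin(BASE_URL, relative_path.lstrip("/"))


def fetch_curated_json(relative_path: str, timeout: float = 10.0):
    """Download and parse one curated JSON file, no NPM involved."""
    with urlopen(curated_url(relative_path), timeout=timeout) as response:
        return json.load(response)
```

Any language with an HTTP client could do the same, which is the point of publishing the curated data on the Web.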

One question that needs more thinking is how to manage versioning.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 4
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
tidoust commented, Jan 31, 2022

I’m closing the issue as the curation job is now in place and seems to work. I wouldn’t be too surprised if additional tweaks turn out to be needed but let’s handle them separately.

The @webref/idl@latest ref will point to the right commit on the curated branch as soon as a new version of the @webref/idl package gets released (currently blocked by non-distinguishable IDL in Autoplay Policy Detection).

Same thing for the @webref/css@latest ref (there was a remaining bug in the release script when @webref/css@3.0.4 was released, so tags could not be added).

The @webref/elements@latest ref… does not exist yet, @dontcallmedom to create it and have it point to @webref/elements@1.0.4.

1 reaction
tidoust commented, Jan 7, 2022

Jotting down notes for discussion on a possible plan.

Suggested plan

In short:

  • The main branch continues to hold the raw data and all the code (same as today)
  • A new curated branch gets created to contain the curated data. That new branch is published under https://w3c.github.io/webref/. Curation means applying patches to the raw data and re-generating the idlnames, idlnamesparsed, and idlparsed folders.
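
As a toy model of the two curation steps named above (patch the raw data, then regenerate what derives from it), consider the sketch below. It is not Webref's actual tooling: the patch format (plain text replacements), the regular expression standing in for a real WebIDL parser, and the spec names are all assumptions made for illustration.

```python
import re

# Crude stand-in for real WebIDL parsing (assumption for this sketch).
INTERFACE_RE = re.compile(r"\binterface\s+(\w+)")


def apply_patches(raw_idl, patches):
    """Return a curated copy of raw_idl.

    raw_idl maps a spec name to its IDL text; patches maps a spec name
    to a list of (old, new) text replacements. Both shapes are hypothetical.
    """
    curated = dict(raw_idl)
    for spec, replacements in patches.items():
        for old, new in replacements:
            if old not in curated[spec]:
                # A patch that no longer matches needs human attention.
                raise ValueError(f"patch for {spec!r} no longer applies")
            curated[spec] = curated[spec].replace(old, new)
    return curated


def build_name_index(idl_by_spec):
    """Regenerate a name -> [specs] index from (curated) IDL text."""
    index = {}
    for spec, idl in sorted(idl_by_spec.items()):
        for name in INTERFACE_RE.findall(idl):
            index.setdefault(name, []).append(spec)
    return index


raw = {"spec-a": "interface Foo {};", "spec-b": "interface Foo {};"}
patches = {"spec-b": [("interface Foo", "interface Bar")]}
curated_index = build_name_index(apply_patches(raw, patches))
```

The derived index is rebuilt from the patched text rather than edited by hand, so the generated files stay consistent with the curated sources.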

The curated branch would actually contain two curated views of the data:

  1. an ed view that contains data for all the specs crawled under the ed folder
  2. a browser view that only contains data for specs identified as browser specs (all specs in Webref are browser specs for now, but the goal is to relax that requirement soonish e.g. to extend the xref database to other types of specs).

We cannot maintain only one view for both situations because there is no easy way to filter out specs from the idlnames and idlnamesparsed folders. Data will be duplicated across views as a result but so be it. It does not seem useful to maintain a view for the tr crawl at this stage, although that could be considered later on.
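
A minimal sketch of what generating both views could look like, assuming per-spec lists of defined names are available as input. The function names, the input shape, and the spec names are hypothetical. The aggregated index is built once per view from the relevant subset of specs, which is where the duplication comes from.

```python
def index_names(names_by_spec):
    """Aggregate per-spec name lists into a name -> [specs] index."""
    index = {}
    for spec in sorted(names_by_spec):
        for name in names_by_spec[spec]:
            index.setdefault(name, []).append(spec)
    return index


def build_views(names_by_spec, browser_specs):
    """Build the 'ed' view (all specs) and the 'browser' view (a subset)."""
    browser_only = {
        spec: names
        for spec, names in names_by_spec.items()
        if spec in browser_specs
    }
    return {
        "ed": index_names(names_by_spec),
        "browser": index_names(browser_only),
    }


# Hypothetical input: "other-spec" is not a browser spec.
names = {"dom": ["Node", "Event"], "other-spec": ["Widget", "Event"]}
views = build_views(names, browser_specs={"dom"})
```

In this toy input, `Event` appears in both views, while `Widget` only appears in the `ed` view.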

NPM packages will be released from the data in the curated branch. Existing NPM packages will typically be released from the curated data in the browser view. When an NPM package gets released, a new version tag is created that points to the corresponding commit on the curated branch. I propose not to introduce a global curated data version for now.
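
The tagging step could be sketched as follows. The tag naming follows the refs mentioned in this thread (`@webref/css@3.0.4`, `@webref/idl@latest`); the helper names and the commit value are hypothetical, and the sketch only builds the `git tag` command lines without running them.

```python
from typing import List


def release_tags(package: str, version: str) -> List[str]:
    """Return the version tag and the moving 'latest' tag for a release."""
    return [f"{package}@{version}", f"{package}@latest"]


def git_tag_commands(package: str, version: str, commit: str) -> List[List[str]]:
    """Build the git commands that tag a commit on the curated branch."""
    versioned, latest = release_tags(package, version)
    return [
        # The version tag is created once and never moves.
        ["git", "tag", versioned, commit],
        # The 'latest' tag is moved to the new commit on every release.
        ["git", "tag", "--force", latest, commit],
    ]


# "abc1234" is a placeholder commit, not a real one.
commands = git_tag_commands("@webref/css", "3.0.4", "abc1234")
```

Keeping one tag per package version, rather than a single global version, matches the proposal above not to introduce a global curated data version.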

To avoid “growing” the size of the repo, an alternative approach would be to create one separate repo for the curated data, and another one for the browser-only view of the curated data. If some projects are planning to clone the repos, this would allow them to only get the data they need. Is that needed?

Anything else?
