Release curated data on the Web
Issue #269 made it apparent that keeping the consistency guarantees provided by the NPM packages NPM-only had two limitations:
- it forces consumers to go through NPM, which is at best awkward for non-NPM-based consumers
- it limits the impact of the curation to whatever gets packaged (e.g. the raw IDL files for the WebIDL package): there is no easy way to access the generated JSON files that derive from it
An idea @tidoust and I discussed was to use w3c.github.io/webref to publish the curated view of the data.
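To make the benefit for non-NPM consumers concrete, here is a minimal sketch of what direct access could look like once the curated data is published there; the exact file layout (e.g. an `ed/idlparsed/dom.json` file) is an assumption for illustration, not a decided structure.

```js
// Hypothetical example: fetch one of the generated JSON files directly from
// the published curated data, without going through NPM.
// The path "ed/idlparsed/dom.json" is an assumption for illustration only.
// Works in browsers and in Node.js >= 18 (global fetch), in an ES module.
const url = "https://w3c.github.io/webref/ed/idlparsed/dom.json";
const response = await fetch(url);
if (!response.ok) {
  throw new Error(`Could not retrieve curated data: ${response.status}`);
}
const parsedIdl = await response.json();
console.log(Object.keys(parsedIdl));
```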
This would require:
- moving the anomaly report out of webref (which we have discussed doing for quite some time in any case)
- adapting the release workflows to make publication on the `gh-pages` branch another of their outcomes (a rough sketch of that step follows this list)
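As a very rough illustration of that second point, the publication step could look something like the sketch below. It assumes the release workflow ends up with a `curated/` directory holding the data to publish and has push rights to the repository; this is not the actual webref release code, only an illustration.

```js
// Hypothetical publication step (Node.js): copy the curated data into a
// gh-pages worktree and push it. Folder names, branch handling and the commit
// message are assumptions for illustration, not the actual webref workflow.
import { execSync } from "node:child_process";

const run = (cmd, opts = {}) => execSync(cmd, { stdio: "inherit", ...opts });

function publishCuratedData(curatedDir = "curated") {
  run("git fetch origin gh-pages");
  // Check out gh-pages in a separate worktree so the main checkout is untouched.
  run("git worktree add -B gh-pages ../gh-pages-publish origin/gh-pages");
  run(`cp -R ${curatedDir}/. ../gh-pages-publish/`);
  // A real workflow would skip the commit when nothing changed.
  run('git add -A && git commit -m "Publish curated data" && git push origin gh-pages', {
    cwd: "../gh-pages-publish",
  });
  run("git worktree remove ../gh-pages-publish");
}

publishCuratedData();
```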
One question that needs more thinking is how to manage versioning.

I’m closing the issue as the curation job is now in place and seems to work. I wouldn’t be too surprised if additional tweaks turn out to be needed but let’s handle them separately.
The `@webref/idl@latest` ref will point to the right commit on the `curated` branch as soon as a new version of the `@webref/idl` package gets released (currently blocked by non-distinguishable IDL in Autoplay Policy Detection).

Same thing for the `@webref/css@latest` ref (there was a remaining bug in the release script when `@webref/css@3.0.4` was released, so tags could not be added).

The `@webref/elements@latest` ref… does not exist yet; @dontcallmedom to create it and have it point to `@webref/elements@1.0.4`.

Jotting down notes for discussion on a possible plan.
Suggested plan
In short:
- `main` branch continues to hold the raw data and all the code (same as today)
- `curated` branch gets created to contain the curated data. That new branch is published under https://w3c.github.io/webref/. Curation means applying patches to the raw data and re-generating the `idlnames`, `idlnamesparsed`, and `idlparsed` folders (see the sketch after this list).
- The `curated` branch would actually contain two curated views of the data:
  - `ed` view that contains data for all the specs crawled under the `ed` folder
  - `browser` view that only contains data for specs identified as browser specs (all specs in Webref are browser specs for now, but the goal is to relax that requirement soonish, e.g. to extend the xref database to other types of specs).

  We cannot maintain only one view for both situations because there is no easy way to filter out specs from the `idlnames` and `idlnamesparsed` folders. Data will be duplicated across views as a result, but so be it. It does not seem useful to maintain a view for the `tr` crawl at this stage, although that could be considered later on.
- NPM packages will be released from the data in the `curated` branch. Existing NPM packages will typically be released from the curated data in the `browser` view. When an NPM package gets released, a new version tag is created for the corresponding commit on the `curated` branch. I propose not to introduce a global curated data version for now.
- To avoid “growing” the size of the repo, an alternative approach would be to create one separate repo for the curated data, and another one for the browser-only view of the curated data. If some projects are planning to clone the repos, this would allow them to only get the data they need. Is that needed?
- Anything else?
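A minimal sketch of what the curation and view-building steps could look like, to make the plan above concrete. The `patches/` layout, the structure of `index.json`, the per-spec `categories` filter and the folder names are all assumptions for illustration; none of them are settled, and this is not the actual webref tooling.

```js
// Hypothetical curation sketch (not the actual webref tooling).
// Assumptions: raw data lives in "ed/", patches live in "patches/*.patch",
// and the crawl index exposes a per-spec "categories" array.
import { execSync } from "node:child_process";
import { cpSync, readdirSync, readFileSync } from "node:fs";

function curate(rawDir = "ed", curatedDir = "curated/ed") {
  // Curation = copy of the raw data + patches + regenerated derived folders.
  cpSync(rawDir, curatedDir, { recursive: true });
  for (const patch of readdirSync("patches").filter(f => f.endsWith(".patch"))) {
    execSync(`patch -p1 -d ${curatedDir} < patches/${patch}`, { stdio: "inherit" });
  }
  // Re-generate idlnames, idlnamesparsed and idlparsed from the patched data,
  // using whatever generation code webref already has (omitted here).
}

function listBrowserSpecs(curatedDir = "curated/ed") {
  // The "browser" view would only keep specs identified as browser specs.
  // An index.json with a "results" array and a "categories" field is assumed.
  const index = JSON.parse(readFileSync(`${curatedDir}/index.json`, "utf8"));
  return index.results.filter(spec => (spec.categories || []).includes("browser"));
}

curate();
console.log(`${listBrowserSpecs().length} specs would end up in the browser view`);
```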