Crossref integration

Quoting the original issue at https://github.com/w3c/respec/issues/2568:

Crossref is a database of about 100 million papers, identified by their Digital Object Identifier (DOI). It provides an HTTP API which can be used to search for papers and retrieve their metadata, just like SpecRef does. Therefore, it should be possible to use it just like SpecRef. For instance, I would like to be able to write something like this:
… has been extensively studied [[doi:10.1007/978-3-319-93417-4_5]]. However, …
which would fetch citation metadata from Crossref and insert it in the bibliography. It would generate a citation and link to the corresponding paper at https://doi.org/10.1007/978-3-319-93417-4_5.
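
As a rough illustration of the syntax above, the sketch below pulls `doi:`-prefixed keys out of `[[…]]` citations and builds the matching https://doi.org/ link. The regular expression and both function names are assumptions made for this example; this is not ReSpec's actual citation parser.

```python
import re

# Assumed pattern: a citation looks like [[doi:<DOI>]] and a DOI starts with "10.".
DOI_CITATION = re.compile(r"\[\[doi:(10\.[^\]\s]+)\]\]")


def find_doi_citations(text: str) -> list[str]:
    """Return every DOI cited as [[doi:...]] in text, in order of appearance."""
    return DOI_CITATION.findall(text)


def doi_link(doi: str) -> str:
    """Build the resolver link for a DOI."""
    return f"https://doi.org/{doi}"


sample = "… has been extensively studied [[doi:10.1007/978-3-319-93417-4_5]]. However, …"
dois = find_doi_citations(sample)
links = [doi_link(d) for d in dois]
```

Here `dois` holds the bare DOI and `links` the URL the bibliography entry would point to.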

There seems to be consensus that SpecRef itself should act as a proxy between clients (such as ReSpec, Bikeshed) and Crossref. In other words, it should be possible to retrieve DOI metadata via the SpecRef API.

For example, it should be possible to send a request like this:

https://api.specref.org/bibrefs?refs=FileAPI,rfc2119,doi:10.1007/978-3-319-93417-4_5

and get as a response:

{
   "FileAPI": { … },
   "rfc2119": { … },
   "doi:10.1007/978-3-319-93417-4_5": {
        … metadata returned by Crossref …
   }
}
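
To make the request side concrete, here is a minimal sketch of how a client might build that URL and how a proxy might separate SpecRef ids from DOIs before forwarding the latter to Crossref. The `doi:` prefix convention comes from the example above; the helper names are hypothetical, and the sketch assumes no reference key contains a comma.

```python
from urllib.parse import urlencode

SPECREF_ENDPOINT = "https://api.specref.org/bibrefs"  # endpoint shown in the example above
DOI_PREFIX = "doi:"  # assumed marker for Crossref-backed references


def build_bibrefs_url(refs: list[str]) -> str:
    """Build the request URL, leaving ',', ':' and '/' readable in the query."""
    query = urlencode({"refs": ",".join(refs)}, safe=",:/")
    return SPECREF_ENDPOINT + "?" + query


def partition_refs(refs: list[str]) -> tuple[list[str], list[str]]:
    """Split reference keys into (SpecRef ids, bare DOIs to look up in Crossref)."""
    specref_ids, dois = [], []
    for ref in refs:
        if ref.lower().startswith(DOI_PREFIX):
            dois.append(ref[len(DOI_PREFIX):])  # drop the "doi:" marker
        else:
            specref_ids.append(ref)
    return specref_ids, dois


refs = ["FileAPI", "rfc2119", "doi:10.1007/978-3-319-93417-4_5"]
url = build_bibrefs_url(refs)
specref_ids, dois = partition_refs(refs)
```

The response would still be keyed by the original strings, including the `doi:` prefix, so clients can match entries back to their citations.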

Problem A: Crossref’s metadata format is different from SpecRef’s. I see a few options:

1. Translating it to something similar to SpecRef’s current format, adding some missing fields such as the journal or conference.
2. Returning Crossref’s metadata as-is. This would probably mean adding a field to both SpecRef and Crossref records to indicate the format. For instance, "$schema":"URI of a JSON schema describing the format of this record". Or any other syntax.

Option 1 means SpecRef and Crossref records will have common fields (such as title, authors) that consumers can rely on without changing their renderer much. But it means we will be discarding information (for instance, ORCID iDs for authors cannot be represented easily) and adding potentially complex logic to SpecRef to translate from one format to the other (which will need maintaining as the formats evolve).
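
A minimal sketch of what such a translation could look like follows. The input keys (`title`, `author`, `container-title`, `issued`, `DOI`) follow the shape of Crossref work records; the output field names are only loosely modeled on SpecRef entries, and `journal` is a hypothetical added field. The sample record is invented for illustration and is not real Crossref data.

```python
from typing import Any, Dict


def crossref_to_specref_like(work: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one Crossref work record into a flat, SpecRef-style entry."""
    # Authors become plain strings, so per-author details such as ORCID are lost.
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in work.get("author", [])
    ]
    # Crossref stores dates as [[year, month, day]], possibly truncated.
    date_parts = (work.get("issued", {}).get("date-parts") or [[]])[0]
    return {
        "title": (work.get("title") or [""])[0],
        "authors": authors,
        "href": f"https://doi.org/{work['DOI']}",
        "publisher": work.get("publisher", ""),
        "date": "-".join(f"{p:02d}" for p in date_parts if p is not None),
        "journal": (work.get("container-title") or [""])[0],  # hypothetical field
    }


# Invented sample record, shaped like a Crossref work.
sample_work = {
    "DOI": "10.1234/example",
    "title": ["An Example Paper"],
    "author": [
        {"given": "Ada", "family": "Lovelace",
         "ORCID": "https://orcid.org/0000-0000-0000-0000"}
    ],
    "container-title": ["Journal of Examples"],
    "publisher": "Example Press",
    "issued": {"date-parts": [[2018, 6]]},
}
entry = crossref_to_specref_like(sample_work)
```

Note that the ORCID in the sample does not survive the translation, which is the kind of information loss described above.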

Option 2 basically forces clients to use a different renderer for each format. But it lets clients handle Crossref references using all the information available in Crossref’s metadata, which can be useful. And it is probably easier to maintain on SpecRef’s side, as the code remains independent of any format change on Crossref’s side.
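
On the client side, this could amount to a small dispatch on the format field. The sketch below assumes the `"$schema"` key suggested above; the two schema URIs are placeholders on example.org, the renderers are deliberately simplistic, and the sample records are invented.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]

# Placeholder schema URIs, not real published schemas.
SPECREF_SCHEMA = "https://example.org/schemas/specref.json"
CROSSREF_SCHEMA = "https://example.org/schemas/crossref-work.json"


def render_specref(record: Record) -> str:
    authors = ", ".join(record.get("authors", []))
    return f"{authors}. {record['title']}. URL: {record['href']}"


def render_crossref(record: Record) -> str:
    names = []
    for author in record.get("author", []):
        name = f"{author.get('given', '')} {author.get('family', '')}".strip()
        if "ORCID" in author:  # richer metadata stays available to the renderer
            name += f" ({author['ORCID']})"
        names.append(name)
    authors = ", ".join(names)
    return f"{authors}. {record['title'][0]}. URL: https://doi.org/{record['DOI']}"


RENDERERS: Dict[str, Callable[[Record], str]] = {
    SPECREF_SCHEMA: render_specref,
    CROSSREF_SCHEMA: render_crossref,
}


def render(record: Record) -> str:
    """Pick a renderer from the record's "$schema"; untagged records count as SpecRef."""
    schema = record.get("$schema", SPECREF_SCHEMA)
    if schema not in RENDERERS:
        raise ValueError(f"No renderer for schema {schema!r}")
    return RENDERERS[schema](record)


specref_record = {"$schema": SPECREF_SCHEMA, "title": "Example Spec",
                  "authors": ["Jane Doe"], "href": "https://example.org/spec"}
crossref_record = {"$schema": CROSSREF_SCHEMA, "DOI": "10.1234/example",
                   "title": ["An Example Paper"],
                   "author": [{"given": "Ada", "family": "Lovelace",
                               "ORCID": "https://orcid.org/0000-0000-0000-0000"}]}
rendered = [render(specref_record), render(crossref_record)]
```

Each new format costs the client one more renderer, but nothing on the server has to change when Crossref's format does.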

Problem B: Architecturally speaking, how do we integrate this in SpecRef? Do we want to make it possible to integrate other bibliographic databases in the same way (so, coming up with a modular design)? What should it look like?
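
One possible shape for a modular design, sketched under heavy assumptions: each external database registers a provider under a key prefix (such as `doi`), and anything without a known prefix falls through to SpecRef's own data. The class and its methods are hypothetical, and the providers here are stubs rather than real HTTP clients.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Provider = Callable[[str], Record]  # receives the id without its prefix


class ReferenceRegistry:
    """Routes reference keys to bibliographic providers by prefix."""

    def __init__(self, default: Provider) -> None:
        self._default = default
        self._providers: Dict[str, Provider] = {}

    def register(self, prefix: str, provider: Provider) -> None:
        self._providers[prefix.lower()] = provider

    def resolve(self, refs: Iterable[str]) -> Dict[str, Record]:
        result: Dict[str, Record] = {}
        for ref in refs:
            prefix, sep, rest = ref.partition(":")
            provider = self._providers.get(prefix.lower()) if sep else None
            # Unknown or missing prefixes are handled by the default provider.
            result[ref] = provider(rest) if provider else self._default(ref)
        return result


# Stub providers standing in for SpecRef's database and a Crossref client.
registry = ReferenceRegistry(default=lambda ref: {"source": "specref", "id": ref})
registry.register("doi", lambda doi: {"source": "crossref", "id": doi})
resolved = registry.resolve(["rfc2119", "doi:10.1007/978-3-319-93417-4_5"])
```

Adding another database would then mean registering one more prefix, without touching the lookup logic.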

Top GitHub Comments

tobie commented, Nov 15, 2019

Well, the whole point of SpecRef is to offer as common a data format as possible for all of its references, with the explicit goal of focusing on spec-editing use cases (and related tools).

So what would be ideal is providing a mapping between the two so that the API users have as little to change as possible in their code.

tabatkins commented, Nov 25, 2019

Sidestepping the conversation about integrating this into SpecRef itself, I’ve got no problem integrating an additional data source into Bikeshed’s data, if it’s high-quality and useful.

As Tobie alludes, I’d prefer getting the whole data source directly; an HTTP-only API means you can’t build a Bikeshedded spec offline. That said, I’m not against it philosophically; I already have some HTTP-based APIs, like the “github issues” API, that you can opt into. And since Crossref claims to have 100 million papers, that’s, uh, a lot of data for every Bikeshed user to download. (The current Bikeshed data files directory is about 50MB; this would increase it to several gigs.)