Dataverse repoprovider and URLs
I just had a quick chat with @betatim about integrating Dataverse with Binder and while there are already some open issues and pull requests about this…
- Binderverse: integrating Binderhub with Dataverse (using docker+kubernetes): https://github.com/IQSS/dataverse/issues/4714
- Dataverse content provider: https://github.com/jupyter/repo2docker/pull/739
… the intention of this issue is to discuss the details of
- What the URL should look like on the Binder side when someone clicks “Binder” from a dataset in Dataverse.
- What the UI in Binder should look like when someone wants to operate on code and data stored in an installation of Dataverse. Dataverse supports both DOIs and Handles, but starting with just DOIs is certainly fine, as in the mockup below.
Over at https://github.com/IQSS/dataverse/issues/4714#issuecomment-510594346 I described what’s possible today with no changes to the Dataverse code.
One can create a mybinder.json file like this:
```json
{
  "displayName": "MyBinder",
  "description": "Analyze in MyBinder",
  "type": "explore",
  "toolUrl": "https://mybinder.org/v2/dataverse/",
  "contentType": "application/x-ipynb+json",
  "toolParameters": {
    "queryParameters": [
      {
        "siteUrl": "{siteUrl}"
      },
      {
        "datasetId": "{datasetId}"
      },
      {
        "fileId": "{fileId}"
      }
    ]
  }
}
```
And load up that Binder “external tool” into Dataverse like this:
```shell
curl http://localhost:8080/api/admin/externalTools -X POST --upload-file mybinder.json
```
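For anyone who would rather script the registration than use curl, here is a minimal Python sketch using only the standard library. It assumes the admin API is reachable at the same http://localhost:8080 address as in the curl example and that sending the manifest with a JSON Content-Type header is acceptable; `build_register_request` is a hypothetical helper name, not part of any Dataverse client library.

```python
import json
import urllib.request

# Same manifest as mybinder.json above, expressed as a Python dict.
MANIFEST = {
    "displayName": "MyBinder",
    "description": "Analyze in MyBinder",
    "type": "explore",
    "toolUrl": "https://mybinder.org/v2/dataverse/",
    "contentType": "application/x-ipynb+json",
    "toolParameters": {
        "queryParameters": [
            {"siteUrl": "{siteUrl}"},
            {"datasetId": "{datasetId}"},
            {"fileId": "{fileId}"},
        ]
    },
}


def build_register_request(dataverse_base: str, manifest: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the manifest to /api/admin/externalTools."""
    body = json.dumps(manifest).encode("utf-8")
    return urllib.request.Request(
        dataverse_base.rstrip("/") + "/api/admin/externalTools",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: JSON header is accepted
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(...)`; keeping construction separate from sending makes the request easy to inspect first.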
Once the external tool has been loaded into the Dataverse installation, a “Binder” or “MyBinder” (or whatever) button will appear under the “Explore” drop-down like this:
When users click “MyBinder”, they will be taken to URLs like the following:
https://mybinder.org/v2/dataverse/?siteUrl=https://dev2.dataverse.org&datasetId=18&fileId=30
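To make the mapping between the manifest's queryParameters and that URL explicit, the Python sketch below reproduces it. It only illustrates the observable result and is not Dataverse's own substitution code; `expand_launch_url` is a hypothetical name. It leaves values unencoded because the example URL above shows siteUrl unencoded; whether Dataverse percent-encodes values in other cases is not something this sketch claims.

```python
def expand_launch_url(tool_url: str, query_parameters: list, values: dict) -> str:
    """Fill {placeholder} templates from the manifest and append them to toolUrl,
    in manifest order, without percent-encoding (matching the example URL)."""
    pairs = []
    for param in query_parameters:
        # Each manifest entry is a one-key object such as {"siteUrl": "{siteUrl}"}.
        for name, template in param.items():
            value = template
            for key, replacement in values.items():
                value = value.replace("{" + key + "}", str(replacement))
            pairs.append(f"{name}={value}")
    return tool_url + "?" + "&".join(pairs)
```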
Based on the `siteUrl` and `datasetId` query parameters, I believe the code at https://github.com/jupyter/repo2docker/pull/739 will be able to download all the files from Dataverse.
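As a rough sketch of what the Binder side has to do with such a URL (not the implementation in the repo2docker PR), the Python below pulls out the parameters, builds the Dataverse native API URL for the dataset, and maps each file to a Data Access API download URL. The JSON shape (`data.latestVersion.files[].dataFile` with `id` and `filename`) reflects my understanding of the native API and should be checked against real responses; the sample in the test is trimmed and made up, and all helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_launch_url(url: str):
    """Return (siteUrl, datasetId, fileId or None) from a Binder launch URL."""
    query = parse_qs(urlsplit(url).query)
    site_url = query["siteUrl"][0].rstrip("/")
    dataset_id = query["datasetId"][0]
    file_id = query.get("fileId", [None])[0]
    return site_url, dataset_id, file_id


def dataset_api_url(site_url: str, dataset_id: str) -> str:
    # Native API: dataset metadata (including its file list) by database id.
    return f"{site_url}/api/datasets/{dataset_id}"


def file_download_urls(site_url: str, dataset_json: dict) -> dict:
    """Map file names to download URLs, given the JSON from dataset_api_url()."""
    files = dataset_json["data"]["latestVersion"]["files"]
    return {
        f["dataFile"]["filename"]: f"{site_url}/api/access/datafile/{f['dataFile']['id']}"
        for f in files
    }
```

Restricted files and unpublished drafts would need an API token, which this sketch ignores.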
I have a test GitHub repo with a “Launch Binder” button ready to play with: https://github.com/pdurbin/dataverse-irc-metrics
A few weeks ago I gave a demo of running a Jupyter Notebook against a TSV file in this repo using @whole-tale as an external tool: https://scholar.harvard.edu/pdurbin/blog/2019/jupyter-notebooks-and-crazy-ideas-for-dataverse
The plot I created looked something like this:
The goal is to offer two ways to use Binder with Dataverse:
- Enter a Dataverse DOI in Binder to run code, such as a Jupyter Notebook.
- Click “Binder” on a Dataverse dataset that contains code and data to run that code against the data in Binder.
I am happy to spin up Dataverse test servers to assist in this effort. At the moment, you can go to https://dev2.dataverse.org/file.xhtml?fileId=30 to see a MyBinder button.
Top GitHub Comments
Does this help?
https://doi.org/api/handles/10.11587/ERDG3O
I found this under “Proxy Server REST API” at https://www.doi.org/factsheets/DOIProxy.html#rest-api
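For the “enter a DOI in Binder” path, the proxy endpoint above returns the handle record as JSON. A minimal Python sketch for normalizing user input and reading the registered URL out of that record follows. It assumes the documented response shape (`responseCode` 1 for success and a `values` list whose `URL` entry holds the target in `data.value`); the sample record in the test, including its dataverse.example.edu URL, is made up for illustration, and the function names are hypothetical.

```python
import json
import urllib.request

PROXY = "https://doi.org/api/handles/"


def normalize_doi(text: str) -> str:
    """Strip common prefixes so 'doi:10.x/y' and 'https://doi.org/10.x/y' become '10.x/y'."""
    text = text.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
                   "http://dx.doi.org/", "doi:"):
        if text.lower().startswith(prefix):
            return text[len(prefix):]
    return text


def fetch_handle_record(doi: str) -> dict:
    """Fetch the raw handle record from the DOI proxy (performs a network call)."""
    with urllib.request.urlopen(PROXY + normalize_doi(doi)) as resp:
        return json.load(resp)


def landing_url(record: dict):
    """Return the URL registered for the DOI, or None if the lookup failed."""
    if record.get("responseCode") != 1:
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None
```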
@Xarthisius oh! That extra hop is our fault (Harvard Dataverse’s fault). We changed hostnames a while back and should retarget old PIDs like that. I opened an issue about this: https://github.com/IQSS/dataverse.harvard.edu/issues/40
If you want to treat some of these old PIDs as broken and not working, that’s fine. It would nudge Dataverse installations to clean house a little bit. They’d be forced to update old DOI records to point them to current hostnames.
Or you are welcome to keep following 302 redirects like you’re doing now. Whatever works best for you, really! Thanks for working on this!
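If Binder keeps following redirects, one way to finish the DOI path is sketched below in Python: let urllib follow the 302s (its default behavior), then turn the final Dataverse landing page into a native API URL keyed by persistent identifier. This assumes the landing page looks like `dataset.xhtml?persistentId=...`, which other installations or versions may not use; the helper names and the dataverse.example.edu host in the test are hypothetical.

```python
import urllib.request
from urllib.parse import parse_qs, urlencode, urlsplit


def final_url(url: str) -> str:
    """Follow HTTP redirects (urllib's default) and return the final URL (network call)."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl()


def dataset_api_from_landing(url: str):
    """Map https://host/dataset.xhtml?persistentId=doi:... to the native API URL
    that looks the dataset up by persistent identifier; None if no persistentId."""
    parts = urlsplit(url)
    pid = parse_qs(parts.query).get("persistentId", [None])[0]
    if pid is None:
        return None
    query = urlencode({"persistentId": pid})  # percent-encodes ':' and '/'
    return f"{parts.scheme}://{parts.netloc}/api/datasets/:persistentId/?{query}"
```

Treating a DOI whose record points at a dead hostname as broken would simply mean skipping `final_url` and failing when the first response is not a Dataverse landing page.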