Adding an optional token to the dataset fetcher code to allow fetching from private repositories
Describe the new feature or enhancement
The dataset fetching code inside mne/datasets/utils.py and mne/utils/fetching.py is actually very general. I was hoping to leverage it without copy/pasting the code, so that I can benefit from possible upstream bug fixes and performance improvements (if they ever occur).
However, in some cases I would like to unit test against private data I have stored on GitHub, which requires an API token with the HTTP request. Eventually some of that data would be made public (say, after a publication), but in the meantime it's nice to build it into a CI for my own private research project.
Is it possible to add an optional “token” to the dataset fetcher? This would also enable MNE to leverage private repos. In addition, it would reduce how much code anyone implementing a data fetcher has to copy from MNE, instead of copying every single function.
Describe your proposed implementation
Add an optional token=None kwarg to the following functions:
- _download
- _fetch_file
- _get_http
Then one can easily pass optional tokens in _data_path, depending on which dataset is being fetched. This would also enable any “mne” package, like mne-bids, mne-connectivity, etc., to leverage private GitHub repo data whose token might get passed in via GH Actions.
Describe possible alternatives
If we further refactor things so that key, urls, archive_names, folder_origs, folder_names and md5_hashes are passed into _data_path, rather than set inside _data_path, then to create an MNE-style fetcher one simply needs to define a data_path function that passes these to _data_path. They then have a fully functional mne_downstream_package.testing.data_path() that fetches their own datasets for testing without having to rely on MNE-Python for data fetching.
Additional Information
I think this might also help further cement MNE-Python as a platform for developing neuroscience/clinical-neuroscience applications, which sometimes need data fetchers for “private data” in their CI/testing pipelines.
Issue Analytics
- Created 2 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
Well, this can actually be any private URL that requires an API token to access. But yeah, I use GitHub for now :p for small testing datasets that fall under the 1GB limit but that we can’t make public “yet”. And yeah, I basically have like 4 different versions of the current MNE fetcher code, but they all modify maybe like… 10 LOC, which suggested to me that this would be a valuable refactoring in MNE.
It also seems that nowadays MNE is more and more of a “platform”, since it “enables” analysis and testing related to MEG/EEG/iEEG and then offloads more niche analyses to other mne packages, like mne-connectivity, mne-bids, etc. In my opinion, part of this enablement is making data fetching easier for CI/unit testing.
😃 I’m not alone
Copying this over here for the next 2 PRs:
Ref link: https://github.com/mne-tools/mne-python/pull/9742#issuecomment-921932044