Discussion using datasets in offline mode
datasets.load_dataset("csv", ...)
breaks if you have no connection (there is already an issue about this: https://github.com/huggingface/datasets/issues/761). It seems to be the same for metrics too.
I am creating this ticket to discuss the problem and gather your thoughts or other proposals.
Here are some points to open discussion:
- If you want to prepare your code/datasets on one machine (with an internet connection) but run it on another, offline machine (without an internet connection), it won't work as is, even if you have all the files locally on the offline machine.
- AFAIK, you can make it work if you manually put the Python files (csv.py for example) on the offline machine and change your code to datasets.load_dataset("MY_PATH/csv.py", ...). But it would be much better if you could run the same code without modification when the files are available locally.
- I've also been considering the requirement of downloading Python code and executing it on your machine in order to use datasets. This can be an issue in a professional context. Downloading a CSV/H5 file is acceptable, but downloading an executable script can open up many security issues. We certainly need a mechanism to at least "freeze" the dataset code you retrieved once, so that you can review it if you want and then be sure you use that copy everywhere and not a version downloaded from the internet.
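To make the last two points concrete, here is a minimal sketch of what "same code on both machines" plus a "frozen" script could look like on the user side. It is not part of datasets: the helper names resolve_loader, sha256_of and check_frozen, and the scripts_dir argument, are hypothetical and only for illustration. The sketch assumes a copy of the loading script (csv.py for example) was placed in a local directory and that a SHA-256 digest was recorded when the script was reviewed.

```python
import hashlib
from pathlib import Path


def resolve_loader(name: str, scripts_dir: str) -> str:
    """Return the path of a local copy of the loading script if one exists,
    otherwise return the bare name unchanged (hypothetical helper)."""
    local_script = Path(scripts_dir) / f"{name}.py"
    return str(local_script) if local_script.is_file() else name


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, recorded once when the script is reviewed."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_frozen(path: str, expected_sha256: str) -> str:
    """Return `path` only if the script still matches the reviewed digest."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"{path} differs from the reviewed version: {actual}")
    return path
```

With something like this, the call could stay datasets.load_dataset(resolve_loader("csv", "MY_PATH"), ...) on both machines, and check_frozen could be applied whenever a local path is returned. A mechanism built into the library would of course be preferable to a wrapper like this.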
WDYT? (thks)
Issue Analytics
- State:
- Created 3 years ago
- Reactions:7
- Comments:8 (3 by maintainers)
Here is my way to load a dataset offline, but it requires an online machine
HTH.
Requiring an online connection is unfortunately a deal breaker in some cases, so it would be great if an offline mode were added, similar to how transformers loads models offline just fine.
@mandubian's second bullet point suggests that there's a workaround allowing you to use your offline (custom?) dataset with datasets. Could you please elaborate on what that should look like?