
Discussion using datasets in offline mode


datasets.load_dataset("csv", ...) fails when there is no internet connection (issue https://github.com/huggingface/datasets/issues/761 already covers this). The same seems to be true for metrics.

I’m opening this ticket to discuss the problem and gather your ideas or other proposals.

Here are some points to open the discussion:

  • If you want to prepare your code and datasets on one machine (with an internet connection) but run them on another, offline machine (without an internet connection), it won’t work as is, even if all the files are available locally on the offline machine.
  • AFAIK, you can make it work if you manually copy the Python files (csv.py, for example) to the offline machine and change your code to datasets.load_dataset("MY_PATH/csv.py", ...). But it would be much better if the same code could run without modification when the files are available locally.
  • I’ve also been considering the requirement to download Python code and execute it on your machine in order to use datasets. This can be an issue in a professional context: downloading a CSV/H5 file is acceptable, but downloading an executable script can open many security holes. We certainly need a mechanism to at least “freeze” the dataset code you retrieved once, so that you can review it if you want and then be sure you use that exact version everywhere, not a version downloaded from the internet.
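The second and third points could be combined into a local-first lookup that only accepts a reviewed, “frozen” copy of a loading script. The sketch below is hypothetical: the helper names, the <local_dir>/<name>.py layout and the pinned dictionary of SHA-256 digests recorded at review time are assumptions, not part of datasets. It returns the path of the local copy when its digest matches, returns the name unchanged when there is no local copy, and refuses a copy that was never pinned or has changed since review.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_loading_script(name: str, local_dir: str, pinned: dict[str, str]) -> str:
    """Hypothetical helper: prefer a reviewed local copy of a loading script.

    - <local_dir>/<name>.py missing      -> return `name` unchanged (default lookup)
    - present and digest matches pinned  -> return the local path
    - present but unpinned or modified   -> raise ValueError
    """
    script = Path(local_dir) / f"{name}.py"
    if not script.is_file():
        return name  # no frozen copy on this machine
    expected = pinned.get(name)
    if expected is None:
        raise ValueError(f"{script} exists but has no pinned digest")
    actual = sha256_of(script)
    if actual != expected:
        raise ValueError(f"{script} changed since review: {actual} != {expected}")
    return str(script)
```

The calling code would then read datasets.load_dataset(resolve_loading_script("csv", "MY_PATH", pinned), ...) on every machine; only the presence of the frozen copy differs between them.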

WDYT? (thanks)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

11 reactions
ZizhenWang commented, Jan 12, 2021

Here is how I load a dataset offline, although it requires an online machine for the first step:

  1. (online machine)
import datasets
data = datasets.load_dataset(...)
data.save_to_disk("/YOUR/DATASET/DIR")
  2. Copy the directory from the online machine to the offline machine.
  3. (offline machine)
import datasets
# the directory copied over in step 2
data = datasets.load_from_disk("/SAVED/DATA/DIR")

HTH.
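Step 2 leaves the transfer method open. If the saved directory has to travel as a single file (a USB drive, an approved upload), one option is to pack it into a tar archive with a checksum on the online machine and verify the checksum before unpacking on the offline one. A minimal standard-library sketch, with hypothetical function names; it does not depend on the internal layout that save_to_disk writes:

```python
import hashlib
import tarfile
from pathlib import Path


def file_sha256(path: str) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so it is never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_dataset_dir(src_dir: str, archive_path: str) -> str:
    """Hypothetical helper: pack a saved dataset dir into .tar.gz, return its SHA-256."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory's own name, not its absolute path
        tar.add(str(src), arcname=src.name)
    return file_sha256(archive_path)


def unpack_dataset_dir(archive_path: str, dest_dir: str, expected_sha256: str) -> Path:
    """Hypothetical helper: verify the checksum, then extract under dest_dir."""
    actual = file_sha256(archive_path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: {actual} != {expected_sha256}")
    with tarfile.open(archive_path, "r:gz") as tar:
        top_level = tar.getnames()[0]  # directory entry that tar.add wrote first
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: reject absolute paths, escaping links, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return Path(dest_dir) / top_level
```

On the online machine, digest = pack_dataset_dir("/YOUR/DATASET/DIR", "dataset.tar.gz"); on the offline machine, unpack_dataset_dir("dataset.tar.gz", some_destination, digest) returns the restored directory, whose path can be passed to datasets.load_from_disk.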

2 reactions
geniki commented, Dec 27, 2020

Unfortunately, requiring an online connection is a deal breaker in some cases, so it’d be great if an offline mode were added, similar to how transformers can load models offline.

@mandubian’s second bullet point suggests that there’s a workaround that lets you use your offline (custom?) dataset with datasets. Could you please elaborate on what that would look like?
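The offline mode being asked for comes down to one decision: use a local copy whenever it exists, and never go to the network when told not to. A hypothetical sketch of that logic; the environment variable MY_DATASETS_OFFLINE, the flat cache layout and the download callback are assumptions for illustration, not datasets or transformers APIs:

```python
import os
from pathlib import Path
from typing import Callable


def is_offline(env_var: str = "MY_DATASETS_OFFLINE") -> bool:
    """Treat 1/true/yes (any case) as offline; the variable name is hypothetical."""
    return os.environ.get(env_var, "").strip().lower() in {"1", "true", "yes"}


def get_file(name: str, cache_dir: str, download: Callable[[str, Path], None]) -> Path:
    """Return a cached file, downloading it only when online and not yet cached."""
    target = Path(cache_dir) / name
    if target.is_file():
        return target  # the local copy wins, online or not
    if is_offline():
        raise FileNotFoundError(f"{target} is not cached and offline mode is enabled")
    target.parent.mkdir(parents=True, exist_ok=True)
    download(name, target)  # caller-supplied fetch, e.g. an HTTP request
    return target
```

With this shape, the same calling code works on both machines: the online one fills the cache on first use, and the offline one either finds the copied files or fails immediately with a clear error instead of attempting a connection.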


Top Results From Across the Web

Can't use datasets offline, even if I have uploaded the datasets ...
I want to use sst dataset on my school server, my dataset loding code is: raw_dataset = datasets.load_dataset('glue', 'sst2') I have uploaded my...
