load_from_disk and save_to_disk are not compatible with each other


Describe the bug

load_from_disk and save_to_disk are not compatible. When I use save_to_disk to save a dataset to disk, it works perfectly, but given the same directory, load_from_disk throws an error that it can't find state.json. It looks like load_from_disk only works on one split.

Steps to reproduce the bug

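from datasets import Dataset, load_dataset

# load_dataset without a split argument returns a DatasetDict (one Dataset per split)
dataset = load_dataset("art")
dataset.save_to_disk("mydir")
# Dataset.load_from_disk expects a single-split directory, so this raises FileNotFoundError
d = Dataset.load_from_disk("mydir")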

Expected results

These two functions are expected to be inverses of each other, with no extra manipulation needed.

Actual results

FileNotFoundError: [Errno 2] No such file or directory: 'mydir/art/state.json'

Environment info

  • datasets version: 1.6.2
  • Platform: Linux-5.4.0-73-generic-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • PyTorch version (GPU?): 1.8.1+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
thomwolf commented, May 31, 2021

Though I see a stream of issues opened by people lost between datasets and dataset dicts, so maybe there is something here that could be better in terms of UX. It could be better error handling, or something smarter that avoids these errors altogether, but maybe we should think about this. Reopening to use this issue as a discussion place, but feel free to open a new one if you prefer @lhoestq @albertvillanova

1 reaction
lhoestq commented, Jun 1, 2021

We should probably improve the error message indeed.

Also note that there exists a function load_from_disk that can load a Dataset or a DatasetDict. Under the hood it calls either Dataset.load_from_disk or DatasetDict.load_from_disk:

from datasets import load_from_disk

dataset_dict = load_from_disk("path/to/dataset/dict")
single_dataset = load_from_disk("path/to/single/dataset")
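To make the Dataset versus DatasetDict distinction concrete, here is a minimal stdlib-only sketch that inspects a directory written by save_to_disk and reports which kind of object it appears to hold. It rests on an assumption that the thread does not state: that a saved DatasetDict directory contains a dataset_dict.json file with one subdirectory per split, and that each saved Dataset directory contains state.json (the file named in the error above). The helper names saved_kind and split_dirs are hypothetical and not part of the datasets library; check the layout against your installed version before relying on it.

```python
from pathlib import Path

# Assumed marker file names; they are not confirmed by the thread itself.
DATASET_DICT_MARKER = "dataset_dict.json"  # assumed: written at the root of a saved DatasetDict
DATASET_MARKER = "state.json"              # assumed: written inside each saved Dataset directory


def saved_kind(path):
    """Return "DatasetDict", "Dataset", or "unknown" for a directory on disk."""
    root = Path(path)
    if (root / DATASET_DICT_MARKER).is_file():
        return "DatasetDict"
    if (root / DATASET_MARKER).is_file():
        return "Dataset"
    return "unknown"


def split_dirs(path):
    """List the subdirectories of `path` that look like saved single Datasets."""
    root = Path(path)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and saved_kind(child) == "Dataset"
    )


if __name__ == "__main__":
    target = "mydir"  # the directory used in the issue's reproduction steps
    kind = saved_kind(target)
    print(f"{target}: looks like {kind}")
    if kind == "DatasetDict":
        # Each listed name would be a candidate path for Dataset.load_from_disk.
        print("splits found:", split_dirs(target))
```

Under the same assumption, a result of "DatasetDict" means the directory should be read with DatasetDict.load_from_disk or the top-level load_from_disk, while Dataset.load_from_disk would need to point at one of the split subdirectories instead.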