load_from_disk and save_to_disk are not compatible with each other
Describe the bug
load_from_disk and save_to_disk are not compatible. When I use save_to_disk to save a dataset to disk, it works perfectly, but given the same directory, load_from_disk throws an error saying it can't find state.json. It looks like load_from_disk only works on one split.
Steps to reproduce the bug
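from datasets import Dataset, load_dataset
# load_dataset without a split argument returns a DatasetDict (one Dataset per split)
dataset = load_dataset("art")
dataset.save_to_disk("mydir")
# Dataset.load_from_disk expects a single saved Dataset, so this raises FileNotFoundError
d = Dataset.load_from_disk("mydir")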
Expected results
It is expected that these two functions be the reverse of each other without more manipulation
Actual results
FileNotFoundError: [Errno 2] No such file or directory: 'mydir/art/state.json'
Environment info
- datasets version: 1.6.2
- Platform: Linux-5.4.0-73-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.8.1+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:6 (5 by maintainers)
Though I see a stream of issues opened by people lost between datasets and dataset dicts, so maybe there is something here that could be better in terms of UX. It could be better error handling, or something smarter that avoids these errors altogether, but maybe we should think about this. Reopening to use this issue as a discussion place, but feel free to open a new one if you prefer @lhoestq @albertvillanova
We should probably improve the error message indeed.
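As a rough illustration of what a friendlier error could look like, here is a minimal standard-library sketch. It is not the library's implementation. The helper names describe_saved_path and check_is_single_dataset are hypothetical, and the sketch assumes that a saved DatasetDict directory contains a dataset_dict.json file while a saved Dataset directory contains the state.json file mentioned in the traceback.

```python
import json
import tempfile
from pathlib import Path


def describe_saved_path(path):
    """Guess what was saved at `path` from its marker files (assumed layout)."""
    root = Path(path)
    if (root / "dataset_dict.json").is_file():  # assumed DatasetDict marker
        return "DatasetDict"
    if (root / "state.json").is_file():  # file named in the reported error
        return "Dataset"
    return "unknown"


def check_is_single_dataset(path):
    """Raise a more descriptive error than a bare missing state.json."""
    kind = describe_saved_path(path)
    if kind == "DatasetDict":
        raise FileNotFoundError(
            f"'{path}' looks like a saved DatasetDict, not a single Dataset. "
            "Load it with DatasetDict.load_from_disk or datasets.load_from_disk."
        )
    if kind == "unknown":
        raise FileNotFoundError(f"No saved dataset found at '{path}'.")


# Simulate a directory written by DatasetDict.save_to_disk (marker file only).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dataset_dict.json").write_text(json.dumps({"splits": ["train"]}))
    kind = describe_saved_path(tmp)
    try:
        check_is_single_dataset(tmp)
        message = ""
    except FileNotFoundError as err:
        message = str(err)
```

With a check like this, the reporter would have been pointed at the right loader instead of a missing file.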
Also note that there exists a function load_from_disk that can load a Dataset or a DatasetDict. Under the hood it calls either Dataset.load_from_disk or DatasetDict.load_from_disk.