load_dataset('cnn_dailymail', '3.0.0') gives a 'Not a directory' error
See original GitHub issue

from datasets import load_dataset
dataset = load_dataset('cnn_dailymail', '3.0.0')
Stack trace:
---------------------------------------------------------------------------
NotADirectoryError Traceback (most recent call last)
<ipython-input-6-2e06a8332652> in <module>()
1 from datasets import load_dataset
----> 2 dataset = load_dataset('cnn_dailymail', '3.0.0')
/usr/local/lib/python3.6/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, ignore_verifications, save_infos, script_version, **config_kwargs)
608 download_config=download_config,
609 download_mode=download_mode,
--> 610 ignore_verifications=ignore_verifications,
611 )
612
/usr/local/lib/python3.6/dist-packages/datasets/builder.py in download_and_prepare(self, download_config, download_mode, ignore_verifications, try_from_hf_gcs, dl_manager, **download_and_prepare_kwargs)
513 if not downloaded_from_gcs:
514 self._download_and_prepare(
--> 515 dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
516 )
517 # Sync info
/usr/local/lib/python3.6/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verify_infos, **prepare_split_kwargs)
568 split_dict = SplitDict(dataset_name=self.name)
569 split_generators_kwargs = self._make_split_generators_kwargs(prepare_split_kwargs)
--> 570 split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
571
572 # Checksums verification
/root/.cache/huggingface/modules/datasets_modules/datasets/cnn_dailymail/0128610a44e10f25b4af6689441c72af86205282d26399642f7db38fa7535602/cnn_dailymail.py in _split_generators(self, dl_manager)
252 def _split_generators(self, dl_manager):
253 dl_paths = dl_manager.download_and_extract(_DL_URLS)
--> 254 train_files = _subset_filenames(dl_paths, datasets.Split.TRAIN)
255 # Generate shared vocabulary
256
/root/.cache/huggingface/modules/datasets_modules/datasets/cnn_dailymail/0128610a44e10f25b4af6689441c72af86205282d26399642f7db38fa7535602/cnn_dailymail.py in _subset_filenames(dl_paths, split)
153 else:
154 logging.fatal("Unsupported split: %s", split)
--> 155 cnn = _find_files(dl_paths, "cnn", urls)
156 dm = _find_files(dl_paths, "dm", urls)
157 return cnn + dm
/root/.cache/huggingface/modules/datasets_modules/datasets/cnn_dailymail/0128610a44e10f25b4af6689441c72af86205282d26399642f7db38fa7535602/cnn_dailymail.py in _find_files(dl_paths, publisher, url_dict)
132 else:
133 logging.fatal("Unsupported publisher: %s", publisher)
--> 134 files = sorted(os.listdir(top_dir))
135
136 ret_files = []
NotADirectoryError: [Errno 20] Not a directory: '/root/.cache/huggingface/datasets/downloads/1bc05d24fa6dda2468e83a73cf6dc207226e01e3c48a507ea716dc0421da583b/cnn/stories'
I ran the code on Google Colab.
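A quick way to see what went wrong is to inspect the failing path from the traceback. This is a minimal sketch; the hash in the path is copied from the traceback above and will differ per machine:

import os

# Path copied from the NotADirectoryError above; the hash differs per machine.
p = "/root/.cache/huggingface/datasets/downloads/1bc05d24fa6dda2468e83a73cf6dc207226e01e3c48a507ea716dc0421da583b/cnn/stories"

print(os.path.exists(p))  # True: something was downloaded/extracted here
print(os.path.isdir(p))   # False: not the expected directory, hence the error
if os.path.isfile(p):
    # A tiny file here usually means the download itself was bad or incomplete.
    print(os.path.getsize(p), "bytes")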
Has anyone solved this? I still get this error.
2 short-term workarounds:

1. Use this copy of the dataset instead:
dataset = load_dataset('ccdv/cnn_dailymail', '3.0.0')
In a related issue, this person mentioned another data source copy that just works.
2. Edit the cnn_dailymail.py that is cached on your computer: fix the cnn_stories and dm_stories URLs by adding &confirm=t to the end of them. This should be around line 67. Then delete your previous downloads (in ~/.cache/huggingface/datasets/downloads for me) so that they don't get in the way of the new download attempts. A rough script version of this is sketched below.

Either method works for me. I would've made a PR, but not sure if they want to go with the new ccdv/cnn_dailymail source or not.
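For reference, here is a rough script version of workaround 2. The cache paths and the assumption that each Google Drive URL sits on its own line ending in '",' are mine; check them against your own cached cnn_dailymail.py before running:

import glob
import shutil

# Locate the cached dataset script (path layout taken from the traceback above).
for path in glob.glob(
    "/root/.cache/huggingface/modules/datasets_modules/datasets/cnn_dailymail/*/cnn_dailymail.py"
):
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        for line in lines:
            # Append &confirm=t to the cnn_stories / dm_stories download URLs
            # (assumes each URL is a one-line string literal ending in '",').
            if "drive.google.com" in line and ("cnn_stories" in line or "dm_stories" in line):
                line = line.replace('",', '&confirm=t",')
            f.write(line)

# Remove the old (corrupted) downloads so they don't block the new attempt.
shutil.rmtree("/root/.cache/huggingface/datasets/downloads", ignore_errors=True)

After this, re-running load_dataset('cnn_dailymail', '3.0.0') should trigger a fresh download with the patched URLs.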