Cannot load timit_asr data set
Describe the bug
I am trying to load the timit_asr data set. I have tried with a copy from the LDC, and a copy from deepai. In both cases they fail with a “duplicate key” error. With the LDC version I have to convert the file extensions all to upper-case before I can load it at all.
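The extension conversion mentioned above could be scripted along these lines. This is only a sketch: the helper name `uppercase_extensions` is hypothetical, and it assumes that every file under the data directory should get an upper-case extension, so try it on a copy of the data first.

```python
from pathlib import Path


def uppercase_extensions(root):
    """Rename files under root so their extensions are upper-case.

    Returns the number of files that were renamed.
    """
    renamed = 0
    # Materialise the listing first so renames do not disturb the traversal.
    for path in sorted(Path(root).rglob("*")):
        suffix = path.suffix
        if path.is_file() and suffix and suffix != suffix.upper():
            path.rename(path.with_suffix(suffix.upper()))
            renamed += 1
    return renamed
```

Calling `uppercase_extensions("/path/to/dataset")` renames the files in place; only the extension changes, the rest of each file name is left as it is.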
Steps to reproduce the bug
# Sample code to reproduce the bug
import datasets
timit = datasets.load_dataset("timit_asr", data_dir="/path/to/dataset")
Expected results
The data set should load without error. It worked for me before the LDC url change.
Actual results
datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: SA1
Keys should be unique and deterministic in nature
Environment info
- datasets version: 2.2.2
- Platform: Linux-5.4.0-90-generic-x86_64-with-glibc2.17
- Python version: 3.8.12
- PyArrow version: 8.0.0
- Pandas version: 1.4.2
Issue Analytics
- Created a year ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Ah, if I change the train/ and test/ directories to TRAIN/ and TEST/ then it works!
Thanks for your investigation and report, @bhaddow. I’m adding another fix for the TRAIN/train and TEST/test directory names.
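For anyone applying the directory rename from the first comment by hand, a minimal sketch follows. The helper name `uppercase_split_dirs` is hypothetical, and it assumes the `train/` and `test/` directories sit directly under the path passed as `data_dir`.

```python
from pathlib import Path


def uppercase_split_dirs(data_dir, names=("train", "test")):
    """Rename lower-case split directories under data_dir to upper-case.

    Returns the list of new directory names that were created.
    """
    root = Path(data_dir)
    renamed = []
    for name in names:
        # Compare against the actual entry names so that case-insensitive
        # file systems do not report a false match.
        existing = {p.name for p in root.iterdir()}
        if name in existing and name.upper() not in existing:
            (root / name).rename(root / name.upper())
            renamed.append(name.upper())
    return renamed
```

After running `uppercase_split_dirs("/path/to/dataset")`, the same `load_dataset` call from the report can be retried against the renamed directories.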