Cannot load timit_asr data set

Describe the bug

I am trying to load the timit_asr data set. I have tried with a copy from the LDC, and a copy from deepai. In both cases they fail with a “duplicate key” error. With the LDC version I have to convert the file extensions all to upper-case before I can load it at all.
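The extension conversion mentioned above can be scripted. The following is a minimal sketch, assuming the only change needed is upper-casing each file's extension (for example `.wav` to `.WAV`); the helper name `uppercase_extensions` is hypothetical and not part of `datasets`. It renames files in place, so it is safer to run it on a copy of the corpus.

```python
from pathlib import Path


def uppercase_extensions(root: str) -> int:
    """Rename every file under `root` so that its extension is upper-case.

    Returns the number of files that were renamed.
    """
    renamed = 0
    # Materialise the listing first so that renames do not disturb the walk.
    for path in list(Path(root).rglob("*")):
        if not path.is_file() or not path.suffix:
            continue
        target = path.with_suffix(path.suffix.upper())
        if target.name != path.name:
            path.rename(target)
            renamed += 1
    return renamed
```

Calling `uppercase_extensions("/path/to/dataset")` walks the whole tree and leaves file stems and directory names untouched.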

Steps to reproduce the bug

import datasets

# Sample code to reproduce the bug
timit = datasets.load_dataset("timit_asr", data_dir="/path/to/dataset")

Expected results

The data set should load without error. It worked for me before the LDC url change.

Actual results

datasets.keyhash.DuplicatedKeysError: FAILURE TO GENERATE DATASET !
Found duplicate Key: SA1
Keys should be unique and deterministic in nature
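The error names `SA1` as the duplicated key. A key built from the file name alone cannot be unique if the same file name appears in more than one speaker directory. The sketch below is a diagnostic under that assumption; it is not the `timit_asr` loader's code, and the helper names `duplicate_stems` and `path_based_keys`, as well as the default `.WAV` suffix, are chosen only for illustration.

```python
from collections import Counter
from pathlib import Path


def duplicate_stems(root: str, suffix: str = ".WAV") -> dict:
    """Map each file stem that occurs more than once under `root` to its count."""
    counts = Counter(p.stem for p in Path(root).rglob(f"*{suffix}"))
    return {stem: n for stem, n in counts.items() if n > 1}


def path_based_keys(root: str, suffix: str = ".WAV") -> list:
    """Build one key per file from its path relative to `root`, minus the extension."""
    base = Path(root)
    return sorted(
        "-".join(p.relative_to(base).with_suffix("").parts)
        for p in base.rglob(f"*{suffix}")
    )
```

If `duplicate_stems` returns a non-empty mapping for a local copy, stem-only keys would collide, whereas the keys from `path_based_keys` stay distinct because they include the directory components.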

Environment info

  • datasets version: 2.2.2
  • Platform: Linux-5.4.0-90-generic-x86_64-with-glibc2.17
  • Python version: 3.8.12
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.2

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
bhaddow commented, Jun 1, 2022

Ah, if I change the train/ and test/ directories to TRAIN/ and TEST/ then it works!

0 reactions
albertvillanova commented, Jun 2, 2022

Thanks for your investigation and report, @bhaddow. I’m adding another fix for the TRAIN/train and TEST/test directory names.
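The workaround reported by bhaddow above, renaming `train/` and `test/` to `TRAIN/` and `TEST/`, can be scripted. This is a minimal sketch, assuming the split directories sit directly under the directory passed as `data_dir`; the helper name `uppercase_split_dirs` is hypothetical and not part of `datasets`.

```python
from pathlib import Path


def uppercase_split_dirs(data_dir: str, names=("train", "test")) -> list:
    """Rename lower-case split directories directly under `data_dir` to upper-case.

    Returns the new names of the directories that were renamed.
    """
    renamed = []
    base = Path(data_dir)
    for name in names:
        # Compare real directory entries so that the check also behaves
        # correctly on case-insensitive file systems.
        entries = {p.name for p in base.iterdir() if p.is_dir()}
        if name in entries and name.upper() not in entries:
            (base / name).rename(base / name.upper())
            renamed.append(name.upper())
    return renamed
```

After running `uppercase_split_dirs("/path/to/dataset")`, the `load_dataset` call from the report can be retried with the same `data_dir`.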
