"num_samples should be a positive integer value" error if `eval_split_size` is >= size of dataset
We’re training a WaveGrad vocoder on a fairly small dataset right now (~250 samples), and recently ran into the following error:
Traceback (most recent call last):
  File "./TTS/bin/train_vocoder_wavegrad.py", line 442, in <module>
    main(args)
  File "./TTS/bin/train_vocoder_wavegrad.py", line 412, in main
    _, global_step = train(model, criterion, optimizer, scheduler, scaler,
  File "./TTS/bin/train_vocoder_wavegrad.py", line 82, in train
    data_loader = setup_loader(ap, is_val=False, verbose=(epoch == 0))
  File "./TTS/bin/train_vocoder_wavegrad.py", line 46, in setup_loader
    loader = DataLoader(dataset,
  File "coqui-tts\lib\site-packages\torch\utils\data\dataloader.py", line 266, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore
  File "coqui-tts\lib\site-packages\torch\utils\data\sampler.py", line 103, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
This appears to be related to the undocumented `eval_split_size` setting in `config.json`. The default config for WaveGrad specifies this as `256`. After debugging for a bit, it appears that this setting controls how many files are used for the evaluation set. So, if there are 500 WAV files and `eval_split_size` is set to `256`, then the first `256` audio files encountered are used for the evaluation set and the remaining `244` are used for training. With only ~250 samples, every file ends up in the evaluation set and none are left for training, which is why the sampler reports `num_samples=0`.
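To make the arithmetic concrete, here is a minimal sketch of the split as I understand it from debugging. The helper `split_eval_train` and the file names are hypothetical and are not taken from the TTS code base; the sketch only assumes that the first `eval_split_size` files go to evaluation and the rest go to training.

```python
def split_eval_train(items, eval_split_size):
    """Hypothetical helper: the first `eval_split_size` items become the
    evaluation set, everything after them becomes the training set."""
    eval_items = items[:eval_split_size]
    train_items = items[eval_split_size:]
    return eval_items, train_items


# 500 made-up file names: the split leaves 244 files for training.
wav_files = [f"clip_{i:04d}.wav" for i in range(500)]
eval_items, train_items = split_eval_train(wav_files, 256)
print(len(eval_items), len(train_items))  # 256 244

# Only 250 files: the evaluation set swallows all of them and the
# training set is empty, which is the situation that trips the sampler.
small_dataset = wav_files[:250]
eval_small, train_small = split_eval_train(small_dataset, 256)
print(len(eval_small), len(train_small))  # 250 0
```

The same thing happens when the dataset has exactly `eval_split_size` files, since the training slice is empty in that case as well.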
Since it can take a fair bit of debugging for an end-user to understand what’s going on, I propose two things:
- There should be a sanity check that raises a more appropriate error if the number of WAV files is less than or equal to `eval_split_size`.
- The `eval_split_size` parameter in the config should be documented so users understand what it does and can tune it appropriately.
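The kind of check I have in mind could look roughly like the sketch below. The function name `check_eval_split_size` and the error wording are my own suggestions, not existing TTS code, and where exactly it would be called in the training script is left open.

```python
def check_eval_split_size(num_files, eval_split_size):
    """Hypothetical validation: fail early with a readable message when the
    evaluation split would leave no files for training."""
    if num_files <= eval_split_size:
        raise ValueError(
            f"eval_split_size ({eval_split_size}) must be smaller than the "
            f"number of audio files ({num_files}); otherwise no files are "
            "left for training. Lower eval_split_size in config.json or "
            "add more data."
        )
    # Number of files that remain for the training set.
    return num_files - eval_split_size


print(check_eval_split_size(500, 256))  # 244

try:
    check_eval_split_size(250, 256)
except ValueError as err:
    print(f"Config error: {err}")
```

Failing at this point would name the config key directly instead of surfacing as a `RandomSampler` error several calls deeper.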
Issue Analytics
- Created 2 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
I can work on this when I circle back to TTS stuff.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.