"num_samples should be a positive integer value" error if `eval_split_size` is >= size of dataset

We’re training a WaveGrad Vocoder on a fairly small dataset right now (~250 samples), and ran into the following error recently:

Traceback (most recent call last):
  File "./TTS/bin/train_vocoder_wavegrad.py", line 442, in <module>
    main(args)
  File "./TTS/bin/train_vocoder_wavegrad.py", line 412, in main
    _, global_step = train(model, criterion, optimizer, scheduler, scaler,
  File "./TTS/bin/train_vocoder_wavegrad.py", line 82, in train
    data_loader = setup_loader(ap, is_val=False, verbose=(epoch == 0))
  File "./TTS/bin/train_vocoder_wavegrad.py", line 46, in setup_loader
    loader = DataLoader(dataset,
  File "coqui-tts\lib\site-packages\torch\utils\data\dataloader.py", line 266, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore
  File "coqui-tts\lib\site-packages\torch\utils\data\sampler.py", line 103, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

This appears to be related to the undocumented eval_split_size setting in config.json. The default config for WaveGrad sets it to 256. After debugging for a bit, it appears that this setting controls how many files are held out for the evaluation set. So, if there are 500 WAV files and eval_split_size is set to 256, the first 256 audio files encountered are used for the evaluation set and the remaining 244 are used for training. With our ~250 files, nothing is left over for training, so the training DataLoader's RandomSampler receives an empty dataset and raises the error above.
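To make the arithmetic concrete, here is a minimal sketch of the split behaviour as described above. It is not the TTS implementation; `split_eval_train` and the file names are hypothetical and only illustrate "the first eval_split_size files go to evaluation, the rest to training".

```python
def split_eval_train(items, eval_split_size):
    """Hypothetical helper mirroring the behaviour described in this issue:
    the first eval_split_size items become the eval set, the rest the train set."""
    eval_items = items[:eval_split_size]
    train_items = items[eval_split_size:]
    return eval_items, train_items


# 500 files with eval_split_size=256: 256 for eval, 244 for training.
wav_files = [f"sample_{i:03d}.wav" for i in range(500)]
eval_items, train_items = split_eval_train(wav_files, 256)
print(len(eval_items), len(train_items))  # 256 244

# 250 files with the same setting: every file lands in the eval set and
# the training set is empty, which is what the sampler later rejects.
eval_small, train_small = split_eval_train(wav_files[:250], 256)
print(len(eval_small), len(train_small))  # 250 0
```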

Since it can take a fair bit of debugging for an end-user to understand what’s going on, I propose two things:

  1. There should be a sanity check/validation check that raises a more appropriate error if the number of WAV files is smaller than or equal to the eval_split_size.
  2. The eval_split_size parameter in the config should be documented so users understand what it does and can tune it appropriately.
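The first proposal could look roughly like the sketch below. This is only an illustration of the idea, not code from the TTS repository; the function name `check_eval_split_size` and the wording of the message are assumptions.

```python
def check_eval_split_size(num_items, eval_split_size):
    """Hypothetical validation: fail early with a descriptive message when
    the eval split would leave no samples for training.

    Returns the number of samples that remain for training."""
    if num_items <= eval_split_size:
        raise ValueError(
            f"eval_split_size ({eval_split_size}) must be smaller than the "
            f"number of audio files found ({num_items}); otherwise no samples "
            "are left for training. Lower eval_split_size in config.json or "
            "add more data."
        )
    return num_items - eval_split_size


print(check_eval_split_size(500, 256))  # 244

try:
    check_eval_split_size(250, 256)
except ValueError as err:
    print(err)
```

Running a check like this before the DataLoader is built would point directly at the config setting instead of surfacing as a sampler error deep inside PyTorch.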

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
GuyPaddock commented, May 7, 2021

I can work on this when I circle back to TTS stuff.

0 reactions
stale[bot] commented, Jun 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
