
Wrong number of 'samples_per_volume' sampled within one epoch


Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

Hi Fernando,

First of all, thank you very much for adding this functionality (https://github.com/fepegar/torchio/pull/795), and sorry that it took me so long to test it.

I think I found that, during training, the Queue samples the wrong number of patches per volume. See the example below.

Code for reproduction

import random

import numpy as np
import torch
import torchio as tio
from torch.utils.data.dataloader import DataLoader

# Make the run reproducible
seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

# Ten subjects, each requesting 8 patches via its "num_samples" attribute
subjects = []
for sub_id in range(1, 11):
    params = {
        "im": tio.ScalarImage(tensor=np.random.random((1, 320, 320, 10))),
        "num_samples": 8,
        "info": str(sub_id),
    }
    subjects.append(tio.Subject(**params))

patch_size = (320, 320, 1)
batch_size = 10

sd = tio.SubjectsDataset(subjects)

sampler = tio.data.UniformSampler(patch_size=patch_size)
# samples_per_volume=-1 so that each subject's own "num_samples" is used
# (the per-subject sampling added in #795)
queue = tio.Queue(sd, max_length=50, shuffle_patches=True,
        samples_per_volume=-1,
        sampler=sampler, num_workers=8, shuffle_subjects=True)

tr_loader = DataLoader(queue, batch_size=batch_size, shuffle=False,
        pin_memory=False, num_workers=0)

# One epoch of training: print which subject each patch came from
for patch in tr_loader:
    print(patch["info"])

Actual outcome

['6', '2', '7', '9', '7', '2', '4', '4', '5', '4']
['2', '7', '2', '1', '7', '5', '7', '9', '7', '1']
['5', '9', '5', '4', '4', '5', '1', '9', '7', '4']
['5', '9', '2', '1', '9', '7', '2', '1', '9', '4']
['5', '2', '1', '2', '6', '1', '4', '9', '1', '5']
['10', '10', '5', '2', '10', '10', '8', '3', '3', '3']
['8', '10', '3', '5', '2', '8', '8', '8', '8', '8']
['5', '8', '2', '3', '2', '5', '3', '3', '3', '5']

Error messages

The output of the code above lists the ID of the subject each patch was sampled from, for each of the 8 training iterations.

No error messages, but the output disagrees with my understanding of what should happen. Maybe my understanding is wrong; see below.

Expected outcome

The code above reproduces one epoch of training. The queue size is 50, the training set has 10 subjects, and I want to sample 8 patches per subject, so one epoch = 80 patches. Thus, the queue will be loaded twice: the first time with 50 patches (first 5 lines of the output) and the second time with 30 patches (last 3 lines of the output).
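For reference, the two queue loads follow directly from these numbers (a minimal sketch of the arithmetic above):

num_subjects = 10
samples_per_subject = 8
max_length = 50  # queue size

total_patches = num_subjects * samples_per_subject  # 80 patches per epoch
# The queue is refilled in chunks of at most max_length patches:
loads = []
remaining = total_patches
while remaining > 0:
    loads.append(min(max_length, remaining))
    remaining -= loads[-1]
print(loads)  # [50, 30] -> 5 batches of 10, then 3 batches of 10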

I expected each subject to be sampled 8 times. However, according to the output shown above, the subjects are sampled unevenly. For example, subject “6” is sampled only twice, and subject “10” is sampled 5 times (in the second load). Some subjects are sampled the right number of times in the first load of the queue (e.g., “2” and “5”), but they are then resampled in the second load.
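To make the mismatch explicit, here is a quick tally of the IDs printed above (a sketch; the ids string is just the actual output flattened by hand):

from collections import Counter

ids = (
    "6 2 7 9 7 2 4 4 5 4 2 7 2 1 7 5 7 9 7 1 "
    "5 9 5 4 4 5 1 9 7 4 5 9 2 1 9 7 2 1 9 4 "
    "5 2 1 2 6 1 4 9 1 5 10 10 5 2 10 10 8 3 3 3 "
    "8 10 3 5 2 8 8 8 8 8 5 8 2 3 2 5 3 3 3 5"
).split()

print(Counter(ids))
# Every subject should appear exactly 8 times, but e.g. '6' appears
# only twice and '10' appears 5 times.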

Importantly, if we change the queue size to 80 (so that all subjects are sampled in one “load” of the queue), the problem disappears. I haven’t investigated this, but the problem might be in how the number of samples per volume is tracked in the queue.
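As a workaround, the queue can be configured so that a single load covers the whole epoch (a sketch of the change described above; all other arguments as in the reproduction code):

# max_length=80 fits all 10 subjects x 8 patches in one queue load,
# and each subject is then sampled exactly num_samples times.
queue = tio.Queue(sd, max_length=80, shuffle_patches=True,
        samples_per_volume=-1,
        sampler=sampler, num_workers=8, shuffle_subjects=True)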

System info

Platform:   Linux-4.15.0-142-generic-x86_64-with-glibc2.17
TorchIO:    0.18.84
PyTorch:    1.8.1+cu101
SimpleITK:  2.2.0 (ITK 5.3)
NumPy:      1.23.3
Python:     3.8.9 (default, Apr  3 2021, 01:02:10) 
[GCC 5.4.0 20160609]


Top GitHub Comments

fepegar commented on Oct 10, 2022

Fixed in #981. Thanks @jmlipman for your detailed report and code to reproduce.

fepegar commented on Oct 9, 2022

I think I got it. It only took me three hours πŸ˜…
