
Wrong number of 'samples_per_volume' sampled within one epoch


Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

Hi Fernando,

First of all, thank you very much for adding this functionality (https://github.com/fepegar/torchio/pull/795), and sorry that it took me so long to test it.

I think I found that, during training, the Queue samples the wrong number of patches per volume. See the example below.

Code for reproduction

import random

import numpy as np
import torch
import torchio as tio
from torch.utils.data.dataloader import DataLoader

# Make the run reproducible
seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

# Ten subjects, each requesting 8 patches via its "num_samples" attribute
subjects = []
for sub_id in range(1, 11):
    params = {
        "im": tio.ScalarImage(tensor=np.random.random((1, 320, 320, 10))),
        "num_samples": 8,
        "info": str(sub_id),
    }
    subjects.append(tio.Subject(**params))

patch_size = (320, 320, 1)
batch_size = 10

sd = tio.SubjectsDataset(subjects)

sampler = tio.data.UniformSampler(patch_size=patch_size)
# samples_per_volume=-1 so that each subject's own "num_samples" is used
# (the per-subject sampling added in #795)
queue = tio.Queue(sd, max_length=50, shuffle_patches=True,
        samples_per_volume=-1,
        sampler=sampler, num_workers=8, shuffle_subjects=True)

tr_loader = DataLoader(queue, batch_size=batch_size, shuffle=False,
        pin_memory=False, num_workers=0)

# One epoch of training: print which subject each patch came from
for patch in tr_loader:
    print(patch["info"])

Actual outcome

['6', '2', '7', '9', '7', '2', '4', '4', '5', '4']
['2', '7', '2', '1', '7', '5', '7', '9', '7', '1']
['5', '9', '5', '4', '4', '5', '1', '9', '7', '4']
['5', '9', '2', '1', '9', '7', '2', '1', '9', '4']
['5', '2', '1', '2', '6', '1', '4', '9', '1', '5']
['10', '10', '5', '2', '10', '10', '8', '3', '3', '3']
['8', '10', '3', '5', '2', '8', '8', '8', '8', '8']
['5', '8', '2', '3', '2', '5', '3', '3', '3', '5']

Error messages

The output of the code above lists the ID of the subject each patch was sampled from, for each of the 8 training iterations.

No error messages, but the output disagrees with my understanding of what should happen. Maybe my understanding is wrong; see below.

Expected outcome

The code above reproduces one epoch of training. The queue size is 50, the training set has 10 subjects, and I want to sample 8 patches per subject, so one epoch = 80 patches. Thus, the queue will be loaded twice: the first time with 50 patches (first 5 lines of the output) and the second time with 30 patches (last 3 lines of the output).
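For reference, the two queue loads follow directly from these numbers (a minimal sketch of the arithmetic above):

num_subjects = 10
samples_per_subject = 8
max_length = 50  # queue size

total_patches = num_subjects * samples_per_subject  # 80 patches per epoch
# The queue is refilled in chunks of at most max_length patches:
loads = []
remaining = total_patches
while remaining > 0:
    loads.append(min(max_length, remaining))
    remaining -= loads[-1]
print(loads)  # [50, 30] -> 5 batches of 10, then 3 batches of 10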

I expected each subject to be sampled 8 times. However, according to the output shown above, the subjects are sampled unevenly. For example, subject “6” is sampled only twice, and subject “10” is sampled 5 times (in the second load). Some subjects are sampled the right number of times in the first load of the queue (e.g., “2” and “5”), but they are then resampled in the second load.
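To make the mismatch explicit, here is a quick tally of the IDs printed above (a sketch; the ids string is just the actual output flattened by hand):

from collections import Counter

ids = (
    "6 2 7 9 7 2 4 4 5 4 2 7 2 1 7 5 7 9 7 1 "
    "5 9 5 4 4 5 1 9 7 4 5 9 2 1 9 7 2 1 9 4 "
    "5 2 1 2 6 1 4 9 1 5 10 10 5 2 10 10 8 3 3 3 "
    "8 10 3 5 2 8 8 8 8 8 5 8 2 3 2 5 3 3 3 5"
).split()

print(Counter(ids))
# Every subject should appear exactly 8 times, but e.g. '6' appears
# only twice and '10' appears 5 times.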

Importantly, if we change the queue size to 80 (so that all subjects are sampled in one “load” of the queue), the problem disappears. I haven’t investigated this, but the problem might be in how the number of samples per volume is tracked in the queue.
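As a workaround, the queue can be configured so that a single load covers the whole epoch (a sketch of the change described above; all other arguments as in the reproduction code):

# max_length=80 fits all 10 subjects x 8 patches in one queue load,
# and each subject is then sampled exactly num_samples times.
queue = tio.Queue(sd, max_length=80, shuffle_patches=True,
        samples_per_volume=-1,
        sampler=sampler, num_workers=8, shuffle_subjects=True)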

System info

Platform:   Linux-4.15.0-142-generic-x86_64-with-glibc2.17
TorchIO:    0.18.84
PyTorch:    1.8.1+cu101
SimpleITK:  2.2.0 (ITK 5.3)
NumPy:      1.23.3
Python:     3.8.9 (default, Apr  3 2021, 01:02:10) 
[GCC 5.4.0 20160609]


Top GitHub Comments

fepegar commented on Oct 10, 2022

Fixed in #981. Thanks @jmlipman for your detailed report and code to reproduce.

fepegar commented on Oct 9, 2022

I think I got it. It only took me three hours πŸ˜…
