MONAI dataloader does not exhibit random behaviour when using np.random
Describe the bug
There is a known issue with PyTorch's DataLoader when num_workers > 0 and np.random is used to generate random numbers, described here. The issue does not occur if torch.randint is used to generate the random numbers instead.
I had thought MONAI's DataLoader fixed this by setting a default worker_init_fn, but it seems that isn't the case: the correct behaviour is observed when using torch.randint but not with np.random.
I understand the simple solution is to always use torch's RNG, but I suspect many users won't know about this and will be caught out by it.
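For reference, the usual PyTorch-level workaround (this is my own sketch of the commonly suggested fix, not something MONAI sets up by default) is to re-seed NumPy's global RNG in a worker_init_fn using the per-worker torch seed:

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class NumpyRandomDataset(Dataset):
    def __getitem__(self, index):
        return np.random.randint(0, 1000, (1,))

    def __len__(self):
        return 16

def numpy_worker_init_fn(worker_id):
    # torch.initial_seed() is already different in every worker and every
    # epoch; reuse it so np.random draws also differ per worker and epoch
    np.random.seed(torch.initial_seed() % (2 ** 32))

loader = DataLoader(NumpyRandomDataset(), batch_size=2, num_workers=4,
                    worker_init_fn=numpy_worker_init_fn)

With something like this in place, the np.random variant of the reproduction below should also produce different values for every batch and epoch.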
To Reproduce
Steps to reproduce the behavior:
Run this code, switching between the torch and np lines in __getitem__:
import numpy as np
import torch
from monai.data import Dataset, DataLoader
# from torch.utils.data import DataLoader

class RandomDataset(Dataset):
    def __getitem__(self, index):
        # return np.random.randint(0, 1000, (1,))
        return torch.randint(0, 1000, (1,))

    def __len__(self):
        return 16

dataset = RandomDataset([])
dataloader = DataLoader(dataset, batch_size=2, num_workers=4)
for epoch in range(2):
    for i, batch in enumerate(dataloader):
        print(epoch, i, batch.data.numpy().flatten().tolist())
Expected behavior
Data should be random between batches and between epochs. This is seen when using torch:
0 0 [613, 264]
0 1 [642, 526]
0 2 [265, 338]
0 3 [988, 574]
0 4 [138, 602]
0 5 [577, 24]
0 6 [172, 986]
0 7 [902, 680]
1 0 [610, 146]
1 1 [898, 486]
1 2 [178, 408]
1 3 [679, 366]
1 4 [302, 361]
1 5 [698, 83]
1 6 [65, 102]
1 7 [615, 643]
but not when using numpy (each forked worker inherits the same global NumPy RNG state, so all four workers produce identical sequences, and the pattern repeats in the next epoch when the workers are re-created):
0 0 [819, 130]
0 1 [819, 130]
0 2 [819, 130]
0 3 [819, 130]
0 4 [24, 498]
0 5 [24, 498]
0 6 [24, 498]
0 7 [24, 498]
1 0 [819, 130]
1 1 [819, 130]
1 2 [819, 130]
1 3 [819, 130]
1 4 [24, 498]
1 5 [24, 498]
1 6 [24, 498]
1 7 [24, 498]
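For completeness, a variant that keeps NumPy for the draws but avoids the shared global state (again just a sketch of a possible workaround, not an existing MONAI feature) is to build one np.random.Generator per worker, seeded from the per-worker torch seed:

import numpy as np
from torch.utils.data import get_worker_info
from monai.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __getitem__(self, index):
        # lazily create one Generator per worker process, seeded from the
        # per-worker/per-epoch seed, instead of relying on global np.random
        if not hasattr(self, "_rng"):
            info = get_worker_info()
            self._rng = np.random.default_rng(info.seed if info is not None else None)
        return self._rng.integers(0, 1000, (1,))

    def __len__(self):
        return 16

dataloader = DataLoader(RandomDataset([]), batch_size=2, num_workers=4)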
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.5.1+1.g36d855c6.dirty
Numpy version: 1.19.5
Pytorch version: 1.8.1+cu102
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 36d855c690f03a44554f7017417cf7fdb12b8477
Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.2.0
Tensorboard version: 2.5.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.9.1+cu102
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.60.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
`psutil` required for `print_system_info`
================================
Printing GPU config...
================================
Num GPUs: 3
Has CUDA: True
CUDA version: 10.2
cuDNN enabled: True
cuDNN version: 7605
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70']
GPU 0 Name: TITAN RTX
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 72
GPU 0 Total memory (GB): 23.7
GPU 0 CUDA capability (maj.min): 7.5
GPU 1 Name: Quadro RTX 8000
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 72
GPU 1 Total memory (GB): 47.5
GPU 1 CUDA capability (maj.min): 7.5
GPU 2 Name: GeForce GT 1030
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 3
GPU 2 Total memory (GB): 2.0
GPU 2 CUDA capability (maj.min): 6.1
Additional context
Already discussed with @ericspod, who I believe has also discussed with @rijobro; I thought it would be useful to have the discussion here.
Top GitHub Comments
It definitely sounds like, if we don't change the current implementation, we at least need to improve the documentation to make this clearer to future users.
Hi,
As a new MONAI user I was trying out one of your tutorials (https://github.com/Project-MONAI/tutorials/blob/master/3d_segmentation/spleen_segmentation_3d_lightning.ipynb) and stumbled upon this issue when looking for a solution to having the same “random” transformation after each epoch. I spent quite some time trying to figure out why the transformations were seemingly always the same because of these lines:
To save future MONAI starters some time and avoid any silent errors, could the tutorial be updated to use the MONAI data loaders as well?
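For illustration, the swap being suggested is essentially just importing the dataset/loader from monai.data instead of torch.utils.data, since MONAI's DataLoader installs a default worker_init_fn that (as far as I understand) re-seeds MONAI's Randomizable transforms in each worker; the in-memory data and transform below are placeholders, not taken from the tutorial:

import numpy as np
# from torch.utils.data import DataLoader   # what the tutorial currently uses
from monai.data import Dataset, DataLoader  # drop-in replacement
from monai.transforms import Compose, RandRotate90d

# placeholder data and transform, just to show the shape of the change
train_files = [{"image": np.zeros((1, 8, 8, 8), dtype=np.float32)} for _ in range(4)]
train_transforms = Compose([RandRotate90d(keys="image", prob=0.5)])

train_ds = Dataset(data=train_files, transform=train_transforms)
train_loader = DataLoader(train_ds, batch_size=2, num_workers=4)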
Anyhow, thanks for all the great progress on this framework! 🙌