Memory error for sliding_window_inference
Hi MONAI team, thanks for sharing your software development so far! I tried to adapt the 3D spleen segmentation example to my own dataset. After some adjustments I got the training to run, and as a next step I tried to run inference on my test data. The test volume is a 1504x1504x561 CT volume in NIfTI format. To do so I used sliding_window_inference on the CPU, as the GPU immediately runs out of memory.
import os

import torch
from monai.data import CacheDataset, DataLoader
from monai.inferers import sliding_window_inference
from monai.losses import DiceLoss
from monai.networks.layers import Norm
from monai.networks.nets import UNet
from monai.transforms import (
    AddChanneld,
    CenterSpatialCropd,
    Compose,
    LoadNiftid,
    Orientationd,
    ToTensord,
)

test_transforms = Compose(
    [
        LoadNiftid(keys=["image", "label"]),
        AddChanneld(keys=["image", "label"]),
        Orientationd(keys=["image", "label"], axcodes="RAS"),
        CenterSpatialCropd(keys=["image", "label"], roi_size=[1000, 1000, 561]),
        ToTensord(keys=["image", "label"]),
    ]
)

test_ds = CacheDataset(data=test_files, transform=test_transforms, cache_rate=1, num_workers=1)
test_loader = DataLoader(test_ds, batch_size=1, num_workers=1)
test_device = torch.device("cpu")

model = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(test_device)
loss_function = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), 1e-4)

model.load_state_dict(torch.load(os.path.join(root_dir, "best_metric_model.pth")))
model.eval()

for test_data in test_loader:
    val_inputs, val_labels = (
        test_data["image"].to(test_device),
        test_data["label"].to(test_device),
    )
    print(f"batch_data image: {test_data['image'].shape}")
    print(f"batch_data label: {test_data['label'].shape}")
    roi_size = (64, 64, 64)
    sw_batch_size = 1
    test_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model)
Unfortunately, I get either an out-of-memory error, a data loader error, or the kernel dies. In each case the RAM of my machine (360 GB) runs out during the sliding window inference. Therefore, I believe all three errors are caused by data piling up in RAM.
What am I doing wrong? Did I misinterpret something? In my understanding, the ROI in the sliding window inferer should crop the sample into sub-volumes of roi_size, which are then used for inference by the network. Therefore, the memory footprint should be controllable via roi_size and should even fit on the GPU. But even if the entire sample is kept in memory, it should not require more than a couple of GB (size of the NIfTI file = 1.7 GB). test_data consists only of this single sample. Thank you very much for your help!
Top GitHub Comments
Just tried to run your example script. The main issue is that it needs a no_grad() context to avoid gradient accumulation; with that there's no memory error, but the inference is slow because the model is on the CPU. I'll submit a PR to make the device specification more flexible. Thanks!
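For reference, a minimal sketch of that change against the script in the question (test_loader, test_device and model are the names from the snippet above): the inference loop is wrapped in torch.no_grad() so no gradient graph is built or kept alive across windows.

model.eval()
with torch.no_grad():  # no autograd graph, so each window's activations are freed after the forward pass
    for test_data in test_loader:
        val_inputs = test_data["image"].to(test_device)
        test_outputs = sliding_window_inference(
            val_inputs, roi_size=(64, 64, 64), sw_batch_size=1, predictor=model
        )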
edit: with the latest codebase it's possible to set up the call so that it uses cuda to run network(window_data) and uses cpu memory to store the final predicted volume.
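A sketch of that hybrid setup, assuming a MONAI version whose sliding_window_inference exposes the sw_device and device arguments: each window is moved to the GPU for the forward pass, while the stitched prediction is accumulated in CPU memory.

model = model.to("cuda")
with torch.no_grad():
    test_outputs = sliding_window_inference(
        val_inputs,              # full volume can stay on the CPU
        roi_size=(64, 64, 64),
        sw_batch_size=1,
        predictor=model,
        sw_device="cuda",        # run network(window_data) on the GPU
        device="cpu",            # assemble the final predicted volume in CPU memory
    )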
Perfect! That's exactly what I was looking for! Thank you very much!