Memory error for sliding_window_inference

Hi MONAI team, thanks for sharing your software so far! I tried to adapt the 3D spleen segmentation example to my own dataset. After some adjustments I got the training to run, and as a next step I tried to run inference on my test data. The test volume is a 1504x1504x561 CT volume in NIfTI format. I used sliding_window_inference on the CPU, since the GPU immediately runs out of memory.

# imports needed by the snippets below
import os

import torch
from monai.data import CacheDataset, DataLoader
from monai.inferers import sliding_window_inference
from monai.losses import DiceLoss
from monai.networks.layers import Norm
from monai.networks.nets import UNet
from monai.transforms import (
    AddChanneld,
    CenterSpatialCropd,
    Compose,
    LoadNiftid,
    Orientationd,
    ToTensord,
)

test_transforms = Compose(
    [
        LoadNiftid(keys=["image", "label"]),
        AddChanneld(keys=["image", "label"]),
        Orientationd(keys=["image", "label"], axcodes="RAS"),
        CenterSpatialCropd(keys=["image", "label"], roi_size=[1000, 1000, 561]),
        ToTensord(keys=["image", "label"]),
    ]
)

test_ds = CacheDataset(data=test_files, transform=test_transforms, cache_rate=1, num_workers=1)
test_loader = DataLoader(test_ds, batch_size=1, num_workers=1)

test_device = torch.device("cpu")
model = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(test_device)
loss_function = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), 1e-4)

model.load_state_dict(torch.load(os.path.join(root_dir, "best_metric_model.pth")))
model.eval()

for test_data in test_loader:
    val_inputs, val_labels = (
        test_data["image"].to(test_device),
        test_data["label"].to(test_device),
    )
    print(f"batch_data image: {test_data['image'].shape}")
    print(f"batch_data label: {test_data['label'].shape}")
    roi_size = (64, 64, 64)
    sw_batch_size = 1
    test_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model)

Unfortunately, I get either an out-of-memory error, a data loader error, or the kernel dies. In each case the RAM of my machine (360 GB) runs out during the sliding window inference. I therefore believe all three errors are caused by data piling up in RAM.

What am I doing wrong? Did I misinterpret something? In my understanding, the ROI in the sliding window inferer crops the sample into sub-volumes of roi_size, which are then passed through the network. The memory footprint should therefore be controllable via roi_size and inference should even run on the GPU. Even if the entire sample has to be held in memory, it should not require more than a couple of GBs (size of the NIfTI = 1.7 GB). test_data consists of only this single sample. Thank you very much for your help!
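A rough back-of-the-envelope check (my own estimate from the shapes quoted above, not part of the original post) supports that intuition: the stitched prediction plus a per-voxel count map for the 1000x1000x561 crop comes to only a few GB in float32, so the full-volume buffers alone cannot explain exhausting 360 GB of RAM.

# rough memory estimate for the full-volume buffers (shapes taken from the post above)
crop_shape = (1000, 1000, 561)     # spatial size after CenterSpatialCropd
out_channels = 2                   # UNet out_channels
bytes_per_voxel = 4                # float32

voxels = crop_shape[0] * crop_shape[1] * crop_shape[2]
logits_gb = out_channels * voxels * bytes_per_voxel / 1e9    # stitched prediction buffer
count_map_gb = voxels * bytes_per_voxel / 1e9                # per-voxel count/importance map
print(f"prediction ~{logits_gb:.1f} GB, count map ~{count_map_gb:.1f} GB")  # ~4.5 GB and ~2.2 GB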

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

wyli commented on Nov 12, 2020 (2 reactions)

I just tried to run your example script; the main issue is that it needs a no_grad() to avoid gradient accumulation:

with torch.no_grad():
    test_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model)

With that change there is no memory error, but the inference is slow because the model is on the CPU. I'll submit a PR to make the device specification more flexible. Thanks!
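Put together with the loop from the issue, the fix is simply to run the whole inference pass under no_grad. A minimal sketch, assuming the variables defined in the snippet above:

model.eval()
with torch.no_grad():   # no autograd graph is built, so window outputs don't pile up in RAM
    for test_data in test_loader:
        val_inputs = test_data["image"].to(test_device)
        test_outputs = sliding_window_inference(val_inputs, (64, 64, 64), 1, model)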

edit: with the latest codebase it’s possible to have

with torch.no_grad():
    test_outputs = sliding_window_inference(val_inputs, roi_size, sw_batch_size, model, sw_device="cuda", device="cpu")

which uses cuda to run network(window_data) and CPU memory to store the final predicted volume.
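In the context of the same loop, that hybrid setup might look like the sketch below; the argmax post-processing at the end is my own illustration, not something from the thread:

model = model.to("cuda")   # network weights live on the GPU
model.eval()
with torch.no_grad():
    for test_data in test_loader:
        val_inputs = test_data["image"]      # input volume stays on the CPU
        test_outputs = sliding_window_inference(
            val_inputs, (64, 64, 64), 1, model,
            sw_device="cuda",   # each 64^3 window is moved to the GPU for the forward pass
            device="cpu",       # the stitched full-volume prediction is accumulated in CPU RAM
        )
        pred = torch.argmax(test_outputs, dim=1)   # post-process the CPU output tensor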

joho84 commented on Nov 12, 2020 (0 reactions)

Perfect! That’s exactly what I was looking for! Thank you very much!
