Model.pt and train_stats.json files not produced when running segmentation_left_atrium sample-app
Describe the bug
Model.pt and train_stats.json files are not produced when running the segmentation_left_atrium sample app.
When running segmentation_left_atrium on the server, I have tried clicking 'Update Model' in 3D Slicer. The training status bar shows that training is taking place, and once it reaches 100% I click 'Submit Label' and proceed to the next sample. However, when I click 'Update Model', 'Submit Label', or 'Run' (under Auto Segmentation), no new model is created in the model directory of the app. The directory contains only the pretrained file and a folder 'model01' that holds event files.
However, if I manually delete pretrained.pt while the server is running, before clicking 'Update Model' and 'Submit Label', then model.pt and train_stats.json files are generated when I open the next sample.
To Reproduce
Steps to reproduce the behavior:
- Go to the command prompt
- Install the sample app segmentation_left_atrium and the dataset Task02_Heart (see the command sketch after this list)
- Start the server (in the Windows command prompt, using the monailabel CLI): monailabel start_server --app monailabel/sample-apps/segmentation_left_atrium --studies datasets/Task02_Heart/imagesTr
- Open 3D Slicer and go to the MONAI Label module
- Connect to the MONAI Label server and click 'Next Sample'
- Select the current model and click 'Update Model'
- Click 'Submit Label' when training is complete
- Click 'Next Sample'
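For reference, here is a minimal sketch of the full command sequence, assuming the monailabel CLI's app/dataset download options and the directory layout used in the start_server command above (names and paths are assumptions; adjust to your setup):

```
REM Download the sample app and the Task02_Heart dataset
REM (download flags assumed; verify with "monailabel apps --help" and "monailabel datasets --help")
monailabel apps --download --name segmentation_left_atrium --output monailabel/sample-apps
monailabel datasets --download --name Task02_Heart --output datasets

REM Start the server against the downloaded studies
monailabel start_server --app monailabel/sample-apps/segmentation_left_atrium --studies datasets/Task02_Heart/imagesTr
```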
Expected behavior
model.pt and train_stats.json are expected to be produced in the model directory every time I update the model in Slicer while the MONAI Label server is running. In addition, the training accuracy is not updating.
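As a quick check after each training run, the app's model directory can be listed from the command prompt; the path below assumes the app location used in the start_server command above:

```
REM model.pt and train_stats.json should appear here once training has finished
dir monailabel\sample-apps\segmentation_left_atrium\model
```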
Screenshots
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================ Printing MONAI config…
MONAI version: 0.8.1
Numpy version: 1.22.2
Pytorch version: 1.9.0+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1
Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.1
scikit-image version: 0.18.3
Pillow version: 8.4.0
Tensorboard version: 2.7.0
gdown version: 4.0.2
TorchVision version: 0.10.0+cu111
tqdm version: 4.62.3
lmdb version: 1.2.1
psutil version: 5.8.0
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================ Printing system config…
System: Windows
Win32 version: ('10', '10.0.22000', 'SP0', 'Multiprocessor Free')
Win32 edition: Core
Platform: Windows-10-10.0.22000-SP0
Processor: AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
Machine: AMD64
Python version: 3.9.5
Process name: python.exe
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='C:\Windows\System32\en-US\KernelBase.dll.mui', fd=-1), popenfile(path='C:\Windows\System32\en-US\kernel32.dll.mui', fd=-1)]
Num physical CPUs: 4
Num logical CPUs: 8
Num usable CPUs: 8
CPU usage (%): [54.6, 40.0, 58.3, 40.9, 50.5, 33.7, 48.6, 47.1]
CPU freq. (MHz): 1700
Load avg. in last 1, 5, 15 mins (%): [0.0, 0.0, 0.0]
Disk usage (%): 72.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 7.8
Available memory (GB): 3.3
Used memory (GB): 4.5
================================ Printing GPU config…
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA GeForce GTX 1650
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 16
GPU 0 Total memory (GB): 4.0
GPU 0 CUDA capability (maj.min): 7.5
Top GitHub Comments
Thank you for spotting the error @diazandr3s, and for all of the suggestions. I have implemented all of the above, but it seems that my GPU memory allocation is at fault and is causing the issue, so I will try to troubleshoot that. Once again, thanks for the help!
Attachment: app.log
Hi, the sample app that I used from the GitHub repository trains for 50 epochs. Below is the request taken from the app log:
USING:: request = {"segmentation_left_atrium": {"name": "model_01", "pretrained": true, "device": "cuda", "max_epochs": 50, "val_split": 0.2, "train_batch_size": 1, "val_batch_size": 1}}
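For completeness, a training request with these parameters can also be posted to the server directly. This is only a hedged sketch: the exact /train route and the body shape are assumptions mirrored from the logged request, and they vary across MONAI Label versions, so verify them against the server's interactive API docs (http://127.0.0.1:8000/docs) first:

```
REM Assumed endpoint and payload, mirroring the logged request above
curl -X POST "http://127.0.0.1:8000/train/segmentation_left_atrium" ^
     -H "Content-Type: application/json" ^
     -d "{\"name\": \"model_01\", \"pretrained\": true, \"device\": \"cuda\", \"max_epochs\": 50, \"val_split\": 0.2, \"train_batch_size\": 1, \"val_batch_size\": 1}"
```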