Model.pt and train_stats.json files not produced when running segmentation_left_atrium sample-app
Describe the bug
Model.pt and train_stats.json files are not produced when running the segmentation_left_atrium sample app.
When running segmentation_left_atrium on the server, I have tried clicking 'Update Model' in 3D Slicer. The training status bar shows that training is taking place, and once it reaches 100% I click 'Submit Label' and proceed to the next sample. However, when I click 'Update Model', 'Submit Label', or 'Run' (under Auto Segmentation), no new model is created in the model directory of the app. The directory contains only the pretrained file and a folder 'model01' that holds event files.
However, if I manually delete pretrained.pt while the server is running, before clicking 'Update Model' and 'Submit Label', then model.pt and train_stats.json files are generated when I open the next sample.
To Reproduce
Steps to reproduce the behavior:
- Go to the command prompt
- Install the sample app segmentation_left_atrium and the dataset Task02_Heart (see the command sketch after this list)
- Start the server (in the Windows command prompt, using the monailabel CLI): monailabel start_server --app monailabel/sample-apps/segmentation_left_atrium --studies datasets/Task02_Heart/imagesTr
- Open 3D Slicer and go to the MONAI Label module
- Connect to the MONAI Label server and click 'Next Sample'
- Select the current model and click 'Update Model'
- Click 'Submit Label' when training is complete
- Click 'Next Sample'
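For reference, here is a minimal sketch of the full command sequence, assuming the monailabel CLI's app/dataset download options and the directory layout used in the start_server command above (names and paths are assumptions; adjust to your setup):

```
REM Download the sample app and the Task02_Heart dataset
REM (download flags assumed; verify with "monailabel apps --help" and "monailabel datasets --help")
monailabel apps --download --name segmentation_left_atrium --output monailabel/sample-apps
monailabel datasets --download --name Task02_Heart --output datasets

REM Start the server against the downloaded studies
monailabel start_server --app monailabel/sample-apps/segmentation_left_atrium --studies datasets/Task02_Heart/imagesTr
```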
Expected behavior
model.pt and train_stats.json are expected to be produced in the model directory every time I update the model in Slicer while the MONAI Label server is running. In addition, the training accuracy is not updating.
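As a quick check after each training run, the app's model directory can be listed from the command prompt; the path below assumes the app location used in the start_server command above:

```
REM model.pt and train_stats.json should appear here once training has finished
dir monailabel\sample-apps\segmentation_left_atrium\model
```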
Screenshots
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================ Printing MONAI config…
MONAI version: 0.8.1
Numpy version: 1.22.2
Pytorch version: 1.9.0+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1
Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.1
scikit-image version: 0.18.3
Pillow version: 8.4.0
Tensorboard version: 2.7.0
gdown version: 4.0.2
TorchVision version: 0.10.0+cu111
tqdm version: 4.62.3
lmdb version: 1.2.1
psutil version: 5.8.0
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================ Printing system config…
System: Windows
Win32 version: ('10', '10.0.22000', 'SP0', 'Multiprocessor Free')
Win32 edition: Core
Platform: Windows-10-10.0.22000-SP0
Processor: AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
Machine: AMD64
Python version: 3.9.5
Process name: python.exe
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='C:\Windows\System32\en-US\KernelBase.dll.mui', fd=-1), popenfile(path='C:\Windows\System32\en-US\kernel32.dll.mui', fd=-1)]
Num physical CPUs: 4
Num logical CPUs: 8
Num usable CPUs: 8
CPU usage (%): [54.6, 40.0, 58.3, 40.9, 50.5, 33.7, 48.6, 47.1]
CPU freq. (MHz): 1700
Load avg. in last 1, 5, 15 mins (%): [0.0, 0.0, 0.0]
Disk usage (%): 72.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 7.8
Available memory (GB): 3.3
Used memory (GB): 4.5
================================ Printing GPU config…
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA GeForce GTX 1650
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 16
GPU 0 Total memory (GB): 4.0
GPU 0 CUDA capability (maj.min): 7.5
Top GitHub Comments
Thank you for spotting the error @diazandr3s, and for all of the suggestions. I have implemented all of the above, but it seems that my GPU memory allocation is at fault and is causing the issue, so I will try to troubleshoot that. Once again, thanks for the help!
Attachment: app.log
Hi, the sample app that I used from the GitHub repository trains for 50 epochs. Below is the request taken from the app log:
USING:: request = {"segmentation_left_atrium": {"name": "model_01", "pretrained": true, "device": "cuda", "max_epochs": 50, "val_split": 0.2, "train_batch_size": 1, "val_batch_size": 1}}
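For completeness, a training request with these parameters can also be posted to the server directly. This is only a hedged sketch: the exact /train route and the body shape are assumptions mirrored from the logged request, and they vary across MONAI Label versions, so verify them against the server's interactive API docs (http://127.0.0.1:8000/docs) first:

```
REM Assumed endpoint and payload, mirroring the logged request above
curl -X POST "http://127.0.0.1:8000/train/segmentation_left_atrium" ^
     -H "Content-Type: application/json" ^
     -d "{\"name\": \"model_01\", \"pretrained\": true, \"device\": \"cuda\", \"max_epochs\": 50, \"val_split\": 0.2, \"train_batch_size\": 1, \"val_batch_size\": 1}"
```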