
Total time difference when training

Hi guys, is there any time difference between training on 300 images vs. 150 images? I trained on 300 images with 55 test images (num_epochs=300), but Google Colab terminated the run at epoch 267/300, after it had already taken over 7 hours to get there. So I split the dataset in two, hoping that would cut the time by roughly 50%, but as you can see below, each epoch still takes about the same time as before:

Using resnet50 as network backbone For Mask R-CNN model
Applying Default Augmentation on Dataset
Train 300 images
Validate 55 images
Checkpoint Path: /content/mask_rcnn_models
Selecting layers to train
Epoch 1/300
100/100 [==============================] - 205s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.9180 - rpn_class_loss: 0.0146 - rpn_bbox_loss: 0.3307 - mrcnn_class_loss: 0.0552 - mrcnn_bbox_loss: 0.3484 - mrcnn_mask_loss: 0.1690 - val_loss: 0.6101 - val_rpn_class_loss: 0.0078 - val_rpn_bbox_loss: 0.2722 - val_mrcnn_class_loss: 0.0228 - val_mrcnn_bbox_loss: 0.1798 - val_mrcnn_mask_loss: 0.1275
Epoch 2/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.4959 - rpn_class_loss: 0.0057 - rpn_bbox_loss: 0.1984 - mrcnn_class_loss: 0.0274 - mrcnn_bbox_loss: 0.1425 - mrcnn_mask_loss: 0.1218 - val_loss: 0.5547 - val_rpn_class_loss: 0.0047 - val_rpn_bbox_loss: 0.2960 - val_mrcnn_class_loss: 0.0110 - val_mrcnn_bbox_loss: 0.1219 - val_mrcnn_mask_loss: 0.1212
Epoch 3/300
100/100 [==============================] - 126s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.4234 - rpn_class_loss: 0.0043 - rpn_bbox_loss: 0.1824 - mrcnn_class_loss: 0.0206 - mrcnn_bbox_loss: 0.1022 - mrcnn_mask_loss: 0.1140 - val_loss: 0.3582 - val_rpn_class_loss: 0.0029 - val_rpn_bbox_loss: 0.1576 - val_mrcnn_class_loss: 0.0124 - val_mrcnn_bbox_loss: 0.0807 - val_mrcnn_mask_loss: 0.1046
Epoch 4/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.3597 - rpn_class_loss: 0.0033 - rpn_bbox_loss: 0.1438 - mrcnn_class_loss: 0.0164 - mrcnn_bbox_loss: 0.0839 - mrcnn_mask_loss: 0.1123 - val_loss: 0.3611 - val_rpn_class_loss: 0.0018 - val_rpn_bbox_loss: 0.1736 - val_mrcnn_class_loss: 0.0076 - val_mrcnn_bbox_loss: 0.0670 - val_mrcnn_mask_loss: 0.1111
Epoch 5/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.3001 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1137 - mrcnn_class_loss: 0.0122 - mrcnn_bbox_loss: 0.0595 - mrcnn_mask_loss: 0.1123 - val_loss: 0.3264 - val_rpn_class_loss: 0.0020 - val_rpn_bbox_loss: 0.1344 - val_mrcnn_class_loss: 0.0089 - val_mrcnn_bbox_loss: 0.0771 - val_mrcnn_mask_loss: 0.1040
Epoch 6/300
100/100 [==============================] - 125s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.2718 - rpn_class_loss: 0.0019 - rpn_bbox_loss: 0.0992 - mrcnn_class_loss: 0.0095 - mrcnn_bbox_loss: 0.0556 - mrcnn_mask_loss: 0.1055 - val_loss: 0.2959 - val_rpn_class_loss: 0.0023 - val_rpn_bbox_loss: 0.1174 - val_mrcnn_class_loss: 0.0098 - val_mrcnn_bbox_loss: 0.0614 - val_mrcnn_mask_loss: 0.1050
Epoch 7/300
100/100 [==============================] - 120s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.2894 - rpn_class_loss: 0.0022 - rpn_bbox_loss: 0.1127 - mrcnn_class_loss: 0.0113 - mrcnn_bbox_loss: 0.0562 - mrcnn_mask_loss: 0.1071 - val_loss: 0.3831 - val_rpn_class_loss: 0.0028 - val_rpn_bbox_loss: 0.1883 - val_mrcnn_class_loss: 0.0095 - val_mrcnn_bbox_loss: 0.0698 - val_mrcnn_mask_loss: 0.1127
Epoch 8/300
Using resnet50 as network backbone For Mask R-CNN model
Applying Default Augmentation on Dataset
Train 150 images
Validate 28 images
Checkpoint Path: /content/mask_rcnn_models
Selecting layers to train
Epoch 1/300
100/100 [==============================] - 192s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.8760 - rpn_class_loss: 0.0113 - rpn_bbox_loss: 0.2987 - mrcnn_class_loss: 0.0464 - mrcnn_bbox_loss: 0.3224 - mrcnn_mask_loss: 0.1972 - val_loss: 0.6289 - val_rpn_class_loss: 0.0057 - val_rpn_bbox_loss: 0.2860 - val_mrcnn_class_loss: 0.0325 - val_mrcnn_bbox_loss: 0.1937 - val_mrcnn_mask_loss: 0.1110
Epoch 2/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.4598 - rpn_class_loss: 0.0034 - rpn_bbox_loss: 0.1973 - mrcnn_class_loss: 0.0159 - mrcnn_bbox_loss: 0.1235 - mrcnn_mask_loss: 0.1197 - val_loss: 0.4465 - val_rpn_class_loss: 0.0044 - val_rpn_bbox_loss: 0.1912 - val_mrcnn_class_loss: 0.0217 - val_mrcnn_bbox_loss: 0.1230 - val_mrcnn_mask_loss: 0.1063
Epoch 3/300
100/100 [==============================] - 120s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.3609 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1573 - mrcnn_class_loss: 0.0120 - mrcnn_bbox_loss: 0.0824 - mrcnn_mask_loss: 0.1064 - val_loss: 0.4007 - val_rpn_class_loss: 0.0034 - val_rpn_bbox_loss: 0.1635 - val_mrcnn_class_loss: 0.0149 - val_mrcnn_bbox_loss: 0.1122 - val_mrcnn_mask_loss: 0.1068
Epoch 4/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.3058 - rpn_class_loss: 0.0023 - rpn_bbox_loss: 0.1208 - mrcnn_class_loss: 0.0082 - mrcnn_bbox_loss: 0.0707 - mrcnn_mask_loss: 0.1039 - val_loss: 0.3454 - val_rpn_class_loss: 0.0027 - val_rpn_bbox_loss: 0.1386 - val_mrcnn_class_loss: 0.0125 - val_mrcnn_bbox_loss: 0.0861 - val_mrcnn_mask_loss: 0.1055
Epoch 5/300
100/100 [==============================] - 121s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.2659 - rpn_class_loss: 0.0020 - rpn_bbox_loss: 0.1040 - mrcnn_class_loss: 0.0076 - mrcnn_bbox_loss: 0.0521 - mrcnn_mask_loss: 0.1001 - val_loss: 0.3603 - val_rpn_class_loss: 0.0025 - val_rpn_bbox_loss: 0.1673 - val_mrcnn_class_loss: 0.0144 - val_mrcnn_bbox_loss: 0.0813 - val_mrcnn_mask_loss: 0.0949
Epoch 6/300
100/100 [==============================] - 120s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.2282 - rpn_class_loss: 0.0016 - rpn_bbox_loss: 0.0842 - mrcnn_class_loss: 0.0054 - mrcnn_bbox_loss: 0.0399 - mrcnn_mask_loss: 0.0971 - val_loss: 0.3388 - val_rpn_class_loss: 0.0021 - val_rpn_bbox_loss: 0.1219 - val_mrcnn_class_loss: 0.0145 - val_mrcnn_bbox_loss: 0.0944 - val_mrcnn_mask_loss: 0.1059
Epoch 7/300
100/100 [==============================] - 120s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.1911 - rpn_class_loss: 0.0014 - rpn_bbox_loss: 0.0589 - mrcnn_class_loss: 0.0052 - mrcnn_bbox_loss: 0.0340 - mrcnn_mask_loss: 0.0915 - val_loss: 0.3143 - val_rpn_class_loss: 0.0018 - val_rpn_bbox_loss: 0.1305 - val_mrcnn_class_loss: 0.0069 - val_mrcnn_bbox_loss: 0.0735 - val_mrcnn_mask_loss: 0.1016
Epoch 8/300
100/100 [==============================] - 122s 1s/step - batch: 49.5000 - size: 4.0000 - loss: 0.1919 - rpn_class_loss: 0.0012 - rpn_bbox_loss: 0.0599 - mrcnn_class_loss: 0.0048 - mrcnn_bbox_loss: 0.0326 - mrcnn_mask_loss: 0.0934 - val_loss: 0.2894 - val_rpn_class_loss: 0.0019 - val_rpn_bbox_loss: 0.1249 - val_mrcnn_class_loss: 0.0065 - val_mrcnn_bbox_loss: 0.0618 - val_mrcnn_mask_loss: 0.0943

This means there is no difference between them, so I am still stuck spending over 7 hours to train on 150 images.
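One likely explanation, based purely on the logs above (this is a sketch of the arithmetic, not something stated in the original thread): both runs use a fixed 100 steps per epoch with batch size 4, so every epoch processes 100 × 4 = 400 samples regardless of how many images are in the dataset. Epoch wall time therefore depends on steps per epoch, not dataset size, which is why halving the dataset did not halve the time. The helper name below is hypothetical:

```python
# The logs show ~121 s per epoch after warm-up, for both the 300-image
# and the 150-image runs, because steps_per_epoch is fixed at 100.

def estimated_training_hours(seconds_per_epoch: float, num_epochs: int) -> float:
    """Rough total wall time in hours for a fixed steps-per-epoch setup."""
    return seconds_per_epoch * num_epochs / 3600

# Full 300-epoch run at ~121 s/epoch:
print(f"{estimated_training_hours(121, 300):.1f} hours")  # -> 10.1 hours

# The run Colab killed at epoch 267 (consistent with the 7+ hours observed,
# allowing for the slower first epoch and Colab overhead):
print(f"{estimated_training_hours(121, 267):.1f} hours")  # -> 9.0 hours
```

In other words, to actually train faster you would need to reduce the number of epochs, the steps per epoch, or the per-step cost (e.g. a faster GPU or a lighter set of trainable layers), rather than the dataset size.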

Is there any recommendation, or am I missing something? I would appreciate any help.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

6 reactions
ayoolaolafenwa commented, May 25, 2021

@alic-xc This is a weakness of the Mask R-CNN algorithm: training with Mask R-CNN consumes a lot of compute. If you want to train faster, you will have to use a GPU with a bigger capacity. The alternative is to train only the heads of the Mask R-CNN network; by default it is set to train all the layers.

train_maskrcnn.train_model(num_epochs = 300, augmentation = True, layers = "heads", path_trained_models = "mask_rcnn_models")

In the train_model function, set the parameter layers to "heads".

Note: training only the heads of Mask R-CNN may not reach validation losses as low as training all the layers.

2 reactions
alic-xc commented, May 25, 2021

> @alic-xc This is a weakness of the Mask R-CNN algorithm: training with Mask R-CNN consumes a lot of compute. If you want to train faster, you will have to use a GPU with a bigger capacity. The alternative is to train only the heads of the Mask R-CNN network; by default it is set to train all the layers.
>
> train_maskrcnn.train_model(num_epochs = 300, augmentation = True, layers = "heads", path_trained_models = "mask_rcnn_models")
>
> In the train_model function, set the parameter layers to "heads".
>
> Note: training only the heads of Mask R-CNN may not reach validation losses as low as training all the layers.

Okay, I think I understand it better now. Thanks @ayoolaolafenwa @khanfarhan10
