How to structure queue to improve training time
Hi,

I am trying to tune the parameters of `torchio.Queue` to improve training time. Here are my augmentations:
```python
from torchio.transforms import (
    ZNormalization, RandomAffine, RandomElasticDeformation, RandomMotion,
    RandomGhosting, RandomBiasField, RandomBlur, RandomNoise, RandomSwap)

global_augs_dict = {
    'normalize': ZNormalization(),
    'affine': RandomAffine(image_interpolation='linear'),
    'elastic': RandomElasticDeformation(num_control_points=(7, 7, 7), locked_borders=2),
    'motion': RandomMotion(degrees=10, translation=10, num_transforms=2, image_interpolation='linear', p=1., seed=None),
    'ghosting': RandomGhosting(num_ghosts=(4, 10), axes=(0, 1, 2), intensity=(0.5, 1), restore=0.02, p=1., seed=None),
    'bias': RandomBiasField(coefficients=0.5, order=3, p=1., seed=None),
    'blur': RandomBlur(std=(0., 4.), p=1., seed=None),
    'noise': RandomNoise(mean=0, std=(0, 0.25), p=1., seed=None),
    'swap': RandomSwap(patch_size=15, num_iterations=100, p=1., seed=None),
}
```
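For context, a subset of these transforms is typically composed and attached to the subjects dataset before the queue is built. A minimal sketch, assuming a hypothetical selection of keys; the choice of augmentations and the `subjects` list are placeholders, not part of the original issue, and the dataset class is `SubjectsDataset` in recent torchio versions (`ImagesDataset` in older ones):

```python
import torchio
from torchio.transforms import Compose

# Hypothetical choice of augmentations: normalize every subject, then
# apply a random affine. The selection and ordering are assumptions.
training_transform = Compose([
    global_augs_dict['normalize'],
    global_augs_dict['affine'],
])

# `subjects` is a placeholder for a list of torchio.Subject instances
# built from the BraTS 2020 images and segmentations.
subjects_dataset = torchio.SubjectsDataset(subjects, transform=training_transform)
```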
I have tried a couple of different configurations of the queue construction (for BraTS 2020 training data):
```python
from torchio.data import UniformSampler

patches_queue = torchio.Queue(
    subjects_dataset,
    max_length=1,
    samples_per_volume=1,
    sampler=UniformSampler([128, 128, 128]),
    num_workers=4,
    shuffle_subjects=False,
    shuffle_patches=True,
)
```
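For reference, the queue is consumed through a standard PyTorch `DataLoader` with `num_workers=0`, since the parallel loading and transforming already happens inside the queue. A minimal sketch; the batch size and the `'t1'` image key are placeholder assumptions:

```python
from torch.utils.data import DataLoader

# num_workers must be 0 here: the Queue spawns its own workers to load
# and transform volumes, and this loader only collates ready patches.
patches_loader = DataLoader(patches_queue, batch_size=4, num_workers=0)

for batch in patches_loader:
    # 't1' is a placeholder for whichever image key the subjects define.
    inputs = batch['t1']['data']  # tensor of shape (B, C, 128, 128, 128)
    # ... forward/backward pass ...
```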
Here is my output:
```
Hostname : XXXXXX
Training Data Samples: 332
CUDA_VISIBLE_DEVICES: 0
Current Device : 0
Device Count on Machine : 1
Device Name : Tesla P100-PCIE-12GB
Cuda Availability : True
Using device: cuda
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
Starting Learning rate is: 0.001
Epoch Started at: 2020-08-20 11:07:31.169938
Epoch # : 0
Learning rate: 0.001
Epoch Training dice: 0.31517219005821734
Best Training Dice: 0.31517219005821734
Average Training Loss: 0.6848278099417825
Best Training Epoch: 0
Epoch Validation dice: 0.4284125193288893
Best Validation Dice: 0.4284125193288893
Average Validation Loss: 0.5715874806711106
Best Validation Epoch: 0
Time for epoch: 730.7090194821358 mins
Epoch Started at: 2020-08-20 23:18:13.753733
Epoch # : 1
Learning rate: 0.00075025
Epoch Training dice: 0.4328200096387767
Best Training Dice: 0.4328200096387767
Average Training Loss: 0.5671799903612232
Best Training Epoch: 1
Epoch Validation dice: 0.4653519422259088
Best Validation Dice: 0.4653519422259088
Average Validation Loss: 0.5346480577740912
Best Validation Epoch: 1
Time for epoch: 727.3753080646197 mins
Epoch Started at: 2020-08-21 11:25:36.334792
Epoch # : 2
Learning rate: 0.0005005
Epoch Training dice: 0.4529779154852848
Best Training Dice: 0.4529779154852848
Average Training Loss: 0.5470220845147151
Best Training Epoch: 2
Epoch Validation dice: 0.49953905589465886
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5004609441053411
Best Validation Epoch: 2
Time for epoch: 738.0272558371227 mins
Epoch Started at: 2020-08-21 23:43:38.034053
Epoch # : 3
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.4745635877166486
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5254364122833516
Best Training Epoch: 3
Epoch Validation dice: 0.47297200991573
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5270279900842699
Best Validation Epoch: 2
Time for epoch: 728.7016223390897 mins
Epoch Started at: 2020-08-22 11:52:20.179293
Epoch # : 4
Learning rate: 9.999999999999159e-07
Epoch Training dice: 0.4744484525172589
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5255515474827412
Best Training Epoch: 3
Epoch Validation dice: 0.48653238308167823
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5134676169183217
Best Validation Epoch: 2
Time for epoch: 724.4470616658529 mins
Epoch Started at: 2020-08-22 23:56:47.029944
Epoch # : 5
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.47198090723001274
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5280190927699873
Best Training Epoch: 3
Epoch Validation dice: 0.45542112143185937
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5445788785681406
Best Validation Epoch: 2
Time for epoch: 723.8550246238708 mins
Epoch Started at: 2020-08-23 12:00:38.396858
Epoch # : 6
Learning rate: 0.0005005
Epoch Training dice: 0.4634351750670278
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5365648249329721
Best Training Epoch: 3
```
And here is the second configuration:

```python
patches_queue = torchio.Queue(
    subjects_dataset,
    max_length=10,
    samples_per_volume=10,
    sampler=UniformSampler([128, 128, 128]),
    num_workers=4,
    shuffle_subjects=False,
    shuffle_patches=True,
)
```
Here is my output:
```
Hostname : XXXXXX
Training Data Samples: 3320
CUDA_VISIBLE_DEVICES: 0
Current Device : 0
Device Count on Machine : 1
Device Name : Tesla P100-PCIE-12GB
Cuda Availability : True
Using device: cuda
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
Starting Learning rate is: 0.001
Epoch Started at: 2020-08-22 09:31:48.666395
Epoch # : 0
Learning rate: 0.001
Epoch Training dice: 0.436339787389386
Best Training Dice: 0.436339787389386
Average Training Loss: 0.5636602126106139
Best Training Epoch: 0
Epoch Validation dice: 0.4510045856401002
Best Validation Dice: 0.4510045856401002
Average Validation Loss: 0.5489954143599001
Best Validation Epoch: 0
Time for epoch: 775.5885859568914 mins
Epoch Started at: 2020-08-22 22:27:23.982092
Epoch # : 1
Learning rate: 0.00075025
Epoch Training dice: 0.47390382915650414
Best Training Dice: 0.47390382915650414
Average Training Loss: 0.5260961708434958
Best Training Epoch: 1
Epoch Validation dice: 0.46360749397146783
Best Validation Dice: 0.46360749397146783
Average Validation Loss: 0.536392506028532
Best Validation Epoch: 1
Time for epoch: 758.6359572728475 mins
Epoch Started at: 2020-08-23 11:06:02.140065
Epoch # : 2
Learning rate: 0.0005005
Epoch Training dice: 0.4902666910009399
Best Training Dice: 0.4902666910009399
Average Training Loss: 0.5097333089990602
Best Training Epoch: 2
Epoch Validation dice: 0.4494907715524214
Best Validation Dice: 0.46360749397146783
Average Validation Loss: 0.5505092284475785
Best Validation Epoch: 1
Time for epoch: 745.3841615160306 mins
Epoch Started at: 2020-08-23 23:31:25.190341
Epoch # : 3
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.501831035780201
Best Training Dice: 0.501831035780201
Average Training Loss: 0.498168964219799
Best Training Epoch: 3
```
Is there any way to improve the training time?
Cheers, Sarthak
Top GitHub Comments
One remark: using `UniformSampler` makes patch generation very fast, so your code will run faster if you use fewer volumes and more patches per volume. If your volume is reasonably large compared to the patch size, you can easily extract e.g. 32 or 64 different patches without them being too self-similar. But of course, the optimal parameters depend on the data.
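A configuration along these lines might look as follows. The specific numbers are illustrative assumptions, not values given in the thread; note that with `max_length=10` and `samples_per_volume=10`, the second queue above could never hold patches from more than one subject at a time:

```python
# Illustrative values following the remark above: more patches per volume,
# and a max_length large enough that the queue can hold patches from
# several subjects at once while the 4 workers keep it filled.
patches_queue = torchio.Queue(
    subjects_dataset,
    max_length=128,         # assumption: room for patches of ~4 subjects
    samples_per_volume=32,  # assumption: 32 patches per loaded volume
    sampler=UniformSampler([128, 128, 128]),
    num_workers=4,
    shuffle_subjects=True,
    shuffle_patches=True,
)
```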
@sarthakpati, can you upgrade the library to v0.17.36 and try again? `UniformSampler` is much faster after #296.
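After upgrading, the installed version can be verified from Python:

```python
import torchio

# Should print 0.17.36 or later once the upgrade has taken effect.
print(torchio.__version__)
```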