How to structure queue to improve training time

Hi,

I am trying to tune the parameters of torchio.Queue to improve training time. Here are my augmentations:

from torchio.transforms import (
    ZNormalization, RandomAffine, RandomElasticDeformation, RandomMotion,
    RandomGhosting, RandomBiasField, RandomBlur, RandomNoise, RandomSwap,
)

global_augs_dict = {
    'normalize': ZNormalization(),
    'affine': RandomAffine(image_interpolation='linear'),
    'elastic': RandomElasticDeformation(num_control_points=(7, 7, 7), locked_borders=2),
    'motion': RandomMotion(degrees=10, translation=10, num_transforms=2, image_interpolation='linear', p=1., seed=None),
    'ghosting': RandomGhosting(num_ghosts=(4, 10), axes=(0, 1, 2), intensity=(0.5, 1), restore=0.02, p=1., seed=None),
    'bias': RandomBiasField(coefficients=0.5, order=3, p=1., seed=None),
    'blur': RandomBlur(std=(0., 4.), p=1, seed=None),
    'noise': RandomNoise(mean=0, std=(0, 0.25), p=1., seed=None),
    'swap': RandomSwap(patch_size=15, num_iterations=100, p=1, seed=None),
}
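If these transforms are applied to each subject before patches are sampled, their combined cost is paid once per volume, so it may help to measure how long each entry takes on a single subject. The helper below is a minimal sketch and not part of torchio: time_transforms and report are hypothetical names, and the only assumption is that every value in the dictionary is callable on a sample (torchio transforms accept a Subject).

```python
import time
from typing import Any, Callable, Dict


def time_transforms(transforms: Dict[str, Callable[[Any], Any]], sample: Any, repeats: int = 1) -> Dict[str, float]:
    """Return the mean wall-clock seconds each transform takes on `sample`."""
    timings = {}
    for name, transform in transforms.items():
        start = time.perf_counter()
        for _ in range(repeats):
            transform(sample)  # output is discarded; only the elapsed time matters
        timings[name] = (time.perf_counter() - start) / repeats
    return timings


def report(timings: Dict[str, float]) -> None:
    # Slowest transforms first, since they are the ones worth tuning or dropping.
    for name, seconds in sorted(timings.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:>10}: {seconds:.2f} s")


# Stand-in transforms so the sketch runs without torchio or BraTS data.
fake_transforms = {"fast": lambda x: x, "slow": lambda x: time.sleep(0.01) or x}
report(time_transforms(fake_transforms, sample=None, repeats=3))
```

With real data, a call such as report(time_transforms(global_augs_dict, subjects_dataset[0])) would rank the nine entries by cost, assuming subjects_dataset[0] returns a loaded torchio Subject. Because the transforms are random, a single call can be noisy; repeats > 1 gives a steadier estimate.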

I have tried a couple of different queue configurations on the BraTS 2020 training data. The first one:

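import torchio
from torchio.data import UniformSampler

# subjects_dataset holds the BraTS 2020 training subjects (defined elsewhere)
patches_queue = torchio.Queue(
    subjects_dataset,
    max_length=1,
    samples_per_volume=1,
    sampler=UniformSampler([128, 128, 128]),
    num_workers=4,
    shuffle_subjects=False,
    shuffle_patches=True,
)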

Here is my output:


Hostname   : XXXXXX
Training Data Samples:  332
CUDA_VISIBLE_DEVICES:  0
Current Device :  0
Device Count on Machine :  1
Device Name :  Tesla P100-PCIE-12GB
Cuda Availibility :  True
Using device: cuda
Memory Usage:
  Allocated: 0.0 GB
  Cached:  0.0 GB
Starting Learning rate is: 0.001


Epoch Started at: 2020-08-20 11:07:31.169938
Epoch # :  0
Learning rate: 0.001
Epoch Training dice: 0.31517219005821734
Best Training Dice: 0.31517219005821734
Average Training Loss: 0.6848278099417825
Best Training Epoch:  0
Epoch Validation dice: 0.4284125193288893
Best Validation Dice: 0.4284125193288893
Average Validation Loss: 0.5715874806711106
Best Validation Epoch:  0
Time for epoch: 730.7090194821358 mins


Epoch Started at: 2020-08-20 23:18:13.753733
Epoch # :  1
Learning rate: 0.00075025
Epoch Training dice: 0.4328200096387767
Best Training Dice: 0.4328200096387767
Average Training Loss: 0.5671799903612232
Best Training Epoch:  1
Epoch Validation dice: 0.4653519422259088
Best Validation Dice: 0.4653519422259088
Average Validation Loss: 0.5346480577740912
Best Validation Epoch:  1
Time for epoch: 727.3753080646197 mins


Epoch Started at: 2020-08-21 11:25:36.334792
Epoch # :  2
Learning rate: 0.0005005
Epoch Training dice: 0.4529779154852848
Best Training Dice: 0.4529779154852848
Average Training Loss: 0.5470220845147151
Best Training Epoch:  2
Epoch Validation dice: 0.49953905589465886
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5004609441053411
Best Validation Epoch:  2
Time for epoch: 738.0272558371227 mins


Epoch Started at: 2020-08-21 23:43:38.034053
Epoch # :  3
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.4745635877166486
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5254364122833516
Best Training Epoch:  3
Epoch Validation dice: 0.47297200991573
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5270279900842699
Best Validation Epoch:  2
Time for epoch: 728.7016223390897 mins


Epoch Started at: 2020-08-22 11:52:20.179293
Epoch # :  4
Learning rate: 9.999999999999159e-07
Epoch Training dice: 0.4744484525172589
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5255515474827412
Best Training Epoch:  3
Epoch Validation dice: 0.48653238308167823
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5134676169183217
Best Validation Epoch:  2
Time for epoch: 724.4470616658529 mins


Epoch Started at: 2020-08-22 23:56:47.029944
Epoch # :  5
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.47198090723001274
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5280190927699873
Best Training Epoch:  3
Epoch Validation dice: 0.45542112143185937
Best Validation Dice: 0.49953905589465886
Average Validation Loss: 0.5445788785681406
Best Validation Epoch:  2
Time for epoch: 723.8550246238708 mins


Epoch Started at: 2020-08-23 12:00:38.396858
Epoch # :  6
Learning rate: 0.0005005
Epoch Training dice: 0.4634351750670278
Best Training Dice: 0.4745635877166486
Average Training Loss: 0.5365648249329721
Best Training Epoch:  3

The second one:

patches_queue = torchio.Queue(
    subjects_dataset,
    max_length=10,
    samples_per_volume=10,
    sampler=UniformSampler([128, 128, 128]),
    num_workers=4,
    shuffle_subjects=False,
    shuffle_patches=True,
)

Here is my output:


Hostname   : XXXXXX
Training Data Samples:  3320
CUDA_VISIBLE_DEVICES:  0
Current Device :  0
Device Count on Machine :  1
Device Name :  Tesla P100-PCIE-12GB
Cuda Availibility :  True
Using device: cuda
Memory Usage:
  Allocated: 0.0 GB
  Cached:  0.0 GB
Starting Learning rate is: 0.001


Epoch Started at: 2020-08-22 09:31:48.666395
Epoch # :  0
Learning rate: 0.001
Epoch Training dice: 0.436339787389386
Best Training Dice: 0.436339787389386
Average Training Loss: 0.5636602126106139
Best Training Epoch:  0
Epoch Validation dice: 0.4510045856401002
Best Validation Dice: 0.4510045856401002
Average Validation Loss: 0.5489954143599001
Best Validation Epoch:  0
Time for epoch: 775.5885859568914 mins


Epoch Started at: 2020-08-22 22:27:23.982092
Epoch # :  1
Learning rate: 0.00075025
Epoch Training dice: 0.47390382915650414
Best Training Dice: 0.47390382915650414
Average Training Loss: 0.5260961708434958
Best Training Epoch:  1
Epoch Validation dice: 0.46360749397146783
Best Validation Dice: 0.46360749397146783
Average Validation Loss: 0.536392506028532
Best Validation Epoch:  1
Time for epoch: 758.6359572728475 mins


Epoch Started at: 2020-08-23 11:06:02.140065
Epoch # :  2
Learning rate: 0.0005005
Epoch Training dice: 0.4902666910009399
Best Training Dice: 0.4902666910009399
Average Training Loss: 0.5097333089990602
Best Training Epoch:  2
Epoch Validation dice: 0.4494907715524214
Best Validation Dice: 0.46360749397146783
Average Validation Loss: 0.5505092284475785
Best Validation Epoch:  1
Time for epoch: 745.3841615160306 mins


Epoch Started at: 2020-08-23 23:31:25.190341
Epoch # :  3
Learning rate: 0.0002507499999999999
Epoch Training dice: 0.501831035780201
Best Training Dice: 0.501831035780201
Average Training Loss: 0.498168964219799
Best Training Epoch:  3

Both configurations take roughly 12 to 13 hours (about 720 to 780 minutes) per epoch. Is there any way to improve the training time?

Cheers, Sarthak
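Dividing the logged epoch times by the number of patches and by the number of subjects makes the two runs easier to compare. The sketch below only uses numbers copied from the two logs (332 subjects, 332 or 3320 patches per epoch, and the epoch-0 times); per_unit_cost is a hypothetical helper name.

```python
from typing import Tuple

NUM_SUBJECTS = 332  # BraTS 2020 training subjects, as in the first log

# Epoch-0 time in minutes and patches per epoch, copied from the two logs.
RUNS = {
    "max_length=1, samples_per_volume=1": (730.7090194821358, 332),
    "max_length=10, samples_per_volume=10": (775.5885859568914, 3320),
}


def per_unit_cost(minutes: float, patches: int, subjects: int = NUM_SUBJECTS) -> Tuple[float, float]:
    """Return (minutes per patch, minutes per subject volume) for one epoch."""
    return minutes / patches, minutes / subjects


for name, (minutes, patches) in RUNS.items():
    per_patch, per_subject = per_unit_cost(minutes, patches)
    print(f"{name}: {per_patch:.3f} min/patch, {per_subject:.2f} min/subject")
```

Per patch, the second configuration is roughly nine times cheaper (about 0.23 versus 2.2 minutes), while the cost per subject stays at roughly 2.2 to 2.3 minutes. That pattern suggests most of each epoch is spent loading and augmenting volumes rather than sampling patches or running the network, though the logs alone cannot confirm it.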

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 30 (26 by maintainers)

Top GitHub Comments

dvolgyes commented, Sep 17, 2020 (1 reaction)

One remark: using UniformSampler makes patch generation very fast, so your code will be faster if you use fewer volumes and more patches per volume. If your volume is reasonably large compared to the patch, you can easily get, e.g., 32 or 64 different patches without them being too self-similar. But of course, the optimal parameters depend on the data.
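This suggestion can be made a little more concrete with a rough cost model fitted to the two logs in the question. The model is an assumption, not something torchio measures: it treats an epoch as volumes_loaded × t_volume + patches × t_patch, where t_volume stands for reading and augmenting one subject and t_patch for sampling one patch and training on it. Both logged runs load each of the 332 subjects once, so their epoch-0 times give two equations in the two unknowns. The names fit_costs and predicted_epoch_minutes are hypothetical.

```python
import math

# Two observations from the issue's logs: (volumes loaded, patches, minutes).
RUN_A = (332, 332, 730.7090194821358)   # samples_per_volume=1
RUN_B = (332, 3320, 775.5885859568914)  # samples_per_volume=10


def fit_costs(run_a, run_b):
    """Solve vols * t_volume + patches * t_patch = minutes for both runs (Cramer's rule)."""
    vols_a, patches_a, mins_a = run_a
    vols_b, patches_b, mins_b = run_b
    det = vols_a * patches_b - vols_b * patches_a
    t_volume = (mins_a * patches_b - mins_b * patches_a) / det
    t_patch = (vols_a * mins_b - vols_b * mins_a) / det
    return t_volume, t_patch


def predicted_epoch_minutes(total_patches, samples_per_volume, t_volume, t_patch):
    """Estimate an epoch drawing `total_patches` patches, `samples_per_volume` per loaded volume."""
    volumes = math.ceil(total_patches / samples_per_volume)
    return volumes * t_volume + total_patches * t_patch


t_volume, t_patch = fit_costs(RUN_A, RUN_B)
print(f"t_volume ~ {t_volume:.2f} min, t_patch ~ {t_patch * 60:.1f} s")
for spv in (1, 10, 32, 64):
    minutes = predicted_epoch_minutes(3320, spv, t_volume, t_patch)
    print(f"samples_per_volume={spv:>2}: ~{minutes:.0f} min per 3320 patches")
```

For a fixed budget of 3320 patches, the model reproduces the logged 775 minutes at samples_per_volume=10 and predicts a steadily shorter epoch at 32 or 64, because fewer volumes have to be loaded and augmented. Treat the numbers as a rough guide only: the model ignores the overlap that num_workers provides, the extra CPU memory a larger max_length needs, and the loss of diversity when many patches come from the same volume.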

fepegar commented, Sep 17, 2020 (1 reaction)

@sarthakpati, can you upgrade the library to v0.17.36 and try again? UniformSampler is much faster after #296.
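To confirm the upgrade took effect, the installed version can be read with the standard library's importlib.metadata (Python 3.8+). This is a small sketch: as_tuple and is_recent_enough are hypothetical helpers that only handle plain X.Y.Z release strings; packaging.version.Version is the more robust choice for arbitrary version strings.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TORCHIO = "0.17.36"  # release suggested in the maintainer's comment


def as_tuple(release: str) -> tuple:
    """Parse a plain 'X.Y.Z' release string into integers (pre-release tags are not handled)."""
    return tuple(int(part) for part in release.split(".")[:3])


def is_recent_enough(installed: str, minimum: str = MIN_TORCHIO) -> bool:
    # Compare numerically: as plain strings, "0.9.100" would wrongly sort after "0.17.36".
    return as_tuple(installed) >= as_tuple(minimum)


try:
    installed = version("torchio")
except PackageNotFoundError:
    print("torchio is not installed")
else:
    status = "OK" if is_recent_enough(installed) else f"upgrade to >= {MIN_TORCHIO}"
    print(f"torchio {installed}: {status}")
```

If the check reports an older release, pip install --upgrade torchio should fetch a newer one.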
