Does the size of batch-size affect the training results?

See original GitHub issue

Hi, I ran train.py on the KITTI raw data with the command below:

    python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

The only differences are that I used batch-size=80 and a different train (41664) / valid (2452) split. The results I get are:

disp:

Results with scale factor determined by GT/prediction ratio (like the original paper) :
   abs_rel,     sq_rel,        rms,    log_rms,         a1,         a2,         a3
    0.2058,     1.6333,     6.7410,     0.2895,     0.6762,     0.8853,     0.9532

pose:

Results 10
        ATE,     RE
mean 0.0223, 0.0053
std  0.0188, 0.0036

Results 09
        ATE,     RE
mean 0.0284, 0.0055
std  0.0241, 0.0035

You can see that there is still quite a big margin compared with yours:

Abs Rel | Sq Rel | RMSE | RMSE(log) | Acc.1 | Acc.2 | Acc.3
0.181 | 1.341 | 6.236 | 0.262 | 0.733 | 0.901 | 0.964
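For readers unfamiliar with the column names, here is a minimal sketch of how these seven depth metrics are commonly defined in the monocular depth literature. It assumes the "GT/prediction ratio" is the ratio of medians and works on flat lists of positive depths. The function name `depth_metrics` is hypothetical, and the repository's own evaluation script may differ in details such as masking and depth clipping.

```python
import math
from statistics import median


def depth_metrics(gt, pred):
    """Return (abs_rel, sq_rel, rms, log_rms, a1, a2, a3) for positive depths."""
    # Assumption: predictions are rescaled by median(GT) / median(prediction).
    scale = median(gt) / median(pred)
    pred = [p * scale for p in pred]
    n = len(gt)
    pairs = list(zip(gt, pred))

    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rms = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    log_rms = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)

    # a1, a2, a3: fraction of pixels whose ratio max(g/p, p/g) is below 1.25**k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1, a2, a3 = (sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))
    return abs_rel, sq_rel, rms, log_rms, a1, a2, a3
```

Lower is better for the first four columns and higher is better for a1, a2 and a3, which is why the numbers above read as worse than the reference row on every metric.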

I think there are no factors causing this difference other than the batch size and the data split. Therefore, does the batch size affect the training results?

What’s more, when I train my model on two Titan GPUs with batch-size=80*2=160, the memory usage is about 11G on GPU0 and about 6G on GPU1. This large imbalance between the two GPUs seriously hampers multi-GPU training. I then found that all the loss calculations are placed on the first GPU; the memory is mainly used to compute photometric_reconstruction_loss at the 4 depth scales. I think it might be better to keep some scales on cuda:0 and move the others to cuda:1.
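A rough way to estimate how much such a split could help is sketched below. It assumes each scale halves the resolution, so the loss buffers of a scale take about a quarter of the memory of the previous one. The cost model and the `assign_scales` helper are illustrative assumptions, not measurements or code from the repository.

```python
def assign_scales(costs, n_devices=2):
    """Greedily place each scale on the currently least-loaded device."""
    loads = [0.0] * n_devices
    placement = {}
    # Handle the most expensive scales first.
    for scale in sorted(range(len(costs)), key=lambda s: -costs[s]):
        device = min(range(n_devices), key=lambda d: loads[d])
        placement[scale] = device
        loads[device] += costs[scale]
    return placement, loads


# Assumed relative cost per scale: pixel count shrinks 4x at each level.
costs = [1.0, 0.25, 0.0625, 0.015625]
placement, loads = assign_scales(costs)
print(placement)  # scale index -> device index
print(loads)      # relative loss memory per device
```

Under this assumption the full-resolution scale alone accounts for roughly three quarters of the loss memory, so moving the coarser scales to cuda:1 would narrow the gap but not close it. An even balance would probably require splitting the batch within the largest scale as well.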

Issue Analytics

State:
Created 5 years ago
Comments:14 (7 by maintainers)

Top GitHub Comments

ClementPinard commented, Oct 7, 2018

Results with your split, using model_best :

Results with scale factor determined by GT/prediction ratio (like the original paper) : 
   abs_rel,     sq_rel,        rms,    log_rms,         a1,         a2,         a3
    0.1854,     1.3986,     6.4104,     0.2687,     0.7149,     0.8985,     0.9619

Results with your split, using checkpoint :

Results with scale factor determined by GT/prediction ratio (like the original paper) : 
   abs_rel,     sq_rel,        rms,    log_rms,         a1,         a2,         a3
    0.2040,     1.8203,     6.6266,     0.2914,     0.6971,     0.8848,     0.9510

As such, I think you only used checkpoint.pth.tar. This is consistent with the author’s claim that you eventually end up with worse results if you keep training beyond 140K iterations.
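The gap between the two files comes from when each one is written. Here is a minimal sketch of the usual pattern, assuming the latest weights are overwritten every epoch while the best copy is refreshed only when the validation error improves. The `track_checkpoints` function and the error values are hypothetical, not taken from train.py.

```python
def track_checkpoints(val_errors):
    """Return (last_epoch, best_epoch) for a list of per-epoch validation errors."""
    best_error = float("inf")
    best_epoch = None
    last_epoch = None
    for epoch, error in enumerate(val_errors):
        last_epoch = epoch        # "checkpoint": overwritten every epoch
        if error < best_error:    # "model_best": refreshed only on improvement
            best_error = error
            best_epoch = epoch
    return last_epoch, best_epoch


# Made-up errors that improve at first, then degrade with further training.
last, best = track_checkpoints([0.25, 0.21, 0.19, 0.20, 0.22])
print(last, best)
```

When validation error starts rising late in training, the last checkpoint and the best model point to different epochs, so evaluating the last checkpoint reports the degraded numbers.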

youmi-zym commented, Oct 23, 2018

@ClementPinard Thanks very much
