
Reimplemented with Python 3.6 and PyTorch 1.0.1: is this problem because of batch_size?


Hello tekin,

I have just reimplemented this project with Python 3.6 and PyTorch 1.0.1. Because of the limited GPU memory on my laptop (GTX 970M, 3 GB), I changed the batch_size from 32 to only 2 and trained for a whole night (700 epochs remaining).

After that I tested my model, but got the results below:

-----------------------------------
  tensor to cuda : 0.000402
         predict : 0.004263
get_region_boxes : 0.063616
            eval : 0.009278
           total : 0.077559
-----------------------------------
2019-04-11 13:23:42 Results of ape
2019-04-11 13:23:42    Acc using 5 px 2D Projection = 0.00%
2019-04-11 13:23:42    Acc using 10% threshold - 0.0103 vx 3D Transformation = 0.00%
2019-04-11 13:23:42    Acc using 5 cm 5 degree metric = 0.00%
2019-04-11 13:23:42    Mean 2D pixel error is 2978.765381, Mean vertex error is 2.461958, mean corner error is 409.030334
2019-04-11 13:23:42    Translation error: 2.461311 m, angle error: 143.237230 degree, pixel error:  2978.765628 pix

All the accuracies are zero!

I have checked the code and everything looks fine. Do you think it is because the batch_size is too small? Or is it still because of the Python and PyTorch versions?

Looking forward to your reply!

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 10

Top GitHub Comments

1 reaction
gaobaoding commented, May 6, 2019

@btekin Hi, after switching to another GPU with 4 GB of memory and increasing the batch_size from 2 to 8, the accuracy is no longer zero. It really is related to batch_size. I'm still a bit confused by this issue, so do you have any idea why the batch_size cannot be as small as 2?

0 reactions
btekin commented, Sep 18, 2019

With very small batch sizes, the learning rate is also set to a higher value in the current implementation (see https://github.com/microsoft/singleshotpose/blob/master/train.py#L386). Therefore, with small batch sizes you might want to adapt your learning rate as well for better convergence.
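
For context, here is a minimal sketch of how that kind of scaling can make a tiny batch diverge. The function below is a simplified stand-in for the repository's scheduler, and the base_lr value and model are placeholders, not the project's actual settings:

import torch

# Simplified sketch, not the verbatim repo code: the scheduled learning
# rate is divided by batch_size before being written into the optimizer,
# so a smaller batch means a LARGER per-update step.
def adjust_learning_rate(optimizer, base_lr, batch_size):
    lr = base_lr / batch_size  # 0.001/32 ~= 3.1e-5, but 0.001/2 = 5e-4
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

model = torch.nn.Linear(10, 10)  # placeholder for the pose network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
adjust_learning_rate(optimizer, base_lr=0.001, batch_size=2)

Under this division-by-batch_size assumption, going from batch_size=32 to 2 multiplies the per-update step by 16, which fits the behavior reported above: training diverges at batch_size=2 but recovers at 8.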
