
Reimplemented with Python 3.6 and PyTorch 1.0.1: is this problem because of batch_size?


Hello tekin,

I have just reimplemented this project with Python 3.6 and PyTorch 1.0.1. Because of the limited GPU memory on my laptop (GTX 970M, 3 GB), I changed the batch_size from 32 to only 2 and trained for a whole night (700 epochs remaining).

After that I tested my model, but got the results below:

-----------------------------------
  tensor to cuda : 0.000402
         predict : 0.004263
get_region_boxes : 0.063616
            eval : 0.009278
           total : 0.077559
-----------------------------------
2019-04-11 13:23:42 Results of ape
2019-04-11 13:23:42    Acc using 5 px 2D Projection = 0.00%
2019-04-11 13:23:42    Acc using 10% threshold - 0.0103 vx 3D Transformation = 0.00%
2019-04-11 13:23:42    Acc using 5 cm 5 degree metric = 0.00%
2019-04-11 13:23:42    Mean 2D pixel error is 2978.765381, Mean vertex error is 2.461958, mean corner error is 409.030334
2019-04-11 13:23:42    Translation error: 2.461311 m, angle error: 143.237230 degree, pixel error:  2978.765628 pix

All the accuracies are zero!

I have checked the code and everything looks fine. Do you think it is because the batch_size is too small? Or is it still because of the Python and PyTorch versions?

Looking forward to your reply!

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 10

Top GitHub Comments

1 reaction
gaobaoding commented, May 6, 2019

@btekin Hi, after switching to another GPU with 4 GB of memory and increasing the batch_size from 2 to 8, the accuracy is no longer zero. It really is related to batch_size. I'm still a bit confused by this issue, so do you have any idea why the batch_size cannot be as small as 2?

0 reactions
btekin commented, Sep 18, 2019

With very small batch sizes, the learning rate is also set to a higher value in the current implementation (see https://github.com/microsoft/singleshotpose/blob/master/train.py#L386). Therefore, with small batch sizes you might want to adapt your learning rate as well for better convergence.
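
For context, here is a minimal sketch of how that kind of scaling can make a tiny batch diverge. The function below is a simplified stand-in for the repository's scheduler, and the base_lr value and model are placeholders, not the project's actual settings:

import torch

# Simplified sketch, not the verbatim repo code: the scheduled learning
# rate is divided by batch_size before being written into the optimizer,
# so a smaller batch means a LARGER per-update step.
def adjust_learning_rate(optimizer, base_lr, batch_size):
    lr = base_lr / batch_size  # 0.001/32 ~= 3.1e-5, but 0.001/2 = 5e-4
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

model = torch.nn.Linear(10, 10)  # placeholder for the pose network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
adjust_learning_rate(optimizer, base_lr=0.001, batch_size=2)

Under this division-by-batch_size assumption, going from batch_size=32 to 2 multiplies the per-update step by 16, which fits the behavior reported above: training diverges at batch_size=2 but recovers at 8.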
